The Decatur Makers Community Lab was excited to welcome Ms. Rutland’s biology students from The New School to the space for a hands-on DNA Barcoding Lab. Using real-world molecular biology techniques, these students extracted and analyzed DNA from invertebrate specimens in order to confirm their identities down to the species level. At the completion of the two sessions, students left with new skills and a new understanding of how DNA techniques can be used to better understand the world around us. If you want to learn more about DNA Barcoding, see the students’ results, and learn how you can use the Community Lab, keep on reading.
What is DNA Barcoding?
Let’s say you were to step outside and look around. You might see lots of different types of organisms. Maybe some are familiar, but what happens if you spot something interesting and have no idea what it is? Traditionally, we could consult a field guide or taxonomic key to help us identify the specimen. But there are still some problems. Scientists estimate that there are about 8.7 million different species on Earth, yet only about 1.25 million have been described and cataloged.1 That field guide you’re referencing is only going to list the most common species. And even if your specimen of interest is a common species, multiple species can often look so similar that even experts have trouble telling them apart. For example, Astraptes fulgerator is a common type of skipper butterfly that was first described in 1775. Not until 2004 did scientists learn that what was thought to be a single species of butterfly was actually a species complex consisting of at least 10 different species.2 With this much complexity, how can non-experts ever hope to get a handle on species identification?
Enter DNA Barcoding. Just as a barcode found on a box of cereal allows point-of-sale systems to automatically identify a product, DNA barcodes use a short section of DNA to uniquely identify organisms to the species level. This technique was first proposed by a scientist named Paul Hebert in his landmark paper Biological Identifications through DNA Barcodes.3 In the paper, Hebert described the use of short, highly variable regions of DNA, which he called barcodes, as a means to identify species. And since its publication, the protocol for DNA barcoding has been robustly developed by a team of scientists out of Cold Spring Harbor Laboratory (CSH).
The Makings of a Good DNA Barcode
As great as the idea of a DNA barcode sounds, determining which short sequence of DNA makes a good barcode can be challenging. DNA barcodes need to have the following properties in order to be useful:
- Barcodes need to be short, standardized gene regions. In order to make the process of barcoding efficient and cost effective, barcodes must be short enough that they can be sequenced in a single read, yet long enough that they provide enough information to make an identification.
- They must be flanked by conserved genomic regions. We want to be able to use the same primer sequences to amplify our barcodes across as many organisms within the same group as possible. In order to do this, we must choose conserved, or nearly identical, genomic regions before and after our barcode sequence.
- Barcodes must have low intra-species variation. In order for a barcode to be of any use, it needs to show virtually no variation between organisms in the same species.
- There must be discontinuous variation between species. Again, in order for a barcode to be effective, it must not show up in multiple species. If it did show up in multiple species, which would be an example of continuous variation, we wouldn’t be able to say for sure what our specimen is.
- And lastly, barcodes must be easy to amplify. That is to say, we can isolate and amplify our barcode with a simple PCR reaction.
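The two variation requirements in the list above can be checked numerically. Here is a minimal Python sketch, assuming the sequences are already aligned to the same length. The function names (`p_distance`, `barcode_gap`) and the toy sequences are made up for illustration and are not part of the CSH protocol. It compares the largest difference found within a species to the smallest difference found between species:

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(samples: Dict[str, List[str]]) -> Tuple[float, float]:
    """Return (largest within-species distance, smallest between-species distance)."""
    within = [
        p_distance(a, b)
        for seqs in samples.values()
        for a, b in combinations(seqs, 2)
    ]
    between = [
        p_distance(a, b)
        for (_, seqs1), (_, seqs2) in combinations(samples.items(), 2)
        for a in seqs1
        for b in seqs2
    ]
    return max(within, default=0.0), min(between, default=1.0)


# Invented 10-base "barcodes"; real barcodes are hundreds of bases long.
toy_samples = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACCTAG", "ACGAACCTAG"],
}

max_within, min_between = barcode_gap(toy_samples)
print(f"largest within-species distance:   {max_within:.2f}")
print(f"smallest between-species distance: {min_between:.2f}")
```

A candidate region behaves like a good barcode when the first number is clearly smaller than the second, which is the case for this toy data.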
Luckily for us, scientists have already determined good regions that meet all these criteria for the different groups we might be interested in (minus bacteria and archaea). In plants, a good barcode is a portion of the rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) gene, which is found in the chloroplast and encodes a protein that plays a key role in the first step of carbon fixation. In animals, a portion of the COI (cytochrome c oxidase subunit I) gene is typically used. COI is found in the mitochondria and helps with cellular respiration. Genes like these are essential for life and are often referred to as housekeeping genes. For fungi and lichens, a good barcode choice is a region of the ITS (Internal Transcribed Spacer), since the COI gene isn’t variable enough and is absent in some fungi. The ITS is a region that surrounds the ribosomal RNA gene and is highly variable.4 While this is not a complete list of all possible barcodes, it should give you an idea of how and why barcode sequences are chosen.
How to Barcode
While DNA barcoding is a relatively straightforward process, it is a great activity as it teaches many relevant molecular biology techniques. The detailed protocol from CSH can be found here.
Step 1: Collect, Document, and Identify Specimens
Our first step in DNA barcoding is to find something to barcode. There are many ways to go about this, but whichever way you choose, make sure you stay consistent during the entire experiment. A good way to collect specimens is to follow practices used by ecologists. A good resource on various sampling methods can be found here. Just remember, use common sense and stay safe when collecting specimens.
After you have collected your specimens, you will need to document them including information on when and where they were found. It’s also a good idea to snap a picture. And lastly, you should make an attempt at identifying your specimen with classical methods, like using a field guide as mentioned earlier. While our goal is identification with DNA, often times you can make a pretty good qualitative identification and use the DNA barcoding as confirmation.
Step 2: Isolate DNA
We want to analyze DNA, and so we need to get it out of the cells of our specimen. There are several methods for DNA extraction and you can view the DNA barcoding protocol for details on the methods available for this lab. You can also check out my blog post on DNA Necklaces for details of how we can get DNA out of cells.
The extraction of DNA can be inhibited by several factors, including how the specimen was preserved and which part of the specimen was used. An easy way to check the success of your DNA extraction is to run a simple quality control gel and see if you have any genomic DNA before spending time and resources on further processing of your extract. Simply run a 1% agarose gel with a small amount of your DNA extract and look for heavy bands, which indicate that genomic DNA is present.
Step 3: PCR
Once you have some extracted DNA, you need to isolate and amplify the barcode. To do this, we use a technique called PCR, or polymerase chain reaction. PCR is an important molecular biology technique used to create lots of copies of certain regions of DNA. Think of it as a molecular photocopier. PCR uses primer sequences, short pieces of DNA that match genomic regions ahead of and behind our barcode, to isolate just the region we’re interested in. These primers are the reason we mentioned earlier that our barcode needs to be flanked by conserved genomic regions. After PCR we should be left with lots of copies of our barcode that can then be sequenced. Here’s a video that I think explains the process better than if I described it in text.
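For readers who like to see ideas as code, here is a rough Python sketch of what the primers do. It is a simplification under stated assumptions: it requires exact primer matches (real primers tolerate some mismatches), the template and primer sequences are invented, and the function names are hypothetical. The reverse primer binds the opposite strand, so the sketch searches the template for its reverse complement. The last function shows the idealized doubling of copies with each cycle; real reactions are less than perfectly efficient.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region bounded by the two primers, or None if either is missing."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # Look for the reverse primer's binding site downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def ideal_copies(starting_copies: int, cycles: int) -> int:
    """Copies after the given number of cycles, assuming perfect doubling."""
    return starting_copies * 2 ** cycles


# Invented template: flank + forward site + "barcode" + reverse site + flank.
template = "GGGGATGCCATTTTACGTACGTCTGAAGAAAA"
amplicon = in_silico_pcr(template, forward="ATGCCA", reverse="CTTCAG")
print(amplicon)
print(ideal_copies(1, 30))
```

Because the primers only need to match the conserved flanks, the same pair can pull out the variable region in between from many different organisms.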
Step 4: Gel Electrophoresis
But before we send our samples off to a lab for sequencing, it’s a good idea to double check everything went according to plan. That’s where gel electrophoresis comes in. Gel electrophoresis, often referred to as running a gel, is a technique used to separate molecules by molecular weight. In terms of DNA, this means that we can separate pieces of DNA based on how long they are (measured in base pairs, or bp). Gel electrophoresis is pretty simple, but for a good explanation, check out the video below.
As mentioned above, we’re using a gel here to check that everything worked out as we were expecting. In the case of this lab, we want to ensure that just the barcode was amplified. Because we know what genomic region we’re targeting and what primers we are using, we have an idea of how long our PCR product should be. So when we look at our gel results, we should expect to see a bright band right around 700 bp. If we see this band, we know we have amplified our barcode and we’re ready to move to the next step. If we don’t see a band where we expect one, or don’t see one at all, then it’s time to revisit our DNA extraction and PCR to see if we can determine what went wrong.
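Judging whether a band sits "right around 700 bp" is usually done by eye, but the reasoning can be sketched in Python. This example assumes a DNA ladder (a lane of fragments of known sizes) was run alongside the samples and that migration distances were measured from the wells. The ladder values, the measured distance, and the 50 bp tolerance are all invented for illustration. It uses the common approximation that the logarithm of fragment size falls roughly linearly with distance travelled:

```python
import math
from typing import List, Tuple


def estimate_size_bp(distance_mm: float, ladder: List[Tuple[float, float]]) -> float:
    """Estimate fragment size by log-linear interpolation between ladder bands.

    ladder is a list of (distance_mm, size_bp) pairs with distinct distances.
    """
    points = sorted(ladder)
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


# Hypothetical ladder: shorter fragments travel farther through the gel.
ladder = [(10.0, 1000), (20.0, 500), (30.0, 250)]

band_size = estimate_size_bp(15.0, ladder)
looks_like_barcode = abs(band_size - 700) <= 50  # tolerance is an assumption
print(f"estimated size: {band_size:.0f} bp, near 700 bp: {looks_like_barcode}")
```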
Step 5: DNA Sequencing
We’ve successfully extracted DNA and confirmed that our barcode was amplified. Now it’s time to read the barcode. And to do that we will be using a DNA sequencing technique called Sanger Sequencing. This method, developed in 1977 by Frederick Sanger, is still widely used in research today. It would easily double the length of this post to write out how this technology works, so instead here’s a video explanation:
Step 6: Analyze Sequencing Results
We’ve finally reached the last step of our DNA barcoding journey and it’s now time to analyze our sequencing results. Our results will come back as a text file containing the sequence as determined by Sanger Sequencing. So what do we do now? Typically, if we want to look up some information, we just fire up the web browser, navigate to our favorite search engine, and type in whatever we’re looking for. But typing a 700 bp DNA sequence into Google won’t yield anything useful. Google stores a lot of information, but matching pieces of DNA sequence is not what Google was designed for. Luckily, researchers at the National Center for Biotechnology Information, or NCBI, have built lots of publicly accessible databases for everything from genomics to literature. And these databases are just what we need!
But searching for DNA sequences isn’t as straightforward as it might appear. Let’s consider some of the factors that can influence a sequence search. First, we are often searching for a very small part of a very large genome. We need to make sure we have an efficient way to search for matches within all the genomes in the databases, and there are a whole lot of them. As of December 2018, there are over 980 million sequence records in the GenBank database.5 Second, there are lots of genomic regions that are conserved between species and taxonomic groups. The search algorithm needs to be sensitive enough to pick up the small differences in our query and give us correct results. Third, we need a search that can handle species-specific modifications, like mutations, and still give us appropriate results. And finally, we need to know with what certainty our sequence is matched with a record.
Well, the good folks over at NCBI have a solution to all those considerations. The Basic Local Alignment Search Tool (BLAST) is a special search tool that takes a DNA sequence that we provide and searches it against all the records in NCBI’s databases, returning records of similar sequences. BLAST is an alignment tool, meaning that it attempts to line up the query sequence with database records. As it aligns, it takes into account where base pairs don’t match and where there may be insertions (extra bases) or deletions (missing bases). Alignments are scored based on how well the sequences line up, and the highest scoring matches are reported along with the associated statistics so you can know how confident you are in your result.
By putting our Sanger Sequencing results into BLAST, we can find records that point to the species of our specimen. From there, we can share our results with others or do more analysis and study of our identified species.
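To make the idea of scoring an alignment more concrete, here is a small Python sketch of classic local alignment scoring (the Smith-Waterman dynamic programming method). This is not how BLAST is implemented: BLAST uses faster heuristics to search huge databases, and it reports additional statistics that this sketch leaves out. The scoring values, the record names, and the toy sequences are all assumptions chosen for illustration.

```python
from typing import Dict


def local_alignment_score(query: str, subject: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between two sequences (Smith-Waterman, score only)."""
    cols = len(subject) + 1
    previous = [0] * cols
    best = 0
    for i in range(1, len(query) + 1):
        current = [0] * cols
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            current[j] = max(
                0,                       # start a fresh alignment here
                previous[j - 1] + pair,  # align the two bases
                previous[j] + gap,       # base in query, gap in subject
                current[j - 1] + gap,    # gap in query, base in subject
            )
            best = max(best, current[j])
        previous = current
    return best


# Invented stand-ins for a sequencing result and three database records.
query = "ACGTTGCA"
database: Dict[str, str] = {
    "record_1": "TTACGTTGCATT",  # contains the query exactly
    "record_2": "TTACGTAGCATT",  # one base differs
    "record_3": "GGGGGGGG",      # unrelated
}

scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
best_hit = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, score)
```

The record containing an exact copy of the query scores highest, the record with a single mismatch scores slightly lower, and the unrelated record scores close to zero, which mirrors how the best matches rise to the top of a BLAST result list.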
DIYbio at Decatur Makers
DNA barcoding is just one of the many exciting activities taking place at the Decatur Makers Community Lab. If you’re an educator and want to collaborate with us, drop us a line at info[at]decaturmakers.org. We love welcoming classes into the space or we can come to you. If you’re a community member wanting to learn more about DIYbio and participate in some of our upcoming family-friendly events, head over to the Decatur Makers’ Meetup page for a complete listing of all upcoming events. Also, you can join the discussion of all things DIYbio on the DIYbioATL Facebook group.
Lastly, before we go, I want to extend a special thanks to Dr. Christine Marizzi and Cold Spring Harbor Labs for the donations of materials to make this event possible. Without their gracious support we wouldn’t have been able to share this exciting bit of science with eager students.
- Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B (2011) How Many Species Are There on Earth and in the Ocean? PLoS Biol 9(8): e1001127. https://doi.org/10.1371/journal.pbio.1001127
- Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator
- Hebert PDN, Cywinska A, Ball SL, DeWaard JR. (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society B: Biological Sciences 270(1512):313-321.