Volume 49, Issue 3 p. 416-421
ARTICLE

DNA barcoding: A different perspective to introducing undergraduate students to DNA sequence analysis

Daniel J. Erasmus

Corresponding Author

Department of Chemistry and Biochemistry, University of Northern British Columbia, Prince George, Canada

Correspondence

Daniel J. Erasmus, Department of Chemistry and Biochemistry, University of Northern British Columbia, Prince George, BC V2N 4Z9, Canada.


First published: 02 February 2021

Abstract

Education in biochemistry teaching laboratories focuses primarily on applying biochemical techniques to understanding human disease, biochemistry, and biotechnology. With anthropogenic climate change, there is a renewed interest in quantifying biodiversity, especially with the use of molecular-based approaches such as DNA barcoding. This 3-week laboratory exercise allowed undergraduate students to explore DNA sequencing, analysis, and DNA barcoding. Students extracted DNA from insect legs, amplified a 650 bp section of the Cytochrome C oxidase I gene by PCR, and confirmed the success of their PCR by DNA gel electrophoresis. The PCR products were submitted for sequencing, and students analyzed the sequences using FinchTV, GenBank, and the Barcode of Life Database. Based on the DNA sequences of their PCR products, students were able to identify the species of insects. This lab exercise provides a different context for introducing students to analyzing DNA sequences and using DNA databases.

1 INTRODUCTION

DNA barcoding has become a more prominent approach in the understanding of biodiversity and identification of species as it has increased the accuracy and rate of taxonomic classification.1-3 DNA barcoding classifies organisms based on their DNA sequences.3, 4 Just as the universal product code (UPC), also known as a barcode, is unique to each consumer product, certain sections of DNA sequence are unique to a species. Animals, including insects, are identified using the Cytochrome C Oxidase I (COI) gene. Plants are identified using the rbcL and matK genes, and fungi are identified using the ITS region.4 In the case of insects, a 648 base pair fragment of the COI gene is used to delineate species.5 For the insect order Plecoptera (stoneflies), a 2% difference in the sequence of the 648 bp COI DNA fragment is usually sufficient to indicate a separate species.5
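The 2% rule of thumb can be illustrated with a short calculation. The following Python sketch computes the percentage of differing positions between two sequences that are assumed to be already aligned to the same length; the function name, the handling of gaps and N calls, and the toy sequences are illustrative assumptions, not part of the BOLD workflow.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percent of differing positions between two pre-aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


SPECIES_THRESHOLD_PERCENT = 2.0  # rule of thumb quoted in the text

# Toy 20-base fragments (hypothetical, not real COI data), one mismatch.
query = "ATGGCACTATTATTTGGAGC"
reference = "ATGGCACTTTTATTTGGAGC"

divergence = percent_divergence(query, reference)
verdict = "separate species" if divergence >= SPECIES_THRESHOLD_PERCENT else "same species"
print(f"{divergence:.1f}% divergent: consistent with {verdict}")
```

On a full 648 bp barcode, a 2% difference corresponds to roughly 13 differing bases.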

DNA barcoding has made the identification of many invertebrates much easier, especially insects in the absence of an entomologist. In addition, there are still many undescribed species, and DNA barcoding is greatly improving the process of identifying, recording, and classifying these organisms.6 To classify these new DNA sequences, interim taxonomic nomenclature is used. The DNA barcoding process relies on organizing DNA sequences from specimens into operational taxonomic units (OTUs).7 Each OTU corresponds to a separate species or putative new species.7 These OTUs are generated using barcode index numbers (BINs) that take into account existing taxonomy (geographic location, previous morphological taxonomy) and DNA barcodes.6 The analysis of these DNA sequences for DNA barcoding is facilitated by databases such as the Barcode of Life Database (BOLD).4

Although next generation sequencing (NGS) platforms are heavily used in DNA barcoding, automated Sanger-based sequencing is still being used as it is more affordable and allows for small-scale studies.8 For the typical undergraduate teaching lab, the cost of NGS-based sequencing is probably still out of reach, whereas Sanger-based sequencing has been used in teaching laboratories.9, 10

The DNA barcode setting provides a different perspective on the application of biochemical techniques, and an opportunity to introduce undergraduate students to DNA sequencing, DNA sequence analysis, and working with DNA databases.

2 MATERIALS AND METHODS

2.1 Lab design

This laboratory exercise is part of a third year biochemistry lab-based course required for the Biochemistry and Molecular Biology major at the University of Northern British Columbia, Canada. This course consists of a 1 h lecture and two 3 h laboratories per week in a 13-week semester. This lab exercise requires three lab periods and one lecture period. Prior to this laboratory exercise, students had performed PCR and gel electrophoresis; the 1 h lecture was therefore dedicated to automated Sanger-based sequencing, NGS, DNA sequence file formats (plain sequence format, GCG, FASTA, EMBL format, GenBank format), and DNA databases.

Day 1: DNA extraction, PCR of COI gene.

Day 2: Electrophoresis, DNA quantification, and DNA sequencing.

Day 3: DNA sequence analysis using FinchTV, BOLD, and GenBank.

Each student performed all the experiments individually, except that several students ran their PCR amplicons together on a single gel for DNA gel electrophoresis.

2.2 DNA extraction from stoneflies

Each student was provided with a stonefly nymph (order: Plecoptera) specimen that was preserved in 95% (vol/vol) ethanol. Stoneflies were obtained from nearby streams and rivers in the spring by using a kick net or by turning over rocks along the shore. After capture, stonefly nymphs were placed immediately in 95% ethanol and stored at −20°C as soon as possible. Specimens can be preserved for years in this form.

To extract DNA, a specimen was placed in a new weigh boat and a leg was cut off with a scalpel blade. The femur was cut in half, and a small piece of tissue (the size of a period) was then cut from one of the exposed ends of the femur (Figure 1). The tissue was transferred using the scalpel blade to a 1.5 ml microfuge tube containing 300 μl Chelex resin solution (10% [wt/vol] Chelex-100 resin {Bio-Rad, cat#: 1421253}, 0.1% [vol/vol] Tween 20 {Bio-Rad, cat#: 1706531}, 0.1 μg/μl Proteinase K {Thermo Fisher, cat#: AM2546}). To assist with lysis of the tissue, the tubes were vortexed for 15 s and then pulse centrifuged briefly (10 s) at 13000 rpm. The tissue samples were then incubated for 45 min at 50°C in a heating block, followed by inactivation of Proteinase K at 95°C for 15 min. After incubation, the samples were vortexed again for 15 s and centrifuged at 13000 rpm for 1 min to ensure that the Chelex resin formed a pellet at the bottom of the tube. From the upper part of the supernatant, 100 μl was transferred to a new microfuge tube. Care should be taken not to transfer any of the Chelex with the supernatant, as the Chelex will inhibit the PCR reaction. The saved supernatant was used as template for the PCR.
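For instructors scaling the Chelex resin solution to a class, the recipe above reduces to simple arithmetic. The sketch below is one way to do it in Python; the 10% overage and the 20 μg/μl Proteinase K stock concentration are assumptions chosen for illustration and should be replaced with the values for the reagents actually on hand.

```python
def chelex_mix(n_samples: int,
               overage: float = 0.10,                 # assumed pipetting allowance
               prot_k_stock_ug_per_ul: float = 20.0   # assumed stock concentration
               ) -> dict:
    """Reagent amounts for n_samples tubes of 300 ul Chelex resin solution."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    volume_ul = n_samples * 300.0 * (1.0 + overage)
    prot_k_ug = volume_ul * 0.1                       # 0.1 ug/ul final
    return {
        "total_volume_ul": volume_ul,
        "chelex_resin_mg": volume_ul * 0.1,           # 10% wt/vol = 0.1 mg/ul
        "tween20_ul": volume_ul * 0.001,              # 0.1% vol/vol
        "proteinase_k_stock_ul": prot_k_ug / prot_k_stock_ug_per_ul,
    }


# Example: a class of 20 students, one tube each.
for reagent, amount in chelex_mix(20).items():
    print(f"{reagent}: {amount:.1f}")
```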

A stonefly larva used for DNA extraction. The arrow points to the tissue that is used for DNA extraction

2.3 PCR of COI gene

To amplify the COI gene fragment by PCR, the primer pair LCO1490 (5′ GGTCAACAAATCATAAAGATATTGG 3′) and HCO2198 (5′ TAAACTTCAGGGTGACCAAAAAATCA 3′) was used.11 The thermal cycling included an initial denaturation at 94°C for 180 s and then 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 60 s. Final extension was for 10 min at 72°C. The PCR was performed using the TopTaq Master mix kit {Qiagen, cat#: 200403}. Success of the PCR was confirmed by 0.9% (wt/vol) agarose gel electrophoresis in 1× TBE buffer for 45 min at 120 V. PCR products were purified using the QIAquick PCR Purification Kit {Qiagen, cat#: 28104}. Students quantified their purified PCR products by using two microliters of PCR product on a Nanodrop. Automated Sanger sequencing was performed at the UNBC Genetics facility on an Applied Biosystems 3130XL using 50 ng of PCR product and LCO1490 primer at 50 μM.
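Two small calculations follow from this protocol: how long the thermal cycler program holds in total, and what volume of purified product contains the 50 ng needed for sequencing. The Python sketch below works through both; the Nanodrop reading of 25 ng/μl is a hypothetical example value, and ramp times between temperatures are ignored.

```python
INITIAL_DENATURATION_S = 180
CYCLES = 35
CYCLE_STEPS_S = (30, 30, 60)      # 94 C, 60 C, 72 C holds per cycle
FINAL_EXTENSION_S = 10 * 60


def total_hold_time_s() -> int:
    """Sum of all programmed hold times (ramping not included)."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


def volume_for_mass_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of PCR product that contains target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


print(f"Programmed hold time: {total_hold_time_s() / 60:.0f} min")

nanodrop_reading = 25.0           # ng/ul, hypothetical student measurement
print(f"Volume for 50 ng: {volume_for_mass_ul(50, nanodrop_reading):.1f} ul")
```

The programmed holds add up to 4980 s (83 min), so the run fits within a three-hour lab period alongside the DNA extraction.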

2.4 DNA sequence analysis

The UNBC Genetics facility provided sequence files as ABI format chromatogram files. These files were uploaded to Blackboard for students to access. Students used FinchTV (https://digitalworldbiology.com/FinchTV) to inspect and evaluate the electropherogram of their DNA sequence. FinchTV is a freely available application that is easy to use and allows the user to do basic editing of a DNA sequence file.

Using FinchTV, students inspected their sequences to determine whether they had a sufficiently long DNA sequence to use for further analysis. We were aiming for at least 400 bp, but ideally, DNA sequences of 650 bp provide the most reliable results. Students also inspected the DNA sequence to see that the peaks for each nucleotide were clear and that there were no peaks called “N.” Students could change the call to the appropriate nucleotide if there was a clearly discernible signal. They were also prompted to save the edited and original DNA sequence files separately.
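The checks described above (read length and remaining N calls) can also be expressed as a short script. This is a minimal Python sketch that operates on a plain sequence string exported from the chromatogram viewer; the function name and the returned fields are illustrative assumptions, and the 400 bp threshold is the one stated above.

```python
MIN_LENGTH_BP = 400     # minimum length aimed for in this exercise
IDEAL_LENGTH_BP = 650   # length that gives the most reliable results


def summarize_read(seq: str) -> dict:
    """Report length and ambiguous calls for one sequence read."""
    seq = "".join(seq.split()).upper()   # drop whitespace and line breaks
    # 1-based positions, matching how base positions are usually displayed
    n_positions = [i + 1 for i, base in enumerate(seq) if base == "N"]
    return {
        "length": len(seq),
        "n_calls": len(n_positions),
        "n_positions": n_positions,
        "long_enough": len(seq) >= MIN_LENGTH_BP,
        "ideal_length": len(seq) >= IDEAL_LENGTH_BP,
    }


# Hypothetical read: 405 bases with two unresolved calls.
example = "ACGT" * 50 + "N" + "ACGT" * 25 + "N" + "ACGT" * 25 + "ACG"
report = summarize_read(example)
print(report["length"], report["n_calls"], report["n_positions"])
```

The listed positions tell the student where to look in the chromatogram before deciding whether an N can be replaced with a confident base call.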

After editing the DNA sequences, students were asked to open their web browser and go to http://www.barcodinglife.org/. Students were asked to select the “Identification” tab in the Identification Engine, and in the case of stoneflies also the “Animal Identification” tab and “Species Level Barcode.” In the space provided, students pasted the edited DNA sequence from FinchTV.

For instructors: it should be noted that BOLD only accepts FASTA sequences in the forward direction. To test this, students generated the reverse complement of their DNA sequence in FinchTV and pasted it into BOLD. The reverse complement produced “no species identification.”
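Students generated the reverse complement in FinchTV, but the operation itself is simple enough to show in code. The Python sketch below reverse-complements a sequence and wraps it as a FASTA record for pasting into a web form; the helper names, the record name, and the 60-character line width are illustrative assumptions.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


forward = "GGTCAACAAATCATAAAGATATTGG"   # LCO1490 primer, used here as a short example
reverse = reverse_complement(forward)

print(to_fasta("example_forward", forward))
print(to_fasta("example_reverse_complement", reverse))

# Applying the operation twice recovers the original strand.
assert reverse_complement(reverse) == forward
```

The final assertion mirrors the classroom exercise: taking the reverse complement of a reverse read regenerates the forward sequence that BOLD expects.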

Students also performed a sequence alignment using the Basic Local Alignment Search Tool (BLAST) in GenBank (https://www.ncbi.nlm.nih.gov/genbank/). Students compared the results for species identification between the two databases. Most often the results were consistent between the two databases, but in some instances discrepancies may arise, as BOLD has been populated with many DNA sequences that are not yet in GenBank. Students can BLAST search both the forward and reverse complement sequences in GenBank and should note that both generate the same response.

A set of questions was provided to guide students in their analyses while using GenBank:
  1. Which species' DNA sequences were returned in the BLAST search?
  2. What percentage of the query sequence was used in the alignment (Query cover)? Why is this important?
  3. What percent similarity exists between the query sequence and the sequences being returned (Ident)?

To explore the importance of the total score and expect value (E-value) in GenBank, students entered fragments of 20, 50, and 200 bases from their DNA sequence. Students were required to record the changes in these values and explain why the total score and E-value changed.
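The trend students should observe can be sketched with the standard relationship between score and E-value, E = m · n · 2^(−S′), where m is the query length, n is the database size, and S′ is the bit score. The Python sketch below applies it to perfect matches of 20, 50, and 200 bases. The statistical parameters (lambda and K for +1/−2 nucleotide scoring) and the database size are assumed illustrative values, and the calculation ignores the effective-length corrections that BLAST applies, so the numbers will not match a real search exactly.

```python
import math

LAMBDA = 1.28        # assumed Karlin-Altschul lambda for +1/-2 scoring
K = 0.46             # assumed Karlin-Altschul K for the same scoring
DB_LETTERS = 1e10    # hypothetical database size in bases


def bit_score(raw_score: float) -> float:
    """Convert a raw alignment score to a normalized bit score."""
    return (LAMBDA * raw_score - math.log(K)) / math.log(2)


def e_value(query_len: int, raw_score: float, db_letters: float = DB_LETTERS) -> float:
    """Expected number of chance hits scoring at least raw_score."""
    return query_len * db_letters * 2.0 ** (-bit_score(raw_score))


for length in (20, 50, 200):
    raw = length     # perfect match: +1 for every aligned base
    print(f"{length:>3} bases  bit score {bit_score(raw):7.1f}  E-value {e_value(length, raw):.2e}")
```

Because the score grows linearly with the length of a perfect match while the E-value shrinks exponentially with the score, a 20-base fragment can match by chance but a 200-base fragment essentially cannot.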

3 RESULTS

In lab 1, students prepared their DNA template and performed the PCR. As stonefly nymphs have six legs, up to six students can each cut a leg off a single insect to extract DNA. Great care must be taken by students when preparing the tissue sample for DNA extraction. The challenge is to cut a piece of tissue that is small enough: typically, the tissue should be the size of a period in 12-point font (Figure 1). It is also important that students do not transfer any of the Chelex into the PCR reaction, as the Chelex will inhibit the reaction. Chelex chelates polyvalent metal ions, including the Mg2+ that the DNA polymerase requires. To prevent the chelation of Mg2+, students pulse centrifuged the microfuge tubes to form a Chelex resin pellet at the bottom of the tube. In addition, students transferred 100 μl of the upper part of the supernatant to a new microfuge tube. This supernatant was used as template during the PCR.

During lab 2, students confirmed the success of their PCR reactions by using 10% of the PCR reaction's volume in DNA gel electrophoresis (Figure 2). Students need to confirm that they have a successful amplification of a 650 bp fragment of the COI gene and that there are no additional DNA bands visible. PCR products were then cleaned up with a PCR purification kit and quantified.

Agarose gel electrophoresis of the PCR amplifications of a section of the COI gene. L, DNA size marker; lanes 1–7 correspond to the PCR reactions performed by seven different students. COI, cytochrome C oxidase I

DNA samples and the LCO1490 primer were submitted to our in-house genetics facility for sequencing. The results are usually available within 24 h in electronic file format (see Supplemental material) and are uploaded by the instructor to our online teaching system (Blackboard) for students to download (Supplemental material 2 and 3). Additional DNA sequences generated by the instructor and/or previous students are also made available for DNA sequence analysis during lab 3 (Supplemental material 4, 5, and 6).

In lab 3, students perform an in silico analysis of their DNA sequences. FinchTV is available to our students through our student desktops on campus. As it is free software, students can also download it to their own personal laptops. Students first evaluate the quality of the stonefly COI gene sequence in FinchTV. A typical usable sequence file contains a DNA sequence of around 400 nucleotides or more, ideally greater than 600 nucleotides. Students typically find that approximately the first 30 nucleotides are called N, as the signal at the beginning is unclear or too strong. As the nucleotide signals become more readable, students evaluate the rest of the sequence to see whether there are any Ns that can be called A, T, G, or C. After evaluating and editing the sequence, students use it to identify the species in BOLD and GenBank.
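The unreadable stretch at the start of a Sanger read can also be trimmed programmatically rather than by eye. The Python sketch below is one possible approach, not the procedure used in this lab: it trims each end of the read inward until it reaches a run of consecutive unambiguous bases. The function name and the window of 10 bases are assumptions chosen for illustration; Ns in the interior of the read are left in place for manual inspection.

```python
def trim_ambiguous_ends(seq: str, window: int = 10) -> str:
    """Trim both ends until `window` consecutive A/C/G/T calls are found."""
    seq = "".join(seq.split()).upper()

    def first_clean_start(s: str):
        """Index where the first run of `window` unambiguous bases begins."""
        run = 0
        for i, base in enumerate(s):
            run = run + 1 if base in "ACGT" else 0
            if run == window:
                return i - window + 1
        return None

    start = first_clean_start(seq)
    if start is None:
        return ""                        # no clean stretch anywhere in the read
    trailing = first_clean_start(seq[::-1])
    return seq[start:len(seq) - trailing]


# Hypothetical read with a noisy start and a noisy end.
raw_read = "NNANNTN" + "ACGT" * 10 + "NNAN"
print(trim_ambiguous_ends(raw_read))
```

The original chromatogram file should still be kept alongside any trimmed sequence so the edits can be checked against the raw signal.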

4 DISCUSSION

4.1 Developing DNA analysis skills

This laboratory exercise was developed with two learning objectives in mind: students generate DNA sequences that they analyze themselves, and students gain experience working with DNA databases.

The ability to analyze DNA sequences is an essential skill for majors in biochemistry and molecular biology. This lab exercise facilitates the development of these skills. In our course, students had already been exposed to PCR and electrophoresis, so the exercise gave them an opportunity to practice these lab skills further. However, this exercise can easily be used as an introduction to PCR and DNA as well.

Students work with both a universal nucleic acid database and a database dedicated to DNA barcoding. GenBank provides a wider set of tools to use when working with DNA sequences. BOLD specializes in the identification of species, phylogenetics, and biodiversity. Since we were doing DNA barcoding, we could have limited the lab to BOLD only, but it was important to expose students to GenBank as it is the major DNA sequence database. It is important for students to learn about databases as they move into the upper division of undergraduate education in biochemistry and molecular biology.

There are many DNA databases in existence, and they are all set up slightly differently to fulfill different purposes. GenBank is used by the larger scientific community as a universal database, whereas organism-specific databases such as yeastgenome.org (Saccharomyces cerevisiae) and flybase.org (Drosophila melanogaster) exist to serve those species-specific communities. BOLD is used as a DNA barcoding database to identify and classify organisms.

The different databases often use different methods for entering and analyzing information. For example, the BOLD database only uses the forward sequence (from the forward primer) and not the reverse sequence (from the reverse primer). If you enter the reverse sequence, an “Unable to match any records in the selected database” result is returned. As an example, I provided students with the reverse complement of the salmon fly (Pteronarcys californica) sequence (Supplemental data material 5), which returns “Unable to match any records in the selected database.” But when you take the reverse complement of the reverse sequence, you generate the forward sequence, which does generate a result. When performing the same exercise in GenBank, the correct results are generated for both forward and reverse sequences. However, if the main goal is to perform DNA barcoding, GenBank is not as user friendly or useful as BOLD. BOLD provides an easy and quick way to set up phylogenetic trees, and it is populated with more verified COI DNA sequences than GenBank.

After students entered their sequences, they obtained a genus and species name for their insect. Students also explored other features of BOLD, such as generating a phylogenetic tree that placed their specimen sequence in a BIN with sequences from the database.

To analyze DNA in GenBank, students had to answer a series of questions on the metrics generated when doing a BLAST search [see attached worksheet]. In addition to evaluating the DNA sequence alignments and percent similarity, students explored the value and meaning of the E-value and Total Score by entering 20, 50, and 200 bases of their DNA sequence into a BLAST search. Naturally, E-values decreased and total score values increased with longer DNA sequences. This illustrated the importance of sequence length. In addition to their own sequences, three more sequences were provided to the students (Supplemental data sequences 4, 5, and 6). During the in silico DNA analysis laboratory component, a discussion was also facilitated on the importance of Maximum Score, Total Score, Query cover, and Accession Number. It is important to address these metrics, as too many students focus only on the percentage similarity without understanding the meaning of the “other numbers.”

4.2 Providing a different context

Previous PCR-based and DNA sequencing labs have explored the application of biochemical techniques in the fields of human biology and food science.10, 12 This laboratory exercise also aimed to provide a different context for how biochemical techniques are applied to address a wider range of topics such as DNA barcoding and biodiversity.

The use of insects as starting material is not typical of most biochemistry and molecular biology laboratories. Nearly all the students in this course are biochemistry majors. Most of them have not dealt with insects at university level, and those who have usually did so in first year biology courses. The specimens chosen are large stoneflies that are anywhere from 3 to 5 centimeters in length and were collected from nearby rivers. We used stonefly nymphs as the instructor has a lot of experience working with stoneflies in DNA barcoding. The lab can easily be adapted for other insects as well. We have successfully used the LCO and HCO primer pair for other insect orders such as Trichoptera (caddisflies) and Ephemeroptera (mayflies),13 both aquatic insects. If there is a challenge in sourcing aquatic insects, terrestrial insects can also be used, such as Lepidoptera (moths and butterflies),5, 14 Hemiptera (true bugs),15 and Musca domestica (house flies).16 The use of insects intrigued the students and stimulated a lot of discussion within the class. The conversations ranged from the academic to expressions of surprise at learning that such large insects live in rivers. This provided an informal teaching moment on aquatic systems within the teaching laboratory and on how biochemistry techniques can be applied to fields beyond what is typically associated with biochemistry and molecular biology.

This laboratory exercise is easy to do, and students enjoyed analyzing DNA sequences they generated themselves. They also enjoyed working with unfamiliar materials such as insects. They commented that this was satisfying, which we hope will lead to greater student engagement. The DNA barcoding scenario provides an excellent environment for discovery in a teaching lab and teaches students how to work with DNA sequences.