PCR amplification
PCR amplification introduces biases, such as primer biases, and errors, such as nucleotide substitutions and chimeras (e.g. Polz & Cavanaugh 1998; Haas et al. 2011; Murray et al. 2015; Piñol et al. 2015). Two of the three main metabarcoding strategies allow practitioners to carry out only a single PCR step before sequencing, namely the one-step PCR with fusion primers approach and the tagged metabarcoding PCR approach in which PCR-free library building is carried out (Fig. 2B and D). Because an extra PCR step carries an additional risk of introducing errors, these two approaches offer an advantage over the two-step PCR method and the tagged PCR approach in which the workflow includes an index PCR step (Fig. 2C and D).
Apart from minimising the number of PCR steps, the 5ʹ nucleotide additions to metabarcoding primers should be considered. Bulk sample and eDNA extracts consist of complex mixtures of DNA from a large number of organisms, which in the case of eDNA can be degraded (Taberlet et al. 2012a). With such DNA extracts, the primers are faced with the task of amplifying (trace) copy number target DNA from different taxa (Taberlet et al. 2012b), a task potentially distorted by primer biases, inhibitors and potentially abundant predator or host DNA (e.g. Deagle et al. 2014; Clarke et al. 2014b; Murray et al. 2015). To add to this, nucleotide additions to primers can decrease PCR efficiency (Schnell et al. 2015; Murray et al. 2015).
The three main metabarcoding strategies have different lengths of nucleotide additions on the 5ʹ-end of metabarcoding primers. The longest 5ʹ-nucleotide additions are found in the one-step PCR approach where up to 60 nucleotides (sequence adapters and indices) are added to one or both of the primers, making the complete primer often over 80 bp long (e.g. Elbrecht & Leese 2015). In the two-step PCR approach (Fig. 2C), the sequence overhangs on the metabarcoding primers used in the first PCR are approximately half the length of the fusion primers, e.g. 33-34 nucleotides, if using Illumina® Nextera Indices. The tagged PCR approach has the shortest nucleotide additions to the metabarcoding primers (Fig. 2D) with tags of typically 5-10 nucleotides in length (e.g. Coissac 2012; De Barba et al. 2014; Alberdi et al. 2018). The long additions to the metabarcoding primers cause a decrease in PCR efficiency (Murray et al. 2015) and, in line with this, the two-step PCR approach has been shown to yield a marginal increase in detection of taxa compared to the one-step fusion primer approach (Zizka et al. 2019). Even the short nucleotide additions in the tagged PCR approach have been shown to decrease PCR efficiency (Schnell et al. 2015). Thus, no method is free of decreased PCR efficiency caused by the nucleotide additions to the 5ʹ-end of metabarcoding primers. However, to our knowledge it has not been formally tested whether, and to what extent, the shorter nucleotide tag additions in the tagged PCR approach offer greater PCR efficiency and taxonomic detection than the two other approaches. It can therefore only be speculated that the tagged PCR approach is the most sensitive of the three main approaches when it comes to detection of taxa in low abundance.
Regardless of metabarcoding strategy, we stress the importance of optimising PCR amplifications (usually by qPCR) to detect PCR inhibition, identify samples with low template quantity and track PCR efficiency issues (Murray et al. 2015; Yang et al. 2021).
Theoretically, the reduced PCR efficiency in the one-step and two-step PCR approaches caused by the long overhangs on primers might be counteracted by spiking the PCRs with metabarcoding primers without any 5ʹ attachments (e.g. Murray et al. 2015). However, this has been shown to yield only modest improvements in PCR efficiency for the one-step approach (e.g. Murray et al. 2015). Alternatively, a pre-enrichment before the metabarcoding PCR can be carried out, i.e. running a PCR with metabarcoding primers (with no nucleotide additions) prior to the metabarcoding PCR, as done in Zizka et al. (2019) and Elbrecht & Steinke (2018) for the one-step PCR approach. However, this not only introduces another PCR amplification step, but can also increase the risk of cross-contamination between PCR products due to the initial unlabelled PCR amplification step (e.g. Murray et al. 2015).
Apart from the length of the nucleotide additions, it has been investigated whether differences in nucleotide tag sequences can result in biases in the tagged PCR approach. Although one study shows that such tag bias is an issue (O’Donnell et al. 2016), other studies show that it is not (Leray & Knowlton 2017; Yang et al. 2021). If tag bias does exist, it should theoretically be minimised if different tags are used on each sample’s PCR replicates.
Chimeras & tag-jumps
Chimeras can be formed during all PCR steps in any metabarcoding workflow (Fig. 2B-D). Chimeras are sequences that consist of two or more different template sequences, and the majority are thought to result from incomplete primer extension during the elongation phase of the PCR cycle (Meyerhans et al. 1990; Wang & Wang 1997; Judo et al. 1998; Shin et al. 2014). The probability of chimera formation increases when similar template sequences are amplified in the same PCR reaction (e.g. Judo et al. 1998; Smyth et al. 2010, but see also Fonseca et al. 2012), such as during the metabarcoding PCR (Fig. 2B-D) or during the index PCR-amplification of pools of tagged amplicons (Fig. 2D). The consequences of chimeric sequences depend on where they arise. If they are created during a PCR-amplification of a single sample’s DNA extract, the chimeras will be intra-sample chimeras, which can be falsely interpreted as novel taxa and erroneously inflate measures of diversity. If, on the other hand, chimeras are created during a PCR-amplification of pooled tagged amplicons, such as in the tagged PCR approach (Fig. 2D), the chimeras may be inter-sample chimeras, which can result in tag-jumps and false attribution of amplicon sequences to samples (Schnell et al. 2015). This can also lead to false positives and inflation of diversity.
All metabarcoding approaches are prone to intra-sample chimeras. However, as chimera formation increases when similar sequences are amplified in the same PCR reaction (e.g. Judo et al. 1998; Smyth et al. 2010), metabarcoding primers with long 5ʹ overhangs, as used in the one-step and two-step approaches, might be more prone to chimera formation since they carry long and similar sequences at the 5ʹ end of the primers. This hypothesis, however, requires testing. Intra-sample chimeras can be reduced by limiting the number of PCR cycles (Haas et al. 2011). Also, if samples are subjected to multiple, independent PCRs, chimeras can be filtered out by keeping only sequences that occur in multiple PCR replicates, the ‘restrictive approach’ described in Alberdi et al. (2018). Chimera detection programmes such as UCHIME (Edgar et al. 2011) can be used for further clean-up.
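The replicate-based filtering described above can be illustrated with a minimal Python sketch. It assumes that each PCR replicate of a sample has already been reduced to a table of sequences and read counts. The function name `restrictive_filter` and the threshold parameter `min_replicates` are hypothetical and are not taken from Alberdi et al. (2018); the appropriate threshold should be chosen per study.

```python
from collections import Counter

def restrictive_filter(replicates, min_replicates=2):
    """Keep sequences detected in at least `min_replicates` PCR replicates.

    `replicates` is a list with one dict per PCR replicate of a single
    sample, each mapping sequence -> read count (an assumed input format).
    Returns a dict mapping each retained sequence to its summed read count.
    """
    presence = Counter()  # number of replicates in which a sequence occurs
    totals = Counter()    # summed read count across replicates
    for replicate in replicates:
        for sequence, count in replicate.items():
            if count > 0:
                presence[sequence] += 1
                totals[sequence] += count
    return {seq: totals[seq] for seq in totals
            if presence[seq] >= min_replicates}

# Toy data: three PCR replicates of one sample.
replicates = [
    {"ACGT": 120, "TTGA": 3},
    {"ACGT": 98, "GGCA": 2},
    {"ACGT": 110, "TTGA": 5},
]
kept = restrictive_filter(replicates, min_replicates=2)
```

In this toy example, the sequence found in only one replicate ("GGCA") is discarded, as would be expected for a chimera or error arising in a single PCR.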
Inter-sample chimeras can cause havoc in metabarcoding studies. They can only occur in the tagged PCR approach where library build is carried out on pooled tagged amplicons from different samples (Fig. 2D). Here, tag-jumps can create sequences with new combinations of the nucleotide tags used in the amplicon pool (Schnell et al. 2015). If the new combinations of tags are already used in the amplicon pool, this will cause false assignment of sequences to samples, which should be avoided at all costs (Schnell et al. 2015; Esling et al. 2015). Such tag-jumps can also have the consequence that negative controls are seemingly not negative following bioinformatic sorting of sequences to samples. It should be noted that tag-jumps can also occur due to T4 DNA Polymerase activity in the blunt-ending step during library preparation, as demonstrated in library building for the Roche/454 sequencing platform (van Orsouw et al. 2007; Palkopoulou et al. 2016) and for the Illumina sequencing platform (Carøe & Bohmann 2020). Estimates of the rate of tag-jumping range from ca. 2% to 49% of total sequences (Schnell et al. 2015; Esling et al. 2015; Carøe & Bohmann 2020). This broad range can be caused by factors affecting inter-sample chimera formation during the index PCR, for example DNA template and primer concentration, PCR cycle number, and sequence similarity (e.g. Judo et al. 1998; Smyth et al. 2010; Carøe & Bohmann 2020). The range of tag-jump proportions highlights the unreliability of including an index PCR step in the tagged PCR approach.
To avoid tag-jumps in the tagged PCR approach, and thereby prevent false assignment of sequences to samples, it is important to refine index PCR parameters to decrease the likelihood of chimera formation - or better yet, to omit the index PCR step (Fig. 2D). Further, blunt-ending using T4 DNA Polymerase should be circumvented during library preparation (Schnell et al. 2015; Palkopoulou et al. 2016; Carøe & Bohmann 2020). If both T4 DNA Polymerase blunt-ending and index PCR are eliminated during library preparation of pools of tagged amplicons, tag-jumps can practically be eliminated (Carøe & Bohmann 2020).
If the library preparation protocol contains a T4 DNA Polymerase blunt-ending step and/or an index PCR step, and can thereby be assumed to generate tag-jumps, these can be detected and removed by using ‘twin-tags’ during the original PCRs (e.g. F1-R1, F2-R2, …), because tag-jumped sequences would then carry non-twinned tag combinations not used in the set-up (e.g. F1-R2, F2-R3, …) (e.g. Schnell et al. 2015; Yang et al. 2021). However, using twin tags comes at the price of buying many more versions of tagged primers and building more libraries (Schnell et al. 2015). If twin tags are not used, chimera removal software can remove some chimeric sequences carrying false combinations of used tags (Schnell et al. 2015).
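The logic of twin-tag filtering can be sketched in a few lines of Python. The sketch assumes that reads have already been reduced to a forward tag, a reverse tag and the amplicon sequence. The function name `split_by_twin_tags` and the tag and sample labels are hypothetical and used only for illustration.

```python
def split_by_twin_tags(reads, twin_tags):
    """Assign reads to samples by twin tags and set aside all other reads.

    `reads` is an iterable of (forward_tag, reverse_tag, sequence) tuples.
    `twin_tags` maps each used twin combination, e.g. ("F1", "R1"), to a
    sample name. Both input formats are assumptions made for this sketch.
    """
    assigned = {sample: [] for sample in twin_tags.values()}
    discarded = []
    for forward_tag, reverse_tag, sequence in reads:
        sample = twin_tags.get((forward_tag, reverse_tag))
        if sample is None:
            # Non-twinned combination: not used in the set-up, so it is
            # treated as a putative tag-jump and removed.
            discarded.append((forward_tag, reverse_tag, sequence))
        else:
            assigned[sample].append(sequence)
    return assigned, discarded

twin_tags = {("F1", "R1"): "sample_1", ("F2", "R2"): "sample_2"}
reads = [
    ("F1", "R1", "ACGT"),
    ("F2", "R2", "TTGA"),
    ("F1", "R2", "ACGT"),  # non-twinned combination
    ("F2", "R2", "GGCA"),
]
assigned, discarded = split_by_twin_tags(reads, twin_tags)
```

Because each tag occurs in only one used combination, a read in which a single tag has been exchanged cannot match any used combination and is therefore set aside rather than assigned to a wrong sample.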
The extent of tag-jumping and spillover of taxa between samples can be detected through inclusion of positive controls consisting of synthetic oligos or taxa not expected to occur in the dataset. However, note that such controls do not enable confident elimination of false positives caused by tag-jumps. The extent of tag-jumping can also be assessed by comparing all observed combinations of used tags to all originally used tag combinations (Schnell et al. 2015; Zepeda Mendoza et al. 2016).
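As a rough illustration of the comparison of observed and used tag combinations, the following Python sketch computes the proportion of reads that carry a combination of tags not used in the set-up. The function name `unused_combination_fraction` and the input format are assumptions made for this example. Where tags are reused across combinations, tag-jumps that result in an already used combination are not counted, so the value is then best read as a minimum estimate.

```python
from collections import Counter

def unused_combination_fraction(read_tags, used_combinations):
    """Return the fraction of reads carrying an unused tag combination,
    together with the read count per unused combination.

    `read_tags` is an iterable of (forward_tag, reverse_tag) tuples, one
    per read. `used_combinations` is the set of combinations that were
    actually used in the PCR set-up. Both are assumed input formats.
    """
    counts = Counter(read_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    unused = {combo: n for combo, n in counts.items()
              if combo not in used_combinations}
    return sum(unused.values()) / total, unused

used = {("F1", "R1"), ("F2", "R2")}
read_tags = (
    [("F1", "R1")] * 90
    + [("F2", "R2")] * 100
    + [("F1", "R2")] * 6
    + [("F2", "R1")] * 4
)
fraction, unused = unused_combination_fraction(read_tags, used)
```

In this toy example, 10 of 200 reads carry an unused combination of tags.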
Misassignment of library indices
Incorrect assignment of indices between pooled libraries can cause sequence reads to be incorrectly assigned to libraries. Misassigned indices have been attributed to the formation of mixed clusters on the sequencing flow cell, i.e. clusters originating from two different template molecules or clusters growing into each other, to low levels of free index primers present in the sequence library and to bulk amplification of pooled libraries (Nelson et al. 2014; Sinha et al. 2017; Vodak et al. 2018; Costello et al. 2018; Valk et al. 2019). Regardless of how index misassignment occurs, in metabarcoding studies it can cause incorrect assignment of amplicon sequences to libraries, and thereby incorrect assignment of sequences to samples and false positives. This phenomenon can affect all three metabarcoding approaches (Fig. 2). To avoid index misassignment it is recommended to dual-index libraries with unique library index combinations (Kircher et al. 2012; Sinha et al. 2017; www.illumina.com). Further, stringent bead purification (or size selection) can remove free adapters/primers from the libraries (Owens et al. 2018). The labelling in the different metabarcoding approaches further allows for accounting for potential incorrect assignment of sequences to libraries. In the tagged PCR approach, unique tagging of PCR replicates across all pooled libraries can be used to account for (and detect) index misassignment. However, this can be costly. In the one-step PCR with fusion primers approach, a tweaked protocol where nucleotide tags are used instead of the i7 and i5 library indices (e.g. Elbrecht & Steinke 2018) creates one single library that is thereby free of index misassignment.
As with tag-jumping, the extent of incorrect assignment of indices and spillover of taxa between samples can be detected through inclusion of positive controls consisting of taxa not expected to occur in the dataset and, when demultiplexing libraries, by comparing all observed index combinations to the combinations actually used.
It is important not to confuse tag-jumping, index misassignment or cross-contamination between PCR products with cross-contamination of the primers themselves. Due to the high concentration of primers upon synthesis, cross-contamination (e.g. by aerosols) can manifest itself as low numbers of sequence reads and could be misinterpreted as tag-jumps or index-bleeding. Due to the risk of primer cross-contamination, some laboratories avoid ordering primers in 96-well plates. There are anecdotal reports that primer contamination can also occur at primer synthesis (or purification). As mentioned, the risk of cross-contamination between nucleotide tagged primer stocks and indexed primer stocks, which could e.g. occur during resuspension of primers, will generally be the same no matter which of the three overall metabarcoding approaches is used. In the first PCR step in the two-step PCR approach, the primers are unlabelled and any cross-contamination that might occur will not have consequences.
Cost
Metabarcoding primers in the tagged and one-step PCR approaches have to be labelled with either nucleotide tags or indices, whereas the metabarcoding primers in the two-step approach are generally not individually labelled. Due to the different labelling systems in the three primary metabarcoding approaches, there are different costs associated with them.
The fusion primers for the one-step PCR approach are the most expensive metabarcoding primers amongst the three approaches. This is (i) because differently indexed versions are purchased for each metabarcoding primer set and (ii) because the increased oligo length results in lower yield of the full length product. If unique matching indices are used to account for index misassignment, one-step PCR can become increasingly expensive for larger scale studies. However, this needs to be factored against the potential cost of repeating runs due to artifacts and contamination, and the fact that only a single PCR step is needed to go from sample extract to library.
In the tagged PCR approach (Fig. 2D), the metabarcoding primers are relatively inexpensive compared to the one-step PCR fusion primers as they only add 5ʹ tags of 5-10 nucleotides in length. However, as with the one-step PCR approach, these need to be purchased in many tagged editions for each metabarcoding primer set. Furthermore, if tag-jumping is to be taken into account by only using each tag once in a library amplicon pool, e.g. by only amplifying with twin forward and reverse tags, then metabarcoding primer sets have to be ordered in many differently labelled editions (Schnell et al. 2015). To keep costs down, this needs to be balanced by pooling fewer PCR products into each library and thereby creating more sequence libraries (Fig. 2D). However, if a library preparation protocol is used that does not create tag-jumps, tags can be freely combined, which lowers the number of tagged primers that must be purchased (Schnell et al. 2015; Carøe & Bohmann 2020). In contrast to the other two metabarcoding approaches, the tagged PCR approach includes library preparation on pools of amplicons, and the cost of this therefore has to be taken into account. This can, however, be kept low if a protocol that does not generate tag-jumps is used and only a few libraries have to be made.
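The difference in the number of tagged primers needed with twin tags and with freely combined tags, as discussed above, can be illustrated with a small Python calculation. It is a simplified sketch that only counts tagged primer versions for one metabarcoding primer set and one amplicon pool, and it ignores pricing, tag design constraints and the number of libraries. The function names are hypothetical.

```python
def primers_needed_twin(n_reactions):
    """Twin tags (F1-R1, F2-R2, ...): each tag is used once per pool, so
    one forward and one reverse tagged primer is needed per PCR reaction."""
    return 2 * n_reactions

def primers_needed_free(n_reactions):
    """Freely combined tags: find the smallest number of forward and
    reverse tagged primers whose combinations cover all PCR reactions.

    Returns a (n_forward, n_reverse) tuple.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    best = (n_reactions, 1)
    for n_forward in range(1, n_reactions + 1):
        # Integer ceiling division: reverse tags needed with n_forward tags.
        n_reverse = -(-n_reactions // n_forward)
        if n_forward + n_reverse < sum(best):
            best = (n_forward, n_reverse)
    return best

# Example: labelling 96 PCR reactions in one amplicon pool.
twin_total = primers_needed_twin(96)
n_forward, n_reverse = primers_needed_free(96)
free_total = n_forward + n_reverse
```

For 96 PCR reactions, twin tags require 192 tagged primers, whereas free combination requires 20. The saving only applies if the library preparation protocol does not generate tag-jumps.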
If a large number of metabarcoding primer sets are used, the two-step approach offers a relatively inexpensive solution. In the two-step PCR approach, the metabarcoding primers are generally synthesized with 5ʹ tails containing no tags or indices. This means that the same primer set can be used across multiple samples and projects. This has the benefit that trying out new metabarcoding primer sets does not entail buying many labelled versions of the metabarcoding primer sets, as it does in the other metabarcoding approaches (Fig. 2B-D). However, the second primer set in the two-step PCR approach is costly as it has to include the sequence complementary to the sequence overhang, the sequence adapters and the library indices (Fig. 2C). It is worth noting that, just as with the one-step PCR approach, many labelled index primers will have to be purchased if twin dual-indices are used to account for incorrect assignment of indices to libraries. This second primer set is, however, applicable across different metabarcoding primer sets and can thereby be used across many metabarcoding studies.
Laboratory workload
The one-step PCR approach is without doubt the quickest method for generating sequence-ready libraries, as it only requires a single PCR step to achieve both amplification and library preparation of the metabarcoding amplicons (Fig. 2B), and it has been used in the field to rapidly turn around sequence data. The workload for the two-step PCR approach and the tagged PCR approach depends, to some extent, on how many sample extracts and PCR replicates are to be processed. If the number is relatively high, the tagged PCR approach is the quicker of the two because the library build is performed on pooled amplicons rather than through a PCR step on individual PCR products. However, as with all molecular biological workflows, carefully organised liquid handling and automation provide solutions for high-throughput studies.