PCR amplification
PCR amplification introduces biases, such as primer biases, and errors,
such as nucleotide substitutions and chimeras
(e.g. Polz & Cavanaugh 1998; Haas et al. 2011; Murray et al. 2015; Piñol et al. 2015). Two of the three main metabarcoding
strategies allow practitioners to carry out only a single PCR step
before sequencing, namely the one-step PCR with fusion primers approach
and the tagged metabarcoding PCR approach in which PCR-free library
building is carried out (Fig. 2B and D). Because every extra PCR step
carries a further risk of introducing errors, these two approaches offer an
advantage over the two-step PCR method and the tagged PCR approach in
which the workflow includes an index PCR step (Fig. 2C and D).
Apart from minimizing the number of PCR steps, the 5’ nucleotide
additions to metabarcoding primers should be considered. Bulk sample and
eDNA extracts consist of complex mixtures of DNA from a large number of
organisms, which in the case of eDNA can be degraded
(Taberlet et al. 2012a). With such DNA extracts, the primers must
amplify target DNA that may be present in only trace copy numbers from different taxa
(Taberlet et al. 2012b), and amplification can be distorted by primer biases, inhibitors and
potentially abundant predator or host DNA
(e.g. Deagle et al. 2014; Clarke et al. 2014b; Murray et al. 2015). To add to this, nucleotide additions to primers can
decrease PCR efficiency
(Schnell et al. 2015; Murray et al. 2015).
The three main metabarcoding strategies have different lengths of
nucleotide additions on the 5’-end of metabarcoding primers. The longest
5ʹ-nucleotide additions are found in the one-step PCR approach where up
to 60 nucleotides (sequence adapters and indices) are added to one or
both of the primers, making the complete primer often over 80 bp long
(e.g. Elbrecht & Leese 2015). In the two-step PCR approach (Fig. 2C), the sequence
overhangs on the metabarcoding primers used in the first PCR are
approximately half the length of the fusion primers, e.g. 33-34
nucleotides, if using Illumina® Nextera Indices. The tagged PCR approach
has the shortest nucleotide additions to the metabarcoding primers (Fig.
2D) with tags of typically 5-10 nucleotides in length
(e.g. Coissac 2012; De Barba et al. 2014; Alberdi et al. 2018).
The long additions to the metabarcoding primers cause a decrease in PCR
efficiency (Murray et al. 2015) and, in line with this, the two-step PCR approach has been
shown to detect marginally more taxa than
the one-step fusion primer approach
(Zizka et al. 2019).
Even the short nucleotide additions in the tagged PCR approach have been
shown to decrease PCR efficiency
(Schnell et al. 2015). Thus, no method is free of decreased PCR efficiency caused by
the nucleotide additions to the 5’-end of metabarcoding primers. However, it
has to our knowledge not been formally tested whether, and to what
extent, the shorter nucleotide tag additions in the tagged PCR approach
offer greater PCR efficiency and taxonomic detection than the two other
approaches. It can therefore only be speculated that the tagged PCR approach is the most
sensitive of the three main approaches for detecting taxa in low abundance. Regardless of metabarcoding strategy, we
stress the importance of optimising PCR amplifications (usually by qPCR)
to detect PCR inhibition, identify samples with low template quantity
and track PCR efficiency issues
(Murray et al. 2015; Yang et al. 2021).
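One common way to track PCR efficiency by qPCR is a standard curve over a template dilution series, where efficiency is derived from the slope of Cq against log10 template quantity as E = 10^(-1/slope) - 1. The sketch below assumes that approach, which the studies cited above do not necessarily use; the function name and all dilution values are hypothetical and serve only as an illustration.

```python
def pcr_efficiency(log10_quantities, cq_values):
    """Estimate amplification efficiency from a qPCR standard curve.

    Fits Cq = slope * log10(quantity) + intercept by least squares and
    returns E = 10 ** (-1 / slope) - 1, where 1.0 means perfect doubling
    of the template in every cycle.
    """
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series (values invented for illustration).
log10_copies = [5, 4, 3, 2, 1]
cq_untagged = [15.00, 18.32, 21.64, 24.96, 28.28]  # slope of -3.32
cq_tagged = [16.00, 19.60, 23.20, 26.80, 30.40]    # slope of -3.60

eff_untagged = pcr_efficiency(log10_copies, cq_untagged)
eff_tagged = pcr_efficiency(log10_copies, cq_tagged)
print(f"untagged primers: {eff_untagged:.2f}, tagged primers: {eff_tagged:.2f}")
```

Comparing the two estimates for the same primer pair with and without 5’ additions gives a simple, if rough, measure of how much the additions cost in efficiency.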
Theoretically, the reduced PCR efficiency in the one-step and two-step
PCR approaches caused by the long overhangs on primers might be
counteracted by spiking the PCRs with metabarcoding primers without any
5ʹ attachments
(e.g. Murray et al. 2015). However, this has been shown to yield only modest PCR
efficiency improvements for the one-step approach
(e.g. Murray et al. 2015). Alternatively, a pre-enrichment can be carried out before the
metabarcoding PCR, i.e. running a PCR with
metabarcoding primers (with no nucleotide additions) prior to the
metabarcoding PCR, as done in Zizka et al. (2019) and Elbrecht & Steinke (2018) for the
one-step PCR approach. However, this not only introduces another PCR
amplification step, but can also increase the risk of cross-contamination
between PCR products due to the initial unlabelled PCR amplification
step (e.g. Murray et al. 2015).
Apart from the length of the nucleotide additions, it has been
investigated whether differences in nucleotide tag sequences can result
in biases in the tagged PCR approach. Although one study shows that such
tag bias is an issue
(O’Donnell et al. 2016), other studies show that it is not
(Leray & Knowlton 2017; Yang et al. 2021). If tag bias does exist, it should
theoretically be minimised if different tags are used on each sample’s
PCR replicates.
Chimeras & tag-jumps
Chimeras can be formed during all PCR steps in any metabarcoding
workflow (Fig. 2B-D). Chimeras are sequences that consist of two or more
different template sequences, and the majority are thought to result
from incomplete primer extension during the elongation phase of the PCR
cycle
(Meyerhans et al. 1990; Wang & Wang 1997; Judo et al. 1998; Shin et al. 2014). The probability of chimera formation increases
when similar template sequences are amplified in the same PCR reaction
(e.g. Judo et al. 1998; Smyth et al. 2010, but see also Fonseca et al. 2012), such as during the metabarcoding PCR (Fig. 2B-D)
or during the index PCR-amplification of pools of tagged amplicons (Fig.
2D). There are different consequences of chimeric sequences depending on
where they arise. If they are created during a PCR-amplification of a
single sample’s DNA extract, the chimeras will be intra-sample chimeras,
which can be falsely interpreted as novel taxa and erroneously inflate
measures of diversity. If, on the other hand, chimeras are created
during a PCR-amplification of pooled tagged amplicons, such as in the
tagged PCR approach (Fig. 2D), the chimeras may be inter-sample
chimeras, which can result in tag-jumps and false attribution of
amplicon sequences to samples
(Schnell et al. 2015). This can also lead to false positives and inflation of
diversity.
All metabarcoding approaches are prone to intra-sample chimeras.
However, as chimera formation increases when similar sequences are
amplified in the same PCR reaction
(e.g. Judo et al. 1998; Smyth et al. 2010), the use of
metabarcoding primers with long 5’ overhangs, as in the one-step and
two-step approaches, might make chimera formation more likely, since such primers
carry long and similar sequences at their 5’ ends. This hypothesis
nevertheless requires testing. Intra-sample chimeras can be reduced
by limiting the number of PCR cycles
(Haas et al. 2011).
Also, if samples are subjected to multiple, independent PCRs, chimeras
can be filtered out by keeping only sequences that occur in multiple PCR
replicates, the ‘restrictive approach’ described in Alberdi et al. (2018). Chimera
detection programmes such as UCHIME (Edgar et al. 2011)
can be used for further clean-up.
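The replicate-based filtering described above can be sketched as follows. This is a minimal illustration, not the implementation used in Alberdi et al. (2018): the function name, the input layout (one dictionary of sequence read counts per PCR replicate) and the toy sequences are all assumptions.

```python
from collections import Counter


def restrictive_filter(replicates, min_replicates=2):
    """Keep sequences seen in at least `min_replicates` PCR replicates.

    `replicates` is a list of dicts mapping sequence -> read count, one
    dict per independent PCR replicate of the same sample (hypothetical
    input layout).
    """
    presence = Counter()
    for rep in replicates:
        # Count each sequence once per replicate in which it has reads.
        presence.update(seq for seq, n in rep.items() if n > 0)
    return {seq for seq, k in presence.items() if k >= min_replicates}


# Toy data: three PCR replicates of one sample.
reps = [
    {"ACGTAC": 120, "TTGACA": 45, "ACGACA": 2},  # ACGACA seen only here
    {"ACGTAC": 98, "TTGACA": 51},
    {"ACGTAC": 110, "TTGACA": 40, "TTGTAC": 1},  # TTGTAC seen only here
]
kept = restrictive_filter(reps, min_replicates=2)
print(sorted(kept))
```

Sequences that arise in a single replicate only, as expected for most chimeras, are dropped; raising `min_replicates` makes the filter stricter at the cost of losing rare but genuine sequences.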
Inter-sample chimeras can cause havoc in metabarcoding studies. They can
only occur in the tagged PCR approach where library build is carried out
on pooled tagged amplicons from different samples (Fig. 2D). Here,
tag-jumps can create sequences with new combinations of the nucleotide
tags used in the amplicon pool
(Schnell et al. 2015). If the new combinations of tags are already used in the amplicon
pool, they will cause false assignment of sequences to samples, which
should be avoided at all costs
(Schnell et al. 2015; Esling et al. 2015). Such tag-jumps can also have the
consequence that negative controls are seemingly not negative following
bioinformatic sorting of sequences to samples. It should be noted that
tag-jumps can also occur due to T4 DNA Polymerase activity in the
blunt-ending step during library preparation, as demonstrated in library
building for the Roche/454 sequencing platform
(van Orsouw et al. 2007; Palkopoulou et al. 2016) and for the Illumina
sequencing platform (Carøe & Bohmann 2020). Estimates of the rate of tag-jumping range from ca.
2% to 49% of total sequences
(Schnell et al. 2015; Esling et al. 2015; Carøe & Bohmann 2020). This
broad range can be caused by factors affecting inter-sample chimera
formation during the index PCR, for example DNA template and primer
concentration, PCR cycle number, and sequence similarity
(e.g. Judo et al. 1998; Smyth et al. 2010; Carøe & Bohmann
2020). The range of tag-jump proportions highlights the unreliability
of including an index PCR step in the tagged PCR approach.
To avoid tag-jumps in the tagged PCR approach, and thereby prevent false
assignment of sequences to samples, it is important to refine index PCR
parameters to decrease the likelihood of chimera formation - or better
yet, to omit the index PCR step (Fig. 2D). Further, blunt-ending using
T4 DNA Polymerase should be avoided during library preparation
(Schnell et al. 2015; Palkopoulou et al. 2016; Carøe & Bohmann 2020). If
both T4 DNA Polymerase blunt-ending and index PCR are eliminated during
library preparation of pools of tagged amplicons, tag-jumps can
practically be eliminated
(Carøe & Bohmann 2020).
If the library preparation protocol contains a T4 DNA Polymerase blunt-ending step
and/or an index PCR step, and can thereby be assumed to generate
tag-jumps, these can be detected and removed by using ‘twin-tags’ during
the original PCRs (e.g. F1-R1, F2-R2,…), because tag-jumped
sequences would then carry non-twinned tag combinations not used in
the set-up (e.g. F1-R2, F2-R3,…)
(e.g. Schnell et al. 2015; Yang et al. 2021). However, using
twin tags comes at the price of buying many more versions of tagged
primers and building more libraries
(Schnell et al. 2015). If twin tags are not used, chimera removal software can remove
some chimeric sequences carrying false combinations of used tags
(Schnell et al. 2015).
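The twin-tag logic can be sketched as a small demultiplexing step. This is an illustrative sketch only: the function name, the tuple layout of the reads and the sample names are assumptions, and real demultiplexers also need to handle tag mismatches and read orientation.

```python
from collections import defaultdict


def demultiplex_twin_tags(reads, sample_by_pair):
    """Sort reads to samples by tag pair; set aside unused tag pairs.

    `reads` holds (forward_tag, reverse_tag, sequence) tuples and
    `sample_by_pair` maps each used (forward_tag, reverse_tag) pair to a
    sample name (hypothetical layout).
    """
    assigned = defaultdict(list)
    tag_jumped = []
    for fwd, rev, seq in reads:
        sample = sample_by_pair.get((fwd, rev))
        if sample is None:
            # Non-twinned combination such as F1-R2: treated as a tag-jump.
            tag_jumped.append((fwd, rev, seq))
        else:
            assigned[sample].append(seq)
    return dict(assigned), tag_jumped


# Twin-tag set-up: each sample is amplified with matching tags only.
sample_by_pair = {
    ("F1", "R1"): "sample_1",
    ("F2", "R2"): "sample_2",
    ("F3", "R3"): "sample_3",
}
reads = [
    ("F1", "R1", "ACGTAC"),
    ("F2", "R2", "TTGACA"),
    ("F1", "R2", "ACGTAC"),  # non-twinned
    ("F2", "R3", "TTGACA"),  # non-twinned
    ("F3", "R3", "GGCATT"),
]
assigned, tag_jumped = demultiplex_twin_tags(reads, sample_by_pair)
print(assigned)
print(len(tag_jumped), "reads set aside as tag-jumps")
```

Because every tag occurs in exactly one used pair, a read that has exchanged a single tag can never land on another sample's pair, which is what makes the discarded set a clean record of tag-jumps.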
The extent of tag-jumping and spillover of taxa between samples can be
detected through inclusion of positive controls consisting of synthetic
oligos or taxa not expected to occur in the dataset. However, note that
such controls do not enable confident elimination of false positives
caused by tag-jumps. The extent of tag-jumping can also be assessed by
comparing all observed combinations of used tags to all originally used
tag combinations
(Schnell et al. 2015; Zepeda Mendoza et al. 2016).
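Comparing observed with used tag combinations amounts to a simple proportion, sketched below. The function name, the tag design and the read counts are hypothetical. Note that the resulting share is only a proxy: tag-jumps that happen to produce a combination already in use are invisible to this comparison.

```python
def unused_combination_share(read_counts, used_pairs):
    """Share of reads carrying tag combinations not used in the set-up."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    unused = sum(n for pair, n in read_counts.items() if pair not in used_pairs)
    return unused / total


# Hypothetical design in which the combination F2-R2 was left unused.
used_pairs = {("F1", "R1"), ("F1", "R2"), ("F2", "R1")}
read_counts = {
    ("F1", "R1"): 4800,
    ("F1", "R2"): 5100,
    ("F2", "R1"): 4900,
    ("F2", "R2"): 200,  # observed although never used in a PCR
}
share = unused_combination_share(read_counts, used_pairs)
print(f"{share:.1%} of reads carry unused tag combinations")
```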
Misassignment of library indices
Incorrect assignment of indices between pooled libraries can cause
sequence reads to be incorrectly assigned to libraries. Misassigned
indices have been attributed to the formation of mixed clusters on the
sequencing flow cell, i.e. clusters originating from two different
template molecules or clusters growing into each other, to low levels of
free index primers present in the sequence library and to bulk
amplification of pooled libraries
(Nelson et al. 2014; Sinha et al. 2017; Vodak et al. 2018;
Costello et al. 2018; Valk et al. 2019). Regardless of
how index misassignment occurs, if it occurs in metabarcoding studies it
can cause incorrect assignment of amplicon sequences to libraries, which
can cause incorrect assignment of sequences to samples and false
positives. This phenomenon can affect all three metabarcoding approaches
(Fig. 2). To avoid index misassignment it is recommended to dual-index
libraries with unique library index combinations
(Kircher et al. 2012; Sinha et al. 2017; www.illumina.com). Further,
stringent bead purification (or size selection) can remove free
adapters/primers from the libraries
(Owens et al. 2018).
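Why unique index combinations help can be illustrated by counting how many single-index swaps between pooled libraries would go unnoticed. The sketch below is a toy model under the assumption that a misassigned read keeps one of its own indices and acquires one from another library in the pool; the function name and index labels are hypothetical.

```python
from itertools import permutations


def undetectable_swaps(index_pairs):
    """Count single-index swaps that land on an index pair used in the pool.

    For every ordered pair of libraries (a, b), a read from `a` either keeps
    its i7 and acquires b's i5, or keeps its i5 and acquires b's i7. A swap
    is undetectable if the resulting pair belongs to another library.
    """
    used = set(index_pairs)
    hits = 0
    for (i7_a, i5_a), (i7_b, i5_b) in permutations(index_pairs, 2):
        for swapped in ((i7_a, i5_b), (i7_b, i5_a)):
            if swapped != (i7_a, i5_a) and swapped in used:
                hits += 1
    return hits


# Combinatorial dual indexing: indices are shared between libraries.
combinatorial = [("i7_A", "i5_X"), ("i7_A", "i5_Y"),
                 ("i7_B", "i5_X"), ("i7_B", "i5_Y")]
# Unique dual indexing: every index is used in one library only.
unique_dual = [("i7_A", "i5_X"), ("i7_B", "i5_Y"),
               ("i7_C", "i5_Z"), ("i7_D", "i5_W")]

print("combinatorial:", undetectable_swaps(combinatorial))
print("unique dual:", undetectable_swaps(unique_dual))
```

With unique combinations every single-index swap yields a pair that no library uses, so the affected reads can be discarded at demultiplexing instead of being assigned to the wrong library.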
The labelling in the different metabarcoding approaches further makes it
possible to account for potential incorrect assignment of sequences to
libraries. In the tagged PCR approach, unique tagging of PCR replicates
across all pooled libraries can be used to detect (and account for)
index misassignment. However, this can be costly. In the one-step PCR
with fusion primers approach, a tweaked protocol where nucleotide tags
are used instead of the i7 and i5 library indices
(e.g. Elbrecht & Steinke 2018) creates a single library that is thereby free of
index misassignment. As with tag-jumping, the extent of incorrect
assignment of indices and spillover of taxa between samples can be
detected through inclusion of positive controls consisting of taxa not
expected to occur in the data set and by comparing all observed
index combinations to those actually used when demultiplexing libraries.
It is important not to confuse tag-jumping, index misassignment or
cross-contamination between PCR products with cross-contamination of the
primers themselves. Due to the high concentration of primers upon
synthesis, cross-contamination (e.g. by aerosols) can manifest itself as
low numbers of sequence reads and could be misinterpreted as tag-jumps
or index-bleeding. Due to the risk of primer cross-contamination, some
laboratories avoid ordering primers in 96-well plates. There are
anecdotal reports that primer contamination can also occur at primer
synthesis (or purification). As mentioned, the risk of
cross-contamination between nucleotide tagged primer stocks and indexed
primer stocks, which could e.g. occur during resuspension of primers,
will generally be the same no matter which of the three overall
metabarcoding approaches is used. In the first PCR step in the two-step
PCR approach, the primers are unlabelled and any cross-contamination
that might occur will not have consequences.
Cost
Metabarcoding primers in the tagged and one-step PCR approaches have to
be labelled with either nucleotide tags or indices, whereas the
metabarcoding primers in the two-step approach are generally not
individually labelled. Due to the different labelling systems in the
three primary metabarcoding approaches, there are different costs
associated with them.
The fusion primers for the one-step PCR approach are the most expensive
metabarcoding primers amongst the three approaches. This is (i) because
differently indexed versions are purchased for each metabarcoding primer
set and (ii) because the increased oligo length results in lower yield
of the full length product. If unique matching indices are used to
account for index misassignment, one-step PCR can become increasingly
expensive for larger scale studies. However, this needs to be factored
against the potential cost of repeating runs due to artifacts and
contamination, and the fact that only a single PCR step is needed to go
from sample extract to library.
In the tagged PCR approach (Fig. 2D), the metabarcoding primers are
relatively inexpensive compared to the one-step PCR fusion primers as
they only add 5’ tags of 5-10 nucleotides in length. However, as with
the one-step PCR approach, these need to be purchased in many tagged
editions for each metabarcoding primer set. Furthermore, if tag-jumping
is to be taken into account by only using each tag once in a library
amplicon pool, e.g. by only amplifying with twin forward and reverse
tags, then metabarcoding primer sets have to be ordered in many
differently labelled editions (Schnell et al. 2015). To keep
costs down, this needs to be balanced by pooling fewer PCR products into
each library and thereby creating more sequence libraries (Fig. 2D).
However, if a library preparation protocol is used that does not create
tag-jumps, tags can be freely combined, which lowers the number of
tagged primers that must be purchased
(Schnell et al. 2015; Carøe & Bohmann 2020). In contrast to the other two
metabarcoding approaches, the tagged PCR approach includes library
preparation on pools of amplicons, and the cost of this therefore has to
be taken into account. This can however be kept low if a protocol that
does not generate tag-jumps is used and only a few libraries have to be
made.
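The difference in primer numbers between twin tags and freely combined tags can be made concrete with a little arithmetic. The sketch below counts tagged primer versions only; it assumes one tagged forward and one tagged reverse primer per tag, a near-square layout for free combination, and it ignores primer price, which is not specified here. Function names are hypothetical.

```python
import math


def primers_twin(n_reactions):
    """Tagged primers needed when each tag is used once per amplicon pool."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    # One forward and one reverse tagged primer per PCR reaction.
    return 2 * n_reactions


def primers_combinatorial(n_reactions):
    """Tagged primers needed when forward and reverse tags combine freely.

    Uses a near-square layout: f forward tags and r reverse tags such that
    f * r >= n_reactions.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    f = math.isqrt(n_reactions - 1) + 1  # ceil(sqrt(n_reactions))
    r = math.ceil(n_reactions / f)
    return f + r


for n in (12, 96):
    print(n, "PCRs per pool:",
          primers_twin(n), "twin-tagged primers vs",
          primers_combinatorial(n), "freely combined primers")
```

For a pool of 96 PCRs this gives 192 tagged primers with twin tags against 20 with free combination, which illustrates why a library preparation protocol that does not generate tag-jumps reduces primer costs.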
If a large number of metabarcoding primer sets are used, the two-step
approach offers a relatively inexpensive solution. In the two-step PCR
approach, the metabarcoding primers are generally synthesized with 5’
tails containing no tags or indices. This means that the same primer set
can be used across multiple samples and projects. This has the benefit
that trying out new metabarcoding primer sets does not entail buying
many labelled versions of the metabarcoding primer sets, as it does in
the other metabarcoding approaches (Fig. 2B-D). However, the second
primer set in the two-step PCR approach is costly as it has to include
both the sequence complementary to the sequence overhang, the sequence
adapters and the library indices (Fig. 2C). It is worth noting that,
just as with the one-step PCR approach, many labelled index primers will
have to be purchased if twin dual-indices are used to account for
incorrect assignment of indices to libraries. This second primer set is,
however, applicable across different metabarcoding primer sets and can
thereby be used across many metabarcoding studies.
Laboratory workload
The one-step PCR approach is without doubt the quickest method for
generating sequence-ready libraries, as it only requires a single
PCR-step to achieve both amplification and library preparation of the
metabarcoding amplicons (Fig. 2B), and it has been used in the field to
turn around sequence data rapidly. The workload for the two-step PCR
approach and the tagged PCR approach depends, to some extent, on how
many sample extracts and PCR replicates are to be processed. If it is a
relatively high number, the tagged PCR approach is the quickest due to
the library build being performed on pooled amplicons rather than
through a PCR step on individual PCR products. However, as with all
molecular biological workflows, carefully organised liquid handling and
automation provide solutions for high-throughput studies.