Introduction
In recent years, the analysis of environmental DNA (eDNA) and DNA
extracted from bulk specimen samples has experienced an enormous surge
in popularity in basic and applied biodiversity studies seeking to
detect plant and animal taxa
(Taberlet et al. 2012a; Creer et al. 2016; Jarman et al. 2018). Within the field of genetic biodiversity assessment, DNA
metabarcoding is currently the most widely used approach, as it allows
targeted, parallel, and as such relatively cost-effective,
identification of multiple taxa from DNA extracted from e.g. soil,
water, faeces as well as from bulk samples of organisms
(Taberlet et al. 2012b). Applications of metabarcoding range widely and include
detection of invasive species in water samples
(e.g. Pochon et al. 2013); assessment of water quality via identification of
freshwater invertebrates in bulk specimen samples
(e.g. Elbrecht et al. 2017) and environmental samples
(e.g. Seymour et al. 2020); identification of plant-pollinator interactions
via pollen trapped on the bodies of modern
(e.g. Lucas et al. 2018) and historical
(e.g. Gous et al. 2019) pollinator specimens; detection of vertebrate
wildlife via invertebrate ‘samplers’ of vertebrate blood or faeces
(e.g. Calvignac-Spencer et al. 2013); and assessment of, for example, niche
partitioning
(e.g. Razgour et al. 2011) and ecosystem services
(e.g. Aizpurua et al. 2017) through detection of diet items in gut and faecal
samples. Furthermore, metabarcoding is being explored for implementation in
routine biomonitoring around the world
(Pont et al. 2018, 2021; Li et al. 2018, 2019; Aylagas et al. 2018;
Zizka et al. 2020) (www.danubesurvey.org; www.syke.fi),
and is an integral component of the proposals for the Next Generation of
Biomonitoring programmes
(Bohan et al. 2017).
Metabarcoding relies on PCR amplification of extracted DNA with primers
designed to target a taxonomically informative marker for a selected
taxonomic group (Taberlet et al. 2012b) (Fig. 1). The backbone of metabarcoding analyses
is the addition of sample-specific nucleotide identifiers to amplicons
and the use of these to assign metabarcoding sequences back to the
samples they originated from (‘demultiplexing’). This allows pooling of
hundreds to thousands of samples for sequencing and thereby full
utilisation of the capacity of high-throughput sequencing platforms
(Fig. 1). Amplicon labelling can be achieved at two stages during a
metabarcoding workflow: prior to library build as 5’ nucleotide ‘tags’
on amplicons and/or during library build as library indices. The
strategies to achieve this labelling can be categorised into three main
approaches (Fig. 2). All three approaches have advantages, challenges
and limitations which, if not considered, can result in misleading
data interpretation and, in the worst case, in unusable
data and considerable wasted time and money, as for instance in the case
of so-called ‘tag-jumps’
(Schnell et al. 2015; Esling et al. 2015; Carøe & Bohmann 2020). Despite
this, in contrast to discussions on metabarcoding substrate selection,
DNA extraction and data processing, the strategies for amplicon
labelling and library preparation workflows have received little
systematic attention in the metabarcoding literature
(although see Murray et al. 2015).
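The demultiplexing step described above can be illustrated with a minimal sketch. It is not the procedure of any particular pipeline and rests on several assumptions: reads are already merged and oriented in the forward direction, primers are ignored for brevity, tags are matched exactly, and the tag sequences, sample names and the function name `demultiplex` are hypothetical. Because the reverse tag sits at the 5’ end of the reverse primer, it is assumed to appear reverse-complemented at the 3’ end of the merged read.

```python
from collections import defaultdict

# Hypothetical (forward tag, reverse tag) combinations and sample names,
# chosen for illustration only.
TAG_TO_SAMPLE = {
    ("ACGTAC", "TGCATG"): "sample_A",
    ("GATCGA", "CTAGCT"): "sample_B",
}
TAG_LEN = 6  # assumed fixed tag length


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def demultiplex(reads, tag_to_sample=TAG_TO_SAMPLE, tag_len=TAG_LEN):
    """Assign reads to samples by exact match of the tags at both ends.

    Returns a dict mapping sample name to tag-trimmed sequences, and a
    list of reads whose tag combination was not used in the study.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        fwd_tag = read[:tag_len]
        # The reverse tag is read from the 3' end, so reverse-complement it.
        rev_tag = reverse_complement(read[-tag_len:])
        sample = tag_to_sample.get((fwd_tag, rev_tag))
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[tag_len:-tag_len])
    return dict(by_sample), unassigned
```

In this sketch, reads carrying a tag combination that was never assigned to a sample are set aside rather than forced into the nearest sample, so that unexpected combinations remain visible for inspection.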
Here, we present an overview of the three most commonly used workflows
with which to achieve sample-specific labelling and library preparation
in metabarcoding studies and how they can potentially influence the
resulting data. For the sake of simplicity, we focus on metabarcoding of
plants and animals in basic and applied biodiversity studies with
sequencing on what is arguably the most widely used high-throughput sequencing
platform series today, the Illumina platforms. In doing so, we provide
critical considerations to help researchers choose the
metabarcoding strategy best suited to generating reliable data for their
individual study, for example with regard to sample type and number, research
question, speed of laboratory processing, contamination risk, budget and
whether similar studies are to be carried out in the laboratory in the
future. Ultimately, by providing detailed and critical insights into the
consequences of choosing different metabarcoding workflows, we hope to
further increase the potential of metabarcoding as a reliable tool for
use across a wide range of applications.