Introduction
In recent years, the analysis of environmental DNA (eDNA) and DNA extracted from bulk specimen samples has experienced an enormous surge in popularity in basic and applied biodiversity studies seeking to detect plant and animal taxa (Taberlet et al. 2012a; Creer et al. 2016; Jarman et al. 2018). Within the field of genetic biodiversity assessment, DNA metabarcoding is currently the most widely used approach, as it allows targeted, parallel, and therefore relatively cost-effective identification of multiple taxa from DNA extracted from e.g. soil, water and faeces, as well as from bulk samples of organisms (Taberlet et al. 2012b). Its applications range widely and include detection of invasive species in water samples (e.g. Pochon et al. 2013); assessment of water quality via identification of freshwater invertebrates in bulk specimen samples (e.g. Elbrecht et al. 2017) and environmental samples (e.g. Seymour et al. 2020); identification of plant-pollinator interactions via pollen trapped on the bodies of modern (e.g. Lucas et al. 2018) and historical (e.g. Gous et al. 2019) pollinator specimens; detection of vertebrate wildlife via invertebrate ‘samplers’ of vertebrate blood or faeces (e.g. Calvignac-Spencer et al. 2013); and assessment of e.g. niche partitioning (e.g. Razgour et al. 2011) and ecosystem services (e.g. Aizpurua et al. 2017) through detection of diet items in gut and faecal samples. Furthermore, metabarcoding is being explored for implementation in routine biomonitoring around the world (Pont et al. 2018, 2021; Li et al. 2018, 2019; Aylagas et al. 2018; Zizka et al. 2020) (www.danubesurvey.org; www.syke.fi), and is an integral component of the proposals for the Next Generation of Biomonitoring programmes (Bohan et al. 2017).
Metabarcoding relies on PCR amplification of extracted DNA with primers designed to target a taxonomically informative marker for a selected taxonomic group (Taberlet et al. 2012b) (Fig. 1). The backbone of metabarcoding analyses is the addition of sample-specific nucleotide identifiers to amplicons and the use of these identifiers to assign metabarcoding sequences back to the samples from which they originated (‘demultiplexing’). This allows hundreds to thousands of samples to be pooled for sequencing, thereby making full use of the capacity of high-throughput sequencing platforms (Fig. 1). Amplicon labelling can be achieved at two stages of a metabarcoding workflow: prior to library build, as 5’ nucleotide ‘tags’ on amplicons, and/or during library build, as library indices. The strategies to achieve this labelling can be categorised into three main approaches (Fig. 2). All three approaches have advantages, challenges and limitations which, if not considered, can result in misleading data interpretation and, in the worst case, in unusable data and considerable wasted time and money, as for instance in the case of so-called ‘tag-jumps’ (Schnell et al. 2015; Esling et al. 2015; Carøe & Bohmann 2020). Despite this, and in contrast to metabarcoding substrate selection, DNA extraction and data processing, the strategies for amplicon labelling and library preparation have received little systematic attention in the metabarcoding literature (although see Murray et al. 2015).
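To make the demultiplexing step concrete, the following minimal Python sketch assigns reads to samples by exact match of a 5’ nucleotide tag. It is an illustration only and not a description of any particular workflow or software: the tag sequences, sample names, fixed tag length and the function name `demultiplex` are all hypothetical assumptions, and practical pipelines would additionally need to handle library indices, primers, tags at both amplicon ends and sequencing errors.

```python
from collections import defaultdict

# Hypothetical tag-to-sample map; real tag sets are study-specific.
TAG_TO_SAMPLE = {
    "ACGTAC": "soil_01",
    "TGCATG": "water_01",
    "GATCGA": "faeces_01",
}
TAG_LENGTH = 6  # assumed fixed tag length for this sketch


def demultiplex(reads, tag_to_sample=TAG_TO_SAMPLE, tag_length=TAG_LENGTH):
    """Assign each read to a sample by exact match of its 5' tag.

    Returns (assigned, unassigned): `assigned` maps each sample name to a
    list of reads with the tag trimmed off; `unassigned` lists the reads
    whose leading bases match no known tag.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        # Split the read into its leading tag and the remaining sequence.
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(insert)
    return dict(assigned), unassigned


# Toy pooled reads: two from soil_01, one from water_01, one unrecognised.
reads = [
    "ACGTACTTGGCCAA",
    "TGCATGAACCGGTT",
    "ACGTACGGGTTTAA",
    "NNNNNNAACCGGTT",
]
assigned, unassigned = demultiplex(reads)
```

In this toy example, reads carrying an unrecognised tag are set aside rather than discarded silently, so that the proportion of unassignable reads can be inspected.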
Here, we present an overview of the three most commonly used workflows for achieving sample-specific labelling and library preparation in metabarcoding studies, and of how they can influence the resulting data. For the sake of simplicity, we focus on metabarcoding of plants and animals in basic and applied biodiversity studies, with sequencing on Illumina platforms, arguably the most widely used high-throughput sequencing platform series today. In doing so, we provide critical considerations to help researchers choose the optimal metabarcoding strategy for generating reliable data tailored to their individual study; for example, regarding sample type and number, research question, speed of laboratory processing, contamination risk, budget and whether similar studies are to be carried out in the laboratory in the future. Ultimately, by providing detailed and critical insights into the consequences of choosing different metabarcoding workflows, we hope to further increase the potential of metabarcoding as a reliable tool across a wide range of applications.