Multiplex barcoding and whole organism community metabarcoding
While classical barcode sequencing requires separate PCR and sequencing reactions for each specimen, HTS platforms now make it possible to pool thousands of amplicons from individual specimens via tagged amplicon sequencing (Creedy et al., 2020; de Kerdrel et al., 2020; Hebert et al., 2018; Shokralla et al., 2014; Srivathsan et al., 2019, 2021). This can be scaled up to 10,000 multiplexed individuals within a single MinION flow cell (Srivathsan et al., 2019, 2021) or several hundred thousand for one lane of NovaSeq 6000 when a reduced-length “mini barcode” is used (Yeo, Srivathsan, & Meier, 2020). HTS multiplex barcoding provides a direct link between DNA sequences and the individuals from which they were amplified, which has several advantages. It allows physical specimens to be sorted to putative species and taxonomic disagreements between barcodes and other data to be resolved (Wang, Srivathsan, Foo, Yamane, & Meier, 2018). This is necessary when a sequence appears unusual (e.g. unexpectedly high sequence divergence) or when species delimitation approaches using different algorithms return conflicting results (Meier et al., 2021). It also allows one to return to the DNA extract, should there be interest in further exploring the nuclear genome, diet content or microbiome of specific specimens (Kennedy et al., 2020). A further advantage is that abundance estimates can be derived directly from the DNA sequence data, because each barcode corresponds to a single specimen.
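The link between tagged reads and specimens described above can be illustrated with a minimal sketch. It assumes fixed-length tags at both ends of each read, exact tag matching, and a "most frequent read" rule for calling each specimen's barcode. The tag sequences, specimen identifiers and function names are hypothetical, and the pipelines used in the cited studies differ in detail (for example, in tolerating tag errors and in handling read orientation).

```python
from collections import Counter, defaultdict

# Hypothetical tag map: (forward tag, reverse tag) -> specimen ID.
# Real tags are longer and designed to tolerate sequencing errors.
TAG_TO_SPECIMEN = {
    ("ACGT", "TTAG"): "specimen_001",
    ("ACGT", "GGCA"): "specimen_002",
    ("CATG", "TTAG"): "specimen_003",
}
TAG_LEN = 4  # assumed fixed tag length, for illustration only


def demultiplex(reads, tag_map=TAG_TO_SPECIMEN, tag_len=TAG_LEN):
    """Group reads by specimen using the tags at both ends of each read."""
    by_specimen = defaultdict(list)
    for read in reads:
        key = (read[:tag_len], read[-tag_len:])
        specimen = tag_map.get(key)
        if specimen is None:
            continue  # unknown tag combination: discard the read
        # Keep only the amplicon insert, with both tags trimmed off.
        by_specimen[specimen].append(read[tag_len:-tag_len])
    return dict(by_specimen)


def consensus_barcode(inserts):
    """Call the most frequent insert as the barcode (a simplification)."""
    return Counter(inserts).most_common(1)[0][0]


def specimen_counts(reads):
    """Return a mapping of barcode -> number of specimens carrying it."""
    barcodes = {
        specimen: consensus_barcode(inserts)
        for specimen, inserts in demultiplex(reads).items()
    }
    return Counter(barcodes.values())
```

Because every barcode is tied to one specimen, counting specimens per barcode (or per putative species, after clustering) gives abundance directly, independent of the number of reads each specimen produced.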
In contrast to multiplex barcoding, whole organism community DNA (wocDNA) metabarcoding (Andújar et al., 2018; Creedy et al., 2021; Yu et al., 2012) involves a single DNA extraction from multiple individuals of multiple species, which is subsequently PCR amplified and sequenced, typically on the Illumina platform. This reduces the individualised processing of specimens, which is particularly relevant for hyperdiverse arthropod assemblages, including those composed of minute specimens (e.g. Arribas et al., 2016; Creedy et al., 2019), and/or high numbers of community samples (e.g. for long-term or broad-scale approaches). However, the information content of wocDNA metabarcode data differs from that of multiplex barcode data in several ways, which either require additional data processing or place limits on the inferences that can be derived. An important feature of wocDNA metabarcode sequence output is the difficulty of distinguishing spurious sequences (PCR and DNA sequencing artefacts, contamination, nuclear copies, or different combinations of these) from real but low-abundance sequences in the community sample. With appropriate laboratory protocols, design and bioinformatic processing, contamination issues and PCR and DNA sequencing artefacts can be substantially reduced (e.g. Alberdi, Aizpurua, Gilbert, & Bohmann, 2018; Creedy et al., 2021). It has also recently become possible to effectively remove nuclear copies of mtDNA sequences, providing haplotype-level resolution from wocDNA metabarcode data (Andújar et al., 2021). In addition, within wocDNA metabarcode data there is no correspondence between sequences and the individuals from which they are derived. While biodiversity patterns can still be explored without taxonomic assignment, species-level taxonomic assignment is generally desirable, and it can only be achieved with taxonomically assigned barcode reference sequences. Even without species-specific reference libraries, arthropod sequences can be assigned to some taxonomic level using public repositories (e.g. GenBank or BOLD). Finally, the extrapolation of abundance data from metabarcode sequence output is complicated, but several promising approaches for deriving abundance data from standardised samples have been developed (e.g. Ji et al., 2020; Krehenwinkel et al., 2017; Luo, Ji, & Yu, 2022).
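The difficulty of separating spurious from genuine low-abundance sequences can be made concrete with a deliberately simple sketch: a minimum relative abundance filter applied to the read counts of one community sample. This is a generic illustration and not the procedure of any of the studies cited above. The function name, the sequence labels and the threshold values are all assumptions chosen for the example.

```python
def filter_by_relative_abundance(read_counts, min_fraction=0.01):
    """Keep sequences whose share of the sample's reads is >= min_fraction.

    read_counts maps a sequence label to its read count in one sample.
    The default threshold is arbitrary and for illustration only.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        seq: n for seq, n in read_counts.items() if n / total >= min_fraction
    }


# Hypothetical read counts for four sequences in one community sample.
sample = {"seq_A": 9000, "seq_B": 950, "seq_C": 45, "seq_D": 5}

strict = filter_by_relative_abundance(sample, min_fraction=0.01)
lenient = filter_by_relative_abundance(sample, min_fraction=0.001)
```

With the stricter threshold only seq_A and seq_B are retained, whereas the more lenient threshold also retains seq_C. Whether seq_C is a rare species or an artefact cannot be decided from read counts alone, which is why the laboratory controls and reference barcodes discussed in the text matter.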
The choice of HTS barcoding approach to catalogue arthropod biodiversity will depend on the specific objectives to be addressed. However, there are potential synergies in combining the high-throughput generation of community-level data by wocDNA metabarcoding with vouchered sequencing by multiplex barcoding. Vouchering may be considered unnecessary when well-parameterized reference libraries are available, but it is otherwise an essential consideration for the future taxonomic assignment of metabarcoding reads and for completing reference barcode databases. Individualised and validated barcodes generated by multiplex barcoding are also of particular relevance for the bioinformatic processing of metabarcode reads (Andújar et al., 2021).