Genomic Observatories: a framework for harmonised high-throughput barcode sequencing of island arthropods
The biodiversity, ecology, and evolution of island arthropod communities can be studied at unprecedented scales and resolution through the individual or joint application of (i) wocDNA metabarcoding, (ii) barcode reference libraries, (iii) multiplex barcoding and (iv) image analyses. Harmonisation across the first three approaches can also provide a common data currency, facilitating comparisons and synthetic analyses across independent studies. Incorporating the universally accepted arthropod barcode region of the mitochondrial cytochrome oxidase subunit I (COI) gene into wocDNA metabarcoding (Andújar et al., 2018; Elbrecht et al., 2019) allows the COI barcode sequence to act as a directly comparable species tag across studies, transcending potential taxonomic assignment errors within individual studies. The Genomic Observatory concept, within which HTS serves as a core tool for biodiversity assessment (Arribas et al., 2021a), provides a solid foundation for implementing genome-based inventory and monitoring of insular arthropod biodiversity.
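The idea of the barcode sequence as a study-independent tag can be illustrated with a minimal Python sketch. It is not a published workflow: the function names, the truncated SHA-1 tag and the toy sequences and labels are all hypothetical, and it assumes that both studies amplified and trimmed exactly the same COI fragment, so that identical haplotypes yield identical strings.

```python
import hashlib


def sequence_tag(seq: str) -> str:
    """Return a short tag derived only from the sequence itself (hypothetical scheme)."""
    normalised = seq.strip().upper()  # ignore case and surrounding whitespace
    return hashlib.sha1(normalised.encode("ascii")).hexdigest()[:12]


def shared_tags(study_a: dict[str, str], study_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each tag found in both studies to its (label in A, label in B) pair.

    Each study is given as {barcode sequence: taxonomic label assigned in that study}.
    """
    tags_a = {sequence_tag(s): label for s, label in study_a.items()}
    tags_b = {sequence_tag(s): label for s, label in study_b.items()}
    return {t: (tags_a[t], tags_b[t]) for t in tags_a.keys() & tags_b.keys()}


# Toy data (assumed, far shorter than a real barcode): the same haplotype
# carries a different label in each study.
study_a = {"ACGTACGTAC": "morphospecies_07", "TTGACCGTAA": "morphospecies_12"}
study_b = {"acgtacgtac": "Coleoptera indet.", "GGCATTAGCA": "Collembola indet."}

matches = shared_tags(study_a, study_b)
for tag, (label_a, label_b) in matches.items():
    print(tag, label_a, label_b)
```

Because the match is made on the sequence rather than on the assigned name, the shared haplotype is recovered even though the two studies labelled it differently. Exact string matching is the simplest possible criterion; clustering at a similarity threshold would be an alternative design choice.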
Harmonised HTS data generation and bioinformatic workflows for general arthropod inventory and assessment are emerging (e.g. Arribas et al., 2022; Creedy et al., 2021; Srivathsan et al., 2021). However, further development is needed to establish an inclusive range of sampling protocols that can capture the important fractions of arthropod biodiversity on islands (see Montgomery et al., 2021 for a review). For terrestrial fractions of arthropod biodiversity, these protocols can be developed as submodules within the recently proposed framework of Arribas et al. (2022), taking advantage of its downstream submodules for the processing and sequencing of samples.
In addition to the need for harmonised data generation protocols, other generic obstacles must be addressed to establish an efficient island Genomic Observatories Network. One important challenge is to ensure that metabarcode data conform to the Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles (Wilkinson et al., 2016), such that new wocDNA metabarcode and multiplex barcoding data sets can be cross-referenced to previous work. In the same way that cross-referencing sequence reads to barcode sequence repositories can assign taxonomy and clarify species origins, additional cross-referencing to a metabarcode sequence repository would facilitate understanding of how community similarity is structured over a range of spatial scales. The GEOME initiative (Genomic Observatories Metadatabase; Deck et al., 2017; Riginos et al., 2020) offers a platform that facilitates FAIR data archival practices. GEOME also facilitates DNA data sharing through the deposition of raw genetic data to the Sequence Read Archive (SRA, www.ncbi.nlm.nih.gov/sra), while maintaining persistent links to standard-compliant metadata held in the GEOME database. Achieving seamless cross-referencing among de novo wocDNA metabarcode sequences, multiplex barcoding sequences and repositories of both barcode sequences and wocDNA metabarcode sequences has the potential to dramatically extend the scope and reach of such data.
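As an illustration of how cross-referenced metabarcode sequences could be used to describe community similarity, the following minimal Python sketch computes pairwise Jaccard similarity among sites from the sets of sequences detected at each. The site names, sequence identifiers and the choice of the Jaccard index are assumptions made for the example; they do not correspond to any GEOME or SRA data structure.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared sequences divided by all sequences in either set."""
    union = a | b
    # Convention chosen here: two empty communities have similarity 0.0.
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(sites: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Return the Jaccard similarity for every pair of sites, keyed by sorted site names."""
    return {
        (s1, s2): jaccard(sites[s1], sites[s2])
        for s1, s2 in combinations(sorted(sites), 2)
    }


# Hypothetical repository extract: site -> sequence identifiers detected there.
sites = {
    "island1_siteA": {"seq1", "seq2", "seq3"},
    "island1_siteB": {"seq2", "seq3", "seq4"},
    "island2_siteC": {"seq4", "seq5"},
}

similarity = pairwise_similarity(sites)
for pair, value in similarity.items():
    print(pair, value)
```

In this toy case the two sites on the same island share half of their combined sequences, whereas sites on different islands share few or none. Relating such values to geographic distance between sites would be the natural next step, but it depends on metadata that are not modelled here.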