Bioinformatic treatment
We used the OBITools software suite (Boyer et al., 2016) to perform the bioinformatic treatment of raw sequence data, as follows. First, we assembled the forward and reverse reads using the illuminapairedend program and kept only sequences with an alignment score greater than 40 (corresponding to a 10-nucleotide overlap of the forward and reverse reads). Second, we assigned aligned sequences to the corresponding PCR replicate using the ngsfilter program, allowing two mismatches on primers and none on tags. Third, we dereplicated sequences using obiuniq and discarded low-quality sequences (i.e., containing “N”), sequences shorter or longer than expected (based on the minimum and maximum metabarcode length; Table S1) and singletons. Fourth, we ran the obiclean program with the option -r set at 0.5 to detect potential PCR or sequencing errors and kept only the sequences tagged as “heads” in at least one PCR. A sequence is tagged as “head” when it is at least twice as abundant (-r set at 0.5) as the related sequences that differ from it by one base in the same PCR. Fifth, we clustered sequences at a threshold of 96% (Bact02, Euka02, Inse01), 95% (Fung02), 92% (Olig01) or 85% (Coll01) sequence similarity using the sumaclust program (https://git.metabarcoding.org/obitools/sumaclust/wikis/home). These thresholds minimize the risk that sequences attributed to the same species are clustered in different MOTUs and were selected on the basis of preliminary bioinformatics analyses (Bonin, Guerrieri, & Ficetola, 2021).
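The abundance rule behind the “head” tag can be illustrated with a minimal Python sketch. It is a simplified reading of the rule described above, not the obiclean implementation: it considers only single-base substitutions (no insertions or deletions), and the function names and the labels "internal" and "singleton" are assumptions introduced for illustration.

```python
from typing import Dict


def one_substitution(a: str, b: str) -> bool:
    """True when two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def classify_pcr(counts: Dict[str, int], r: float = 0.5) -> Dict[str, str]:
    """Label each sequence of one PCR as 'head', 'internal' or 'singleton'.

    counts maps a sequence to its read count in a single PCR.
    A sequence is a variant of a neighbour when its count is at most
    r times the neighbour's count (r = 0.5 means 'at least twice as abundant').
    """
    status = {}
    for seq, n in counts.items():
        neighbours = [m for other, m in counts.items()
                      if other != seq and one_substitution(seq, other)]
        has_parent = any(n <= r * m for m in neighbours)   # likely an error of a neighbour
        has_variant = any(m <= r * n for m in neighbours)  # likely source of a neighbour
        if has_parent:
            status[seq] = "internal"
        elif has_variant:
            status[seq] = "head"
        else:
            status[seq] = "singleton"
    return status


# Hypothetical read counts for one PCR replicate.
example = {"ACGT": 100, "ACGA": 10, "ACTA": 4, "TTTT": 7}
labels = classify_pcr(example)
```

In this toy example only "ACGT" is labelled "head": "ACGA" and "ACTA" are each at most half as abundant as a one-base neighbour, and "TTTT" has no one-base neighbour at all.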
For the taxonomic assignment, we built a sequence reference database for each marker from EMBL (version 140), as follows. First, we ran the ecoPCR program (Ficetola et al., 2010) to carry out an in silico PCR with the primer pairs used for the experiment, allowing three mismatches per primer. Then, we curated the obtained reference databases by keeping only sequences assigned at the species, genus and family levels. Finally, we performed the taxonomic assignment of each sequence with the ecotag program, using the corresponding reference database.
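The idea of an in silico PCR that tolerates a fixed number of mismatches per primer can be sketched in a few lines of Python. This is a deliberately naive illustration and not the ecoPCR algorithm: it scans one strand only, ignores degenerate (IUPAC) bases and amplicon length limits, and all function names and example sequences are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(primer: str, site: str) -> int:
    """Number of positions at which a primer and a same-length site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(p != s for p, s in zip(primer, site))


def in_silico_amplicons(template: str, forward: str, reverse: str,
                        max_mismatch: int = 3) -> List[str]:
    """Return the regions lying between a forward primer site and a
    downstream site of the reverse-complemented reverse primer, each
    matched with at most max_mismatch mismatches."""
    rc_reverse = reverse_complement(reverse)
    lf, lr = len(forward), len(rc_reverse)
    hits = []
    for i in range(len(template) - lf + 1):
        if mismatches(forward, template[i:i + lf]) > max_mismatch:
            continue
        for j in range(i + lf, len(template) - lr + 1):
            if mismatches(rc_reverse, template[j:j + lr]) <= max_mismatch:
                hits.append(template[i + lf:j])
    return hits


# Hypothetical template and primers (far shorter than real ones).
template = "GG" + "ACGTACGT" + "TTTTCCCC" + "GATTACAG" + "AA"
forward_primer = "ACGTACGT"
reverse_primer = reverse_complement("GATTACAG")
exact_hits = in_silico_amplicons(template, forward_primer, reverse_primer, max_mismatch=0)
```

With primers as short as these toy ones, a tolerance of three mismatches would match many spurious sites, so the example uses exact matching; with real primers, the tolerance mainly recovers taxa whose priming sites diverge slightly from the primer sequence.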
To remove spurious sequences and avoid bias in ecological conclusions (Calderón‐Sanou, Münkemüller, Boyer, Zinger, & Thuiller, 2020), we performed additional filtering in R (version 4.0). We discarded MOTUs with a best identity < 80% (Bact02, Euka02, Fung02) or < 60% (Coll01, Inse01, Olig01) and MOTUs observed fewer than five (Bact02, Fung02, Inse01), ten (Olig01), eleven (Coll01) or twelve (Euka02) times overall. These read-count thresholds correspond to the minimum number of reads that removed ≥ 99.99% of sequences detected in our blanks (i.e., tag-jump errors). Furthermore, we discarded MOTUs detected in only one sample, as they represent singletons; MOTUs detected in fewer than two PCR replicates of the same sample, as they often represent false positives (Ficetola et al., 2015); and MOTUs detected in more than one extraction or PCR negative control, as they might represent contaminants (Zinger, Bonin, et al., 2019).
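The filtering rules above can be summarized as a single keep-or-discard predicate. The sketch below is written in Python for illustration only (the filtering itself was done in R) and rests on several assumptions: the Motu record and its field names are hypothetical, the Collembola marker is written Coll01 as in the clustering step, and the replicate rule is read as "at least one sample must have two or more positive PCR replicates", which is one possible interpretation.

```python
from dataclasses import dataclass, field
from typing import Dict

# Per-marker thresholds taken from the text.
MIN_IDENTITY = {"Bact02": 0.80, "Euka02": 0.80, "Fung02": 0.80,
                "Coll01": 0.60, "Inse01": 0.60, "Olig01": 0.60}
MIN_READS = {"Bact02": 5, "Fung02": 5, "Inse01": 5,
             "Olig01": 10, "Coll01": 11, "Euka02": 12}


@dataclass
class Motu:
    """Hypothetical summary record for one MOTU."""
    marker: str
    best_identity: float              # best identity with the reference database (0-1)
    total_reads: int                  # reads summed over all samples
    positive_replicates: Dict[str, int] = field(default_factory=dict)  # sample -> positive PCRs
    negative_controls: int = 0        # number of negative controls with the MOTU


def keep_motu(m: Motu) -> bool:
    """Apply the identity, abundance, occurrence and contamination filters."""
    positives = [n for n in m.positive_replicates.values() if n > 0]
    return (
        m.best_identity >= MIN_IDENTITY[m.marker]    # taxonomic assignment reliable enough
        and m.total_reads >= MIN_READS[m.marker]     # above the tag-jump threshold
        and len(positives) >= 2                      # found in more than one sample
        and max(positives) >= 2                      # confirmed by >= 2 replicates of a sample
        and m.negative_controls <= 1                 # not a recurrent contaminant
    )


good = Motu("Bact02", 0.97, 120, {"s1": 3, "s2": 2})
rare = Motu("Euka02", 0.99, 11, {"s1": 2, "s2": 2})
```

Here `keep_motu(good)` is true, whereas `rare` is discarded because eleven reads fall below the twelve-read threshold for Euka02.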