Bioinformatic treatment
We used the OBITools software suite (Boyer et al., 2016) to perform the
bioinformatic treatment of raw sequence data, as follows. First, we
assembled the forward and reverse reads using the illuminapairedend program and kept only sequences with an
alignment score greater than 40 (corresponding to a 10-nucleotide
overlap of the forward and reverse reads). Second, we assigned aligned
sequences to the corresponding PCR replicate using the ngsfilter program, allowing two mismatches on primers and none on
tags. Third, we dereplicated sequences using obiuniq and discarded low-quality sequences (i.e., containing
“N”), sequences shorter or longer than expected (based
on the minimum and maximum metabarcode length; Table S1) and singletons.
Fourth, we ran the obiclean program with the option -r set at 0.5
to detect potential PCR or sequencing errors and kept only the sequences
tagged as “heads” in at least one PCR. Sequences are tagged as
“heads” when they are at least twice (-r option set at 0.5) as
abundant as other related sequences differing by one base in the same
PCR. Fifth, we clustered sequences at a threshold of 96% (Bact02,
Euka02, Inse01), 95% (Fung02), 92% (Olig01) or 85% (Coll01) sequence
similarity using the sumaclust program
(https://git.metabarcoding.org/obitools/sumaclust/wikis/home). These
thresholds minimize the risk that sequences attributed to the same
species are clustered in different MOTUs and were selected on the basis
of preliminary bioinformatics analyses (Bonin, Guerrieri, & Ficetola,
2021).
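
The "head" rule from the fourth step can be illustrated with a minimal Python sketch. It is not the obiclean implementation: it only compares sequences of equal length that differ by one substitution (indels are ignored), it treats a sequence as a likely error when a one-mismatch neighbour in the same PCR is at least 1/r times as abundant, and it tags everything else as "head" without distinguishing the other categories that obiclean reports. The function names and the example counts are hypothetical.

```python
def differs_by_one(a: str, b: str) -> bool:
    """True when a and b have equal length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def tag_heads(counts: dict, ratio: float = 0.5) -> dict:
    """Tag each sequence of a single PCR as 'head' or 'variant'.

    A sequence is tagged 'variant' (likely PCR or sequencing error) when some
    one-mismatch neighbour satisfies count(sequence) <= ratio * count(neighbour).
    All remaining sequences are tagged 'head' in this simplified sketch.
    """
    tags = {}
    for seq, n in counts.items():
        is_variant = any(
            differs_by_one(seq, other) and n <= ratio * m
            for other, m in counts.items()
        )
        tags[seq] = "variant" if is_variant else "head"
    return tags


# Hypothetical read counts for one PCR replicate.
pcr_counts = {"ACGTACGT": 100, "ACGTACGA": 30, "ACGTACGG": 60, "TTTTCCCC": 5}
tags = tag_heads(pcr_counts)
for seq, tag in tags.items():
    print(seq, pcr_counts[seq], tag)
```

With r = 0.5, the sequence with 30 reads is tagged as a variant of its 100-read neighbour, whereas the sequence with 60 reads is not, because it exceeds half the abundance of that neighbour.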
For the taxonomic assignment, we built for each marker a sequence
reference database from EMBL (version 140), as follows. First, we ran
the ecoPCR program (Ficetola et al., 2010) to carry out an in silico PCR with the primer pairs used for the experiment,
allowing three mismatches per primer. Then, we curated the obtained
reference databases by keeping only sequences assigned at the species,
genus and family levels. Finally, the taxonomic assignment was performed
by the ecotag program on each sequence using the reference
database.
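
The principle of an in silico PCR that tolerates mismatches can be sketched in Python as follows. This is an illustration under strong simplifying assumptions, not a reproduction of ecoPCR: it scans one strand only, handles plain A/C/G/T primers (no IUPAC ambiguity codes), applies no amplicon length limits and carries no taxonomic information. The function names, primers and template sequence are hypothetical.

```python
def count_mismatches(primer: str, site: str) -> int:
    """Number of positions at which primer and site differ (equal lengths assumed)."""
    return sum(p != s for p, s in zip(primer, site))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_site(template: str, primer: str, max_mismatches: int, start: int = 0) -> int:
    """Index of the first window at or after start matching primer, or -1."""
    k = len(primer)
    for i in range(start, len(template) - k + 1):
        if count_mismatches(primer, template[i:i + k]) <= max_mismatches:
            return i
    return -1


def in_silico_pcr(template: str, forward: str, reverse: str, max_mismatches: int = 3):
    """Return the region between the two priming sites (primers excluded), or None."""
    fwd_at = find_site(template, forward, max_mismatches)
    if fwd_at < 0:
        return None
    insert_start = fwd_at + len(forward)
    # The reverse primer anneals to the opposite strand, so we search for its
    # reverse complement downstream of the forward priming site.
    rev_at = find_site(template, reverse_complement(reverse), max_mismatches, insert_start)
    if rev_at < 0:
        return None
    return template[insert_start:rev_at]


# Hypothetical primers and template; the forward site carries one mismatch.
forward = "ACGTTGCAAGCT"
reverse = "GCTGCCTCGGTC"
template = "GG" + "ACGTTGCAAGCA" + "TTTTTTTT" + "GACCGAGGCAGC" + "AA"
barcode = in_silico_pcr(template, forward, reverse, max_mismatches=3)
print(barcode)
```

Setting max_mismatches to 0 in this example returns None, because the forward priming site no longer matches.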
In order to remove spurious sequences and avoid bias in ecological
conclusions (Calderón‐Sanou, Münkemüller, Boyer, Zinger, & Thuiller,
2020), we performed additional filtering in R (version 4.0). We discarded
MOTUs with a best identity < 80% (Bact02, Euka02, Fung02) or
< 60% (Coll01, Inse01, Olig01) and MOTUs observed fewer than
five (Bact02, Fung02, Inse01), ten (Olig01), eleven (Coll01) or twelve
(Euka02) times overall. The latter thresholds correspond to the minimum number of
reads that removed ≥ 99.99% of sequences detected in our blanks (i.e.,
tag-jump errors). Furthermore, we discarded MOTUs detected in only one
sample, as they represent singletons, MOTUs detected in fewer than two
PCR replicates of the same sample, as they often represent false
positives (Ficetola et al., 2015), and MOTUs detected in more
than one extraction or PCR negative control, as they might represent
contaminants (Zinger, Bonin, et al., 2019).
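
The filtering criteria above can be summarised in a short sketch. The filtering itself was done in R. The Python below only illustrates the logic and rests on several assumptions: the data layout and all names are hypothetical, the read total is computed over all samples and controls, and a MOTU counts as detected in a sample only when it occurs in at least two PCR replicates of that sample. The default thresholds are those reported for Bact02.

```python
from collections import defaultdict


def filter_motus(reads, best_identity, negative_controls,
                 min_identity=0.80, min_reads=5):
    """Return the MOTUs that pass all filters.

    reads: {motu: {(sample, replicate): read_count}}
    best_identity: {motu: best identity with the reference database (0 to 1)}
    negative_controls: set of sample names that are extraction or PCR controls
    """
    kept = []
    for motu, table in reads.items():
        # 1. Poor match to the reference database.
        if best_identity[motu] < min_identity:
            continue
        # 2. Too few reads overall.
        if sum(table.values()) < min_reads:
            continue
        # Collect the replicates in which the MOTU occurs, per sample.
        replicates = defaultdict(set)
        for (sample, replicate), n in table.items():
            if n > 0:
                replicates[sample].add(replicate)
        # 3. Present in more than one negative control: possible contaminant.
        if sum(1 for s in replicates if s in negative_controls) > 1:
            continue
        # 4. Keep detections confirmed by at least two replicates of a sample,
        #    then require at least two such samples.
        confirmed = [s for s, reps in replicates.items()
                     if s not in negative_controls and len(reps) >= 2]
        if len(confirmed) < 2:
            continue
        kept.append(motu)
    return kept


# Hypothetical data set: only motu_a satisfies every criterion.
reads = {
    "motu_a": {("s1", 1): 10, ("s1", 2): 8, ("s2", 1): 5, ("s2", 2): 7},
    "motu_b": {("s1", 1): 10, ("s1", 2): 8, ("s2", 1): 5, ("s2", 2): 7},
    "motu_c": {("s1", 1): 2, ("s1", 2): 1},
    "motu_d": {("s1", 1): 20, ("s1", 2): 15},
    "motu_e": {("s1", 1): 9, ("s1", 2): 9, ("s2", 1): 9, ("s2", 2): 9,
               ("neg1", 1): 4, ("neg2", 1): 3},
    "motu_f": {("s1", 1): 10, ("s2", 1): 10},
}
best_identity = {"motu_a": 0.95, "motu_b": 0.70, "motu_c": 0.90,
                 "motu_d": 0.90, "motu_e": 0.90, "motu_f": 0.90}
kept = filter_motus(reads, best_identity, negative_controls={"neg1", "neg2"})
print(kept)
```

In this example, motu_b fails the identity threshold, motu_c has too few reads, motu_d occurs in a single sample, motu_e occurs in two negative controls and motu_f is never confirmed by a second replicate.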