Data analysis
The two airborne eDNA samples collected from each site were combined and
treated as one sample to maximise the recovery of cellular material from
each site. The raw reads were firstly demultiplexed using QIIME2
(version 2022.8.0) (Estaki et al. 2020) based on the Illumina barcodes,
creating separate samples for each. FastQC (version 0.11.9)
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
was used on the demultiplexed reads to assess the overall data quality
before and after adapter removal. Adapters were removed using Cutadapt
(Martin 2011) (version 2.8-2). Bowtie2 (Langmead and Salzberg 2012)
(version 2.2.5) and Samtools (Danecek et al. 2021) (version 1.10) were
used to align the reads to a reference human mitochondrial genome
sequence (accession number KX456569.1) and identify and remove
human/primate-associated sequences from the dataset. In some instances,
samples contained 99% human DNA and hence were removed from all further
analyses (Supplementary Table 1).
To minimise the introduction of potentially erroneous ASVs, we used our
positive control sample (known DNA source; Tursiops aduncus ) to
optimise the filtering and denoising parameters used in the DADA2
package (Callahan et al. 2016) in R (version 1.26.0). We did this by
varying the maximum expected errors from 1 to 2 and the truncation
quality score parameters from 2 to 30. Filtering for the maximum
expected error of 1 and truncation quality score of 30 significantly
decreased the abundance of all reads but did not remove erroneous ASVs.
Filtering and denoising parameters were therefore selected to maximise
data retention by using default settings: Trimming left of 19 bp for
forward reads and 21 bp for reverse reads, truncation length of 130 bp,
maximum number of ambiguous bases (N’s) of 0, maximum expected errors of
2, truncation quality score of 2, removal of PhiX contamination set to
TRUE. DADA2 was then used to remove chimeras from the sequence data
using the removeBimeraDenovo function with the pooled method. We then
produced two datasets for downstream analyses, one including all forward
filtered and denoised reads (to maximise data retention) and the merged
forward and reverse filtered and denoised reads. We found that the use
of merged reads dramatically reduced the presence of detected
contaminant DNA (Figure S1).
The merged, filtered and denoised reads were then used to produce
amplicon sequence variants (ASVs) (Table S1), which clusters unique
sequence variants based on a 100% sequence similarity. We then used
BLAST to match ASVs to mammalian mitochondrial DNA sequences filtered
from the NCBI’s GenBank non-redundant (nr) database and made species
level taxonomic assignments to BLAST hits using the R package taxonomizr
(version 0.9.3). Taxonomic assignment was performed using a combination
of consensus between the 10 BLAST matches for each ASV and the
percentage of identity for each match to produce a trust score. If the
percentage identity of one or more matches was higher than the mean of
the 10 blast hits, only those higher matches were used to establish a
consensus for the assignment of the final taxonomy. The trust score was
transformed into a 0-1 score using the formula: trust score =
(percentage_identify - 80) / (100 - 80). Only ASVs that returned a
mammalian mitochondrial DNA sequence BLAST match, had at least three
reads across all samples and did not match to primate mitochondrial
sequences were used for subsequent analyses.
A phylogenetic tree was constructed using the ASVs obtained after
taxonomic assignment and included relative abundance calculations,
normalized by library size. Reference sequences of relevant native and
introduced species identified by BLAST matches were also included in the
tree construction. These sequences and their accession numbers can be
found in Supplementary Table 2. The alignment of ASVs was carried out
with the MAAFT (Katoh and Standley 2013) algorithm, and a tree was
produced with FastTree (Price et al. 2010) (version 2.1.11). The
resulting set of ASVs were manually clustered based on three types of
evidence, 1) presence at the same site, 2) the best match to the
taxonomic reference, and 3) the normalised abundance. Where multiple,
highly similar (≥ 97%) ASVs were present at the same site, the most
abundant ASV also had the highest similarity to the taxonomic reference
and was therefore chosen as the most accurate representative. This
process of manual curation removed the ambiguity of erroneous ASVs from
the dataset and aimed to ensure accurate representation of the species
present at the sampling sites. A dendrogram was then constructed using
this set of curated ASVs and relative abundance was calculated by
summing the read counts from the cluster, normalized by library size.
Reference sequences of relevant native and introduced animals were
included in the tree construction. The tree was then visualized and
refined in the Interactive Tree Of Life (iTOL) (Letunic and Bork 2007)
tool (version 4.4.0). Additional graphics were applied using Adobe
Illustrator.