Sequence analysis
We analyzed raw sequence reads using a bioinformatics pipeline designed
to trim the reads and sort them by scat sample. The pipeline comprised
three steps: (1) paired-end reads were merged using PEAR (Zhang et al.,
2014); (2) merged reads were demultiplexed using the 8-base-pair index
sequence unique to each sample, matched with a custom grep regular
expression (reads with index mismatches were discarded); and (3)
operational taxonomic units (OTUs) from each sample were taxonomically
assigned using BLAST against the 12S vertebrate sequences available in
GenBank.
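The demultiplexing step can be illustrated with a minimal Python sketch.
It is not the grep expression used in the study. It assumes that the
8-base-pair index sits at the start of each merged read, and the index
sequences and sample names shown are hypothetical placeholders.

```python
import re
from collections import defaultdict

# Hypothetical index-to-sample map; the real index sequences are not
# given in the text.
INDEX_TO_SAMPLE = {
    "ACGTACGT": "scat_01",
    "TTGACCAA": "scat_02",
}

# Anchor the 8 bp index at the start of the read (an assumption about
# read layout). Only exact matches are accepted.
INDEX_PATTERN = re.compile(
    r"^(" + "|".join(map(re.escape, INDEX_TO_SAMPLE)) + r")"
)


def demultiplex(reads):
    """Group reads by sample and trim the index from each read.

    Reads whose first 8 bases do not exactly match a known index are
    discarded, mirroring the "mismatches discarded" rule.
    """
    by_sample = defaultdict(list)
    for read in reads:
        match = INDEX_PATTERN.match(read)
        if match is None:
            continue  # index mismatch: discard the read
        sample = INDEX_TO_SAMPLE[match.group(1)]
        by_sample[sample].append(read[match.end():])
    return dict(by_sample)
```

Requiring an exact match is the simplest way to discard mismatched
indices, at the cost of losing reads that carry a single sequencing
error in the index.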
We carried out a series of filtering and quality control measures on
taxonomically assigned sequences. For each of the three iDNA datasets,
we removed contaminant reads (primarily human DNA sequences) and
sample replicates that did not amplify (fewer than 500 reads). We then
removed OTUs that had a percent identity below 95% or that accounted
for less than 1% of the total number of sequences in that sample.
Finally, we eliminated species that were not found in both sample
replicates. We
then manually reviewed BLAST results for each purported species to
ensure that the 12S barcode discriminated species from sympatric
congeners or confamilials and to confirm that the taxonomic assignments
were for species regional to Mato Grosso, Brazil. If an assigned species
was not regional, we examined the other equally scoring matches and
reassigned the OTU to a regional species where possible. If no suitable
species-level match was found, the taxon was assigned at the genus level
or removed from the dataset.
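The automated filtering thresholds described above (before manual
review) can be sketched in Python as follows. This is an illustrative
sketch, not the study's code. The record layout (`sample`, `replicate`,
`taxon`, `pident`, `reads`) and the contaminant list are assumptions.
The sketch also assumes that the 500-read and 1% thresholds are computed
per replicate, on read totals taken after contaminant removal.

```python
from collections import defaultdict

MIN_READS = 500        # replicates below this are treated as failed
MIN_IDENTITY = 95.0    # minimum BLAST percent identity
MIN_FRACTION = 0.01    # minimum share of a replicate's reads
CONTAMINANTS = {"Homo sapiens"}  # assumed contaminant list


def filter_otus(records, n_replicates=2):
    """Apply the read-count, identity, abundance and replicate filters.

    records: iterable of dicts with keys sample, replicate, taxon,
    pident and reads (a hypothetical layout).
    Returns {sample: sorted list of retained taxa}.
    """
    # 1. Remove contaminant reads.
    records = [r for r in records if r["taxon"] not in CONTAMINANTS]

    # 2. Total reads per replicate, used for both read thresholds.
    totals = defaultdict(int)
    for r in records:
        totals[(r["sample"], r["replicate"])] += r["reads"]

    # 3. Apply per-replicate and per-OTU filters.
    seen = defaultdict(set)
    for r in records:
        total = totals[(r["sample"], r["replicate"])]
        if total < MIN_READS:
            continue  # replicate did not amplify
        if r["pident"] < MIN_IDENTITY:
            continue  # weak taxonomic match
        if r["reads"] < MIN_FRACTION * total:
            continue  # rare OTU within this replicate
        seen[(r["sample"], r["taxon"])].add(r["replicate"])

    # 4. Keep taxa detected in every replicate of a sample.
    kept = defaultdict(list)
    for (sample, taxon), replicates in seen.items():
        if len(replicates) >= n_replicates:
            kept[sample].append(taxon)
    return {sample: sorted(taxa) for sample, taxa in kept.items()}
```

Because a failed replicate contributes no detections, a sample with
only one successful replicate retains no taxa under this sketch.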