Abstract
Metabarcoding is an increasingly popular and accessible method for
assessing bacterial communities across a wide range of environments, and
as the sequence data archives grow, sequence data reuse will likely
become an important source of novel insights into the ecology of
microbes. While literature exists on the benefits of longer read lengths
for the study of microbial communities, little is known about the
(re)usability of shorter (<200 bp) read lengths, yet this
information is essential to improve the reuse and comparability of
metabarcoding data across studies. This study reanalyzed three 16S rRNA
datasets targeting aquatic, animal-associated, and soil microbiomes, and
evaluated how processing the sequence data across a range of read
lengths affected the resulting taxonomic assignments, biodiversity
metrics, and differential (i.e., before-after treatment) analyses. Short
read lengths successfully recovered ecological patterns, and limited
increases in resolution were observed beyond 100 bp reads across
environments. Furthermore, abundance-weighted diversity metrics (e.g.,
Inverse Simpson index or Bray-Curtis dissimilarities) were more robust
to variation in read length. Importantly, the total number of ASVs
detected increased with read length, highlighting the need to consider
metabarcoding-derived diversity estimates within the context of the
bioinformatics parameters selected. This study provides evidence-based
guidelines for the processing of short reads.