2.7 16S rRNA gene amplicon sequencing of the microbiome from Ellinge WWTP
For detailed information see supplementary method 1, and for information on sequenced samples see supplementary table 3. Briefly, sequencing libraries were prepared in a dual-PCR setup. In the first PCR, the V3-V4 region of the 16S rRNA gene was amplified with primers Uni341F and Uni806R (Yu et al., 2005). In the second PCR, primers introducing sequencing adaptors and barcode tags were used (Nunes et al., 2016). 16S rRNA gene amplicon sequencing was done on an Illumina MiSeq Desktop Sequencer (Illumina Inc.). Raw sequence reads were primer-trimmed using cutadapt version 2.3 (Martin, 2011). Primer-trimmed reads were error-corrected and merged, and amplicon sequence variants (ASVs) were identified, using the DADA2 version 1.10.0 (Callahan et al., 2016) plugin for QIIME2 (Bolyen et al., 2019). For rarefaction curves see supplementary figure 6. A multiple sequence alignment of the ASVs was performed with mafft v7.407 (Katoh & Standley, 2013) and used to build an approximately-maximum-likelihood (ML) tree with FastTree v2.1.10 (Price et al., 2010). R (R Core Team, 2020) was used for sequence and data analysis for the 16S rRNA gene community profiling. The tidyverse (Wickham et al., 2019) and phyloseq (McMurdie & Holmes, 2013) packages were used for visualization and general data handling. Taxonomy was assigned with the dada2 package (Callahan et al., 2016) using the Genome Taxonomy Database (GTDB; https://doi.org/10.5281/zenodo.2541239) (Parks et al., 2018). The alpha diversity metrics Faith’s phylogenetic diversity (Faith, 1992) and mean pairwise distance (Webb et al., 2002) were calculated with the PhyloMeasures package (Tsirogiannis & Sandel, 2016). For beta diversity, weighted UniFrac distances were calculated (Lozupone & Knight, 2005). The phylogenetic tree (figure 3.a) was made using the iTOL webtool (Letunic & Bork, 2019).
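To illustrate what the two phylogenetic alpha diversity metrics measure, the following is a minimal Python sketch on a hypothetical five-node toy tree. It is not the PhyloMeasures implementation used in this study. The tree, the node names, and the helper functions (`path_to_root`, `faith_pd`, `patristic`, `mpd`) are assumptions made for illustration only. The sketch assumes the rooted convention for Faith's phylogenetic diversity (branches on the path to the root are included) and an unweighted mean pairwise distance; software packages may differ on both points.

```python
from itertools import combinations

# Hypothetical toy tree: node -> (parent, length of the branch above the node).
# The root has no parent and no branch above it.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 4.0),
}


def path_to_root(tip):
    """Return the nodes from `tip` up to, but excluding, the root.

    Each returned node stands for the branch directly above it.
    """
    edges = []
    node = tip
    while TREE[node][0] is not None:
        edges.append(node)
        node = TREE[node][0]
    return edges


def faith_pd(tips):
    """Faith's PD (rooted convention, an assumption of this sketch):
    total length of all branches linking the tips to the root."""
    edges = set()
    for tip in tips:
        edges.update(path_to_root(tip))
    return sum(TREE[e][1] for e in edges)


def patristic(a, b):
    """Tip-to-tip distance: branches on exactly one of the two root paths."""
    unshared = set(path_to_root(a)) ^ set(path_to_root(b))
    return sum(TREE[e][1] for e in unshared)


def mpd(tips):
    """Unweighted mean pairwise distance over all pairs of tips."""
    pairs = list(combinations(tips, 2))
    return sum(patristic(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(faith_pd(["A", "B"]))   # branches A, B and n1
    print(mpd(["A", "B", "C"]))   # mean of the three pairwise distances
```

In this toy tree, a community containing only A and B spans the branches above A, B and their shared ancestor n1, whereas adding C extends the diversity by the length of one additional branch.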
For analyses of the low-biomass samples (sewage community/pB10 and sewage community/R27), such as alpha diversity measures, phylogeny, and abundances, a cleaned data object was used (see supplementary method 1) to avoid the influence of the kitome (i.e. the background signal of the kits used) and other potential contaminants, to which low-biomass samples are more vulnerable than high-biomass samples (Davis et al., 2018). Data cleaning for the eight low-biomass samples removed 24165 reads (from 162040 to 137875 reads), i.e. 14.9% of the reads. The mean number of reads for the cleaned samples was 17234, and the minimum and maximum were 15038 and 20999 reads, respectively. The number of taxa in these samples was reduced from 299 to 65, a removal of 78.26% of the taxa.
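The summary figures for the cleaning step follow directly from the read and taxon counts reported above. As a worked check, the arithmetic can be reproduced with the short Python sketch below. The variable names are illustrative, and only the totals stated in the text are used (no per-sample data).

```python
# Totals reported in the text for the eight low-biomass samples.
reads_before = 162040
reads_after = 137875
n_samples = 8
taxa_before = 299
taxa_after = 65

# Reads removed by cleaning, as a count and as a percentage.
reads_removed = reads_before - reads_after
pct_reads_removed = 100 * reads_removed / reads_before

# Mean reads per cleaned sample.
mean_reads = reads_after / n_samples

# Taxa removed by cleaning, as a count and as a percentage.
taxa_removed = taxa_before - taxa_after
pct_taxa_removed = 100 * taxa_removed / taxa_before

print(f"reads removed: {reads_removed} ({pct_reads_removed:.1f}%)")
print(f"mean reads per cleaned sample: {mean_reads:.0f}")
print(f"taxa removed: {taxa_removed} ({pct_taxa_removed:.2f}%)")
```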