2.5 | Posthoc analyses including functional annotations
An analysis of principal components (PCA) was implemented in the adegenet package for R (Jombart 2008; Jombart & Ahmed 2011) to obtain a graphical depiction of patterns of genetic structure among virus affected and unaffected stocks based on all candidate SNPs identified by BayPass (BFs ≥ 50).
Total linkage disequilibrium (LD) among all candidate loci was calculated using LDx, a package which uses an approximate maximum likelihood approach from pooled resequencing data (Feder et al.2012). Linkage disequilibrium was calculated asr 2, the square of the correlation between alleles of SNP pairs within the paired sequence reads of each population. We subsequently calculated the average LD for each pairwise SNP comparison across sample sites. Next, we assessed the distribution of candidate loci and signatures of selection across the referenceH. rubra genome consisting of 2,854 annotated scaffolds varying between 1,830 and 1.1 x 107 bp in length (Gan et al. 2019). This was achieved by regressing the total number of candidate loci against scaffold length using ggpubr package for R (Kassambara & Kassambara 2020). Scaffolds with exceptionally large numbers of candidate loci relative to scaffold length were interrogated further using the package LDBlockShow (Dong et al. 2021) to measure pairwise linkage disequilibrium and haplotype blocks using the default -SeleVar option to calculate D ’ (the ratio of the difference between the observed and expected frequency of a haplotype, and its maximum value when considering total allele frequencies).
SnpEff v2.0.3 (Cingolani et al. 2012) was used to map candidate SNP loci to the H. rubra genome and predict variant impacts; high (highly disruptive impact on protein function), moderate (non-synonymous mutations, possible change in protein effectiveness), low (unlikely to change protein behaviour) or a modifier (synonymous mutations, non-coding or intergenic variant). Functional classification of candidate genes was achieved by aligning the peptide sequences for mapped candidate H. rubra genes with the annotated genomes for human (NCBI RefSeq IDs NC_000001 - NC_000024), pacific oyster (RefSeq IDs NC_047559 - NC_047568), scallop (RefSeq ID NC_007234.1) and the blue mussel (RefSeq ID NC_006161.1) using DIAMOND (Buchfink et al. 2015). A maximum e-value of 1e-40 was set to conservatively estimate the likelihood of similar gene functions between taxonomies. Protein GI accessions from the top hit of DIAMOND alignments were imported into the web-based version of the DAVID bioinformatics tool (Huang et al. 2009a; Huang et al. 2009b), where corresponding annotations were generated. Functional annotations were performed by focussing specifically on gene homologs known to be associated with virus-host interactions, in particular herpesvirus response pathways, including those in response to HaHV-1 in the AVG resistant H. iris .