2.5 | Posthoc analyses including functional
annotations
An analysis of principal components (PCA) was implemented in the
adegenet package for R (Jombart 2008; Jombart & Ahmed 2011) to obtain a
graphical depiction of patterns of genetic structure among virus
affected and unaffected stocks based on all candidate SNPs identified by
BayPass (BFs ≥ 50).
Total linkage disequilibrium (LD) among all candidate loci was
calculated using LDx, a package which uses an approximate maximum
likelihood approach from pooled resequencing data (Feder et al.2012). Linkage disequilibrium was calculated asr 2, the square of the correlation between
alleles of SNP pairs within the paired sequence reads of each
population. We subsequently calculated the average LD for each pairwise
SNP comparison across sample sites. Next, we assessed the distribution
of candidate loci and signatures of selection across the referenceH. rubra genome consisting of 2,854 annotated scaffolds varying
between 1,830 and 1.1 x 107 bp in length (Gan et
al. 2019). This was achieved by regressing the total number of
candidate loci against scaffold length using ggpubr package for R
(Kassambara & Kassambara 2020). Scaffolds with exceptionally large
numbers of candidate loci relative to scaffold length were interrogated
further using the package LDBlockShow (Dong et al. 2021) to
measure pairwise linkage disequilibrium and haplotype blocks using the
default -SeleVar option to calculate D ’ (the ratio of the
difference between the observed and expected frequency of a haplotype,
and its maximum value when considering total allele frequencies).
SnpEff v2.0.3 (Cingolani et al. 2012) was used to map candidate
SNP loci to the H. rubra genome and predict variant impacts; high
(highly disruptive impact on protein function), moderate (non-synonymous
mutations, possible change in protein effectiveness), low (unlikely to
change protein behaviour) or a modifier (synonymous mutations,
non-coding or intergenic variant). Functional classification of
candidate genes was achieved by aligning the peptide sequences for
mapped candidate H. rubra genes with the annotated genomes for
human (NCBI RefSeq IDs NC_000001 - NC_000024), pacific oyster (RefSeq
IDs NC_047559 - NC_047568), scallop (RefSeq ID NC_007234.1) and the
blue mussel (RefSeq ID NC_006161.1) using DIAMOND (Buchfink et
al. 2015). A maximum e-value of 1e-40 was set to
conservatively estimate the likelihood of similar gene functions between
taxonomies. Protein GI accessions from the top hit of DIAMOND alignments
were imported into the web-based version of the DAVID bioinformatics
tool (Huang et al. 2009a; Huang et al. 2009b), where
corresponding annotations were generated. Functional annotations were
performed by focussing specifically on gene homologs known to be
associated with virus-host interactions, in particular herpesvirus
response pathways, including those in response to HaHV-1 in the AVG
resistant H. iris .