Statistical analyses
Separate analyses were conducted
to compare gene expression between hybrids and parental purebreds, and
ambient versus elevated conditions within an offspring group. Transcript
abundance of the samples and the BLAST results were output to R
for statistical analyses using the package limma (Ritchie et al.,
2015). Firstly, only transcripts that were of coral origin were
retained, as indicated in the BLAST results. Secondly, duplicate
transcripts were removed. Thirdly, transcripts that consistently had
zero or very low counts were removed using the edgeR build in
function filterByExpr , and scale normalization (TMM) was applied.
For Principal Components Analysis (PCA), sample raw counts were
transformed into log2-counts per million (log-CPM) to account for
library size differences. A total of four samples were identified to
have very small library size (three A. tenuis purebreds (two
under ambient, one under elevated conditions) and one TL hybrid under
elevated conditions), and a relative log expression (RLE) plot showed
that normalization of these samples was unsuccessful (Gandolfo & Speed,
2018) (Figure S2). These samples were excluded from the analyses. A
heatmap was then used to visualize the 500 most variable genes across
samples, based on a calculated matrix of Euclidean distances from the
log-CPM.
To fit linear models for comparisons, count data was transformed to
log-CPM using the voom function in the limma package.
Since no treatment effect was found on gene expression (see Results
section), the comparison of hybrids and purebreds combined samples from
both treatments. Comparisons were made between: 1) maternal parent LL
and its hybrid LT, 2) paternal parent LL and its hybrid TL, and 3)
between the reciprocal hybrids LT and TL. The parental purebred TT
(A. tenuis ) was not included due to a small sample size (n =1,
Table S1). Empirical Bayes moderation was then carried out to obtain
more precise pairwise comparisons and p-values were corrected using the
Benjamini-Hochberg method (Benjamini & Hochberg, 1995). A gene was
considered differentially expressed when padj< 0.05 and with log-fold change (LFC) > 0.2 using
the treat function in the limma package. The list of
differentially expressed genes (DEGs) was exported for gene ontology
(GO) analyses (goseq (Young et al., 2010)) and visualized using
volcano plots (Blighe et al., 2018). The volcano plots and GO analyses
focused on the comparison of 1) paternal parent LL with its hybrid TL,
and 2) between the reciprocal hybrid LT and TL only, as these were the
pairs with a high number of differentially expressed genes (DEGs) to
explore. The p-values were corrected with the Benjamini-Hochberg method
(Benjamini & Hochberg, 1995), and a GO category was considered
overrepresented or underrepresented when padj< 0.05. The R scripts and dataset for statistical analyses are
available in:
https://datadryad.org/stash/share/XKMEOA9LQ4A_lBd1zfF-RNmjuJBZrRo3RU07i3gXx1w.