Statistical analyses
Separate analyses were conducted to compare gene expression between hybrids and parental purebreds, and ambient versus elevated conditions within an offspring group. Transcript abundance of the samples and the BLAST results were output to R for statistical analyses using the package limma (Ritchie et al., 2015). Firstly, only transcripts that were of coral origin were retained, as indicated in the BLAST results. Secondly, duplicate transcripts were removed. Thirdly, transcripts that consistently had zero or very low counts were removed using the edgeR build in function filterByExpr , and scale normalization (TMM) was applied. For Principal Components Analysis (PCA), sample raw counts were transformed into log2-counts per million (log-CPM) to account for library size differences. A total of four samples were identified to have very small library size (three A. tenuis purebreds (two under ambient, one under elevated conditions) and one TL hybrid under elevated conditions), and a relative log expression (RLE) plot showed that normalization of these samples was unsuccessful (Gandolfo & Speed, 2018) (Figure S2). These samples were excluded from the analyses. A heatmap was then used to visualize the 500 most variable genes across samples, based on a calculated matrix of Euclidean distances from the log-CPM.
To fit linear models for comparisons, count data was transformed to log-CPM using the voom function in the limma package. Since no treatment effect was found on gene expression (see Results section), the comparison of hybrids and purebreds combined samples from both treatments. Comparisons were made between: 1) maternal parent LL and its hybrid LT, 2) paternal parent LL and its hybrid TL, and 3) between the reciprocal hybrids LT and TL. The parental purebred TT (A. tenuis ) was not included due to a small sample size (n =1, Table S1). Empirical Bayes moderation was then carried out to obtain more precise pairwise comparisons and p-values were corrected using the Benjamini-Hochberg method (Benjamini & Hochberg, 1995). A gene was considered differentially expressed when padj< 0.05 and with log-fold change (LFC) > 0.2 using the treat function in the limma package. The list of differentially expressed genes (DEGs) was exported for gene ontology (GO) analyses (goseq (Young et al., 2010)) and visualized using volcano plots (Blighe et al., 2018). The volcano plots and GO analyses focused on the comparison of 1) paternal parent LL with its hybrid TL, and 2) between the reciprocal hybrid LT and TL only, as these were the pairs with a high number of differentially expressed genes (DEGs) to explore. The p-values were corrected with the Benjamini-Hochberg method (Benjamini & Hochberg, 1995), and a GO category was considered overrepresented or underrepresented when padj< 0.05. The R scripts and dataset for statistical analyses are available in: https://datadryad.org/stash/share/XKMEOA9LQ4A_lBd1zfF-RNmjuJBZrRo3RU07i3gXx1w.