Interspecific adaptive variation
For the pcadapt analysis, we considered the first two PCs that explained a total of ~16% of the genetic variance (Figure S2A, Supporting information). The first PC explained ~10% of variance and clearly separated NIDGS from SIDGS, and some variation within SIDGS, while the remaining axes mostly reflected intraspecific variation within NIDGS (Figures S2B and S2C, Supporting information). Using a value of α = 0.01, we detected 48 outlier loci using an adjustment of the p -values following the Benjamini-Hochberg procedure.
For the GEAs, most environmental variables were not normally distributed, and p -values for the Shapiro-Wilk normality were all lower than 0.001 for all variables. Based on the results of the Kendall correlation analysis, we kept 18 variables for the GEA analyses, given that these showed low correlation with one another (Figure S3A, Supporting information). The PCA performed with these variables shows that only PC1 explained more variance than the random broken stick criterion, and thus was kept as a summary of the environmental data to compare in the LFMM univariate analysis (Figure S4A, Supporting information). The PCA performed for the genomic data showed that only PC1 explained more variance than the random broken stick criterion, which suggests that the number of populations was two (Figure S3B, Supporting information). We ran the LFMM using K = 2, which resulted in a distribution of unadjusted p -values with a GIF of 2.42, which was then adjusted to 1.4 (Figure S5, Supporting information). Using a false discovery rate (FDR) of 10%, the LFMM analysis identified 52 candidate loci.
We then performed the RDA and pRDA for the IDGS dataset. For both analyses, from the 18 environmental variables used, we excluded three (BIO1, BIO2 and LF_VA) due to high VIF. We obtained an adjustedr2 of 0.08 and 0.06 for the RDA and pRDA, respectively, across 15 axes, which correspond to the proportion of genetic variance explained by the environmental predictors kept in each analysis. The ANOVA showed that both analyses were significant atp < 0.001, and that the first four axes of both analyses were significant at p < 0.001 (Figure S6, Supporting information). Imputation of the missing data appeared to not have created biases in population structure given the similarities between the pcadapt and the RDA PCA loadings (Figures 2A and 2B, and S2B and S2C, Supporting information). The first RDA axis explained 4.8% of the variance, while RDA2, RDA3 and RDA4 explained 2.8%, 2.5% and 2.2%, respectively. The remaining axes combined (RDA5-15) explained the remaining 13.5% of the total variance (Figure 2A and 2B). In this analysis, there was a clear separation of NIDGS and SIDGS individuals in RDA1, while RDA2-4 mostly separated populations within species. The environmental variables that loaded the most strongly with RDA1, and thus the separation between NIDGS and SIDGS, were ‘Steep Slopes’ (LF_SS, associated with 20 SNPs), ‘Grassland/Herbaceous’ (LC_GH, associated with 32 SNPs) and ‘Soil Temperature’ (soilPartSize, associated with 20 SNPs) (Table 2 and Figure 2A). When accounting for population structure with the pRDA analysis, we no longer find a clear distinction between NIDGS and SIDGS, and instead variation appears to mimic that of the RDA without RDA1 (Figure 2). The first pRDA axis explained 2.9% of the variance (similarly to RDA2), while pRDA2, pRDA3 and pRDA4 explained 2.6%, 2.3% and 2.1%, respectively. The remaining axes combined (RDA5-15) explained the remaining 12.7% of the total variance (Figure 2C and 2D). We also found a total of 316 SNPs associated with 17 environmental variables, with 38.6% (122 SNPs) also associated with the land formation variable ‘Ridges and Peaks’ (LF_RP) (Table 2).
Considering all analyses combined, we identified a total of 490 candidate loci, none of which were found by all four analyses (Figure 4). Comparing the population structure outlier approach (pcadapt ) to the GEAs (LFMM, RDA and pRDA), there are 29 loci found by all GEAs vs. 18 found only by the population structure outlier approach (Figure 4). Methods that do not account (i.e. correct) for population structure (pcadapt and RDA) identified 21 loci in common, while methods that account for population structure (LFMM and pRDA) identified one locus in common. The largest overlap between analyses was 168 loci found by both multivariate GEAs (RDA and pRDA).