Materials and methods

Datasets and intraspecific genetic clustering

We sampled the microsatellite datasets of two recently published regional studies, i.e., 17 populations in the East Indo-Pacific (Hernawan et al. 2017) and 11 populations in the Western Indian Ocean (Jahnke et al. 2019a). We used twelve microsatellites (i.e., Thh3, Thh15, Thh34, Thh41, TH07, TH34, TH37, TH43, TH52, TH66, TH73) for population structuring and lineage sorting of 1021 individuals from 28 populations across the tropical Indo-Pacific (Fig. 1a). We then estimated pairwise genetic differences among populations using the Cavalli-Sforza and Edwards chord distance and represented them in a network using the R package IGRAPH (Csardi & Nepusz 2006) with the addition of a custom script by Johansson et al. (2015). To visually inspect the relationships within and between the main genetic clusters inferred by STRUCTURE (Pritchard et al. 2000), we pruned the full network by sequentially removing edges (i.e., network pairwise links among sampling sites) of decreasing genetic distance until the point at which the main groups of tightly connected nodes still remained connected (in order to avoid the split of any large network cluster from the main network). We estimated the classification of sampling sites within network communities at each step of the pruning process with the “fastgreedy” community detection algorithm implemented in IGRAPH (Clauset et al. 2004, Blondel et al. 2008). Network analysis (Fig. 1b), Bayesian-based STRUCTURE (Fig. 1c), and molecular variation (AMOVA) (Table S1 in Supporting Information) revealed strong overall genetic differentiation among two distinct lineages occupying the Tropical Indo-Pacific. Based on the landscape genetic analysis of Cushman et al. (2014) and the definitions of global marine ecoregions (Spalding et al. 2007), we classified these two lineages as distinct genotypes encompassed within two biogeographic regions: the Western Tropical Indo-Pacific (WTIP) and the Central Tropical Indo-Pacific (CTIP). We then used the two lineages in subsequent ecological niche modelling.

Distribution data and marine predictors

We collected a total of 62,465 presence records of T. hemprichiifrom a recently assembled and cleaned dataset of global marine forests (Assis et al. 2020) and published literature (see Data availability). In SDM studies, it is critical to correct for sampling bias and remove clustered records, which may over-represent environmental conditions in better-surveyed regions (Kramer-Schadtet al. 2013). Therefore, presence records were filtered by: i) removing duplicated records at the resolution of our environmental predictors (i.e., keeping only one record per 5 arcmin grid cell); ii) removing records on land or with distance to land > 370 km (following other SDM studies for coastal species; e.g., Zhang et al. 2020a), and iii) performing spatial thinning using a distance of 20 km using the R package spThin (Aiello-Lammens et al.2015). This distance is a reasonable approximation of the dispersal potential for this plant traveling via floating propagules (Lacapet al. 2002), and it can also reduce potential effects of sampling bias while retaining sufficient numbers of presence records for our analyses. As significant clustering was present in the data (particularly around Australia), these procedures removed a good proportion of the presence data. Ultimately, we kept 519 records for the species-level model (hereafter “species model”), 479 records for the CTIP lineage-level model (hereafter “CTIP model”), and 26 records for the WTIP lineage-level model (hereafter “WTIP model”) (Fig. 1a).
It is important to properly select the extent of the study area used to sample background records when constructing presence-background SDMs for target species (Barve et al. 2011; Vale et al. 2014). Given previous marine SDMs (e.g., Zhang et al. 2020a) and the geographical range of T. hemprichii , we restricted our study to the areas within 370 km of land between 25°E and 180°E, and between 50°S and 40°N (Fig. 1a). Please note that our study extent includes southern Australia and New Zealand, where this species does not naturally occur. It is always challenging to estimate an appropriate study extent for a species (Barve et al. 2011), but the extent we selected should represent the plausible accessible areas to T. hemprichii over evolutionary time. We subsetted this main study extent to create separate study extents for the WTIP and CTIP lineages (Fig. 1a) based on our molecular results (see details in the Lineage genetic diversity in the Results section).
A number of marine predictors have been demonstrated to influence the geographical distribution of marine species (Bosch et al. 2018). Based on previous studies (e.g., Jayathilake & Costello 2018; Zhanget al. 2020a), we initially considered twenty such predictors for modeling, including two geographical predictors (water depth and distance to land) from the Global Marine Environment Datasets (http://gmed.auckland.ac.nz; Basher et al. 2018) and eighteen environmental predictors (including annual mean, maximum, minimum, range, average of the minimum records per year, and average of the maximum records per year) for SST, sea surface salinity, and sea surface current velocity from the Bio-ORACLE database v2.1 (https://www.bio-oracle.org; Assis et al. 2018b). In SDM studies, highly collinear predictors can lead to spurious interpretations of variable importance and unexpected predictions if correlations change in different projection scenarios (Dormann et al. 2013). Hence, we checked collinearity by calculating the pairwise Pearson’s correlation coefficients (r ) among the twenty predictors (Supporting Information Fig. S1) and selected one among highly correlated predictors (|r | > 0.7) (Dormann et al.2013) based on present-day and future data availability, biological importance, and previous findings on important variables for estimating seagrass distribution (Jayathilake & Costello 2018). In the end, we retained the two geographical predictors and six environmental predictors: annual mean current velocity, minimum current velocity, annual mean sea surface salinity, annual range of sea surface salinity, annual mean SST, and annual range of SST.
To project future habitat suitability of T. hemprichii , we considered four representative concentration pathway (RCP) scenarios (i.e., RCP 2.6, RCP 4.5, RCP 6.0, and RCP 8.5), and two time periods (i.e., 2050s: the average for 2040–2050s; and 2100s: the average for 2090–2100). We obtained the corresponding projections of future marine environmental layers from the Bio-ORACLE database v2.1. We assumed that the two geographical predictors would remain unchanged for future projections (Zhang et al. 2020a).

Niche differentiation estimation

To estimate whether the two lineages of T. hemprichii occupy different niche spaces, we characterized their realized niches using Hutchinsonian n -dimensional hypervolumes (Hutchinson 1957) sensu Blonder et al.(2018). We quantified the realized niches of the WTIP and CTIP lineages using the eight selected marine predictor variables (see previous section). In short, we extracted and standardized (i.e., zero means and unit variance) marine predictor values associated with the presence records for the two lineages. We then determined the volumes and shapes of the realized niches with the R package hypervolume using the Gaussian method (Blonder 2019). We measured the extent of niche differentiation between the two lineages with the kernel.betafunction (Mammola & Cardoso 2020) in the R package BAT (Cardosoet al. 2015, 2020). Following Carvalho & Cardoso (2020), niche differentiation between hypervolumes was partitioned into the following two processes: niche shift (replacement of space between hypervolumes) and niche contraction/expansion (net difference between hypervolumes). The niche differentiation index ranges from 0 (niches overlap entirely) to 1 (niches are fully dissimilar) (Carvalho & Cardoso 2020; Mammola & Cardoso 2020).

Species distribution modelling

We built SDMs using Maxent 3.4.4, a presence-background machine learning algorithm with two main complexity tuning parameters: regularization multiplier, which penalizes complexity by removing predictors with low predictive ability, and feature class, which allows for increasing complexity of the model response (Phillipset al. 2017). For each model (species model, WTIP model, and CTIP model), we randomly generated 10,000 background points within the corresponding study region. As Maxent’s default settings for the main tuning parameters can result in overfit models (Radosavljevic & Anderson 2014), we used a version of the R package ENMeval under expansion (1.9.0; https://github.com/jamiemkass/ENMeval) to tune our Maxent models over ranges of each parameter and chose models with optimal complexity based on performance metrics calculated on withheld data (Muscarella et al. 2014). In brief, we considered a total of 32 candidate models with different combinations of regularization multipliers (RM; ranging from 0.5 to 4.0, at 0.5 interval), which penalize complexity more with higher values, and feature classes (linear, quadratic, hinge), which allow responses with differing flexibility. Rather than using conventional random cross-validation to judge model performance, we used a spatial block cross-validation approach, which typically results in evaluations that better reflect the model’s ability to transfer to non-analog conditions (Roberts et al. 2017; Valavi et al. 2019). Briefly, each study region was divided into four spatial blocks containing an equal number of presence records, three blocks were used for model training and the remaining block for validation, then this procedure was repeated until every block was used for model validation. As with previous studies (e.g., Radosavljevic & Anderson 2014; Kass et al. 2020), the optimal model was selected by sequentially considering a 10% omission rate (i.e., the percentage of validation presences with habitat suitability predictions lower than that of the 10th quantile of training predictions), followed by the area under the receiver operating characteristics curve (AUC) calculated on the validation data (i.e., the model’s ability to discriminate between presence and background records) to break ties. We acknowledge that AUC is a poor measure for the absolute performance of presence-background models (e.g., Jiménez‐Valverde 2012), but nonetheless this metric can be used to make relative comparisons of candidate models fitted with the same data (Loboet al. 2008).
Predictive performances of the three best-performing Maxent models were further assessed using the continuous Boyce index, a reliable evaluation measure of presence-only algorithms (Hirzel et al. 2006). The continuous Boyce index ranges from –1 to 1, where positive values suggest that model predictions match well with the presence data, and negative values suggest a poor match (Hirzel et al. 2006). Variable importance for each model was determined using permutation importance calculated by Maxent. For this method, presence and background data values for each predictor variable in turn were randomly permuted and training AUC recalculated—a large drop in AUC indicates higher importance (Phillips 2017). In addition, we estimated the marginal response curves of important predictors (i.e., curves representing habitat suitability along a range of the values of one predictor variable while keeping the other predictors constant). We converted continuous habitat suitability predictions for T. hemprichii to binary values using the same 10% omission thresholds that we used for model evaluation (Radosavljevic & Anderson 2014). We then transformed the binary habitat suitability projections to the Lambert Cylindrical Equal Area projection at a resolution of 10 km and calculated areas of potential distribution (Zhang et al. 2020a).
It is of great importance to consider species dispersal ability into SDMs when estimating climate change impacts (Araújo et al. 2006; Guisan et al. 2017). Given the relatively high dispersal ability of T. hemprichii (Lacap et al. 2002) and a lack of apparent dispersal barriers in marine environments (Robinson et al. 2011), we estimated range size change under an unlimited dispersal scenario, which assumes that species have unrestricted dispersal ability and can disperse to any suitable area (Araújo et al. 2006; Zhanget al. 2020c). Range size change was calculated as follows:
\(\mathrm{range\ size\ change\ =\ }\frac{future\ suitable\ area\ -\ present\ suitable\ area}{\text{present\ suitable\ area}}\mathrm{\times 100\%}\),
where negative and positive values represent range contraction and expansion, respectively.
We used the optimal species- and lineage-level models to make projections of future potential distribution based on the different RCP scenarios for the two future time periods. Making projections using SDMs into novel environmental space (i.e., outside the range of training data) results in some degree of extrapolations, which should be quantified to determine levels of uncertainty (Elith et al.2010). Therefore, we measured the similarity between present-day and future environmental conditions using multivariate environmental similarity surfaces (MESS) (Elith et al. 2010). In practice, we calculated the MESS with the R package rmaxent (Baumgartner & Wilson 2021) for each model using the top three most important predictors via permutation importance: positive MESS values indicate conditions more similar to the training data, while negative values indicate conditions more different (i.e., novel).