Predictor Data
While training data employed in spatial biodiversity modelling are typically obtained from field surveys, predictor (sometimes called environmental, explanatory, or covariate) data are derived from remotely sensed, or modelled, spatial data extending completely across a study region (Bryn et al 2021). These data are required to model ecosystems continuously in space (Figure 1B). Very few studies have evaluated the suitability of predictor data for this type of predictive modelling (Dor-Haim et al 2019, Halvorsen 2020, Simensen 2020). We suggest suitable predictors should encompass spatial data on biotic and abiotic variables, which are independent of survey records (i.e., training data). They may also include spatial data on functional ecosystem properties (e.g., productivity), physiognomic structure (e.g., dominant growth form), and on regular or episodic disturbance (e.g., flooding, forestry, fire). Potential predictor datasets must be refined to address the same limitations (e.g., correlation, collinearity, and variance inflation) encountered when modelling species and communities; we do not review these measures (see Guisan et al 2017 for guidance). Here we focus on the factors considered to finalize the selection of predictors (Figure 1C) employed in our case study, and the implications of these factors for modelling ecosystem pattern.
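As a brief illustration of the collinearity screening mentioned above, the following minimal sketch computes variance inflation factors (VIF) for a small set of predictors. It is not the procedure used in our case study: the predictor names, the synthetic values, and the cut-off of 10 (a common rule of thumb) are all assumptions chosen for illustration.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = grid cells, columns = predictors)."""
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse correlation matrix equals
    # 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the rest.
    return np.diag(np.linalg.inv(corr))


# Synthetic, hypothetical predictors (not real data).
rng = np.random.default_rng(42)
n_cells = 1000
elevation = rng.normal(size=n_cells)
temperature = -0.9 * elevation + 0.1 * rng.normal(size=n_cells)  # tied to elevation
ndvi = rng.normal(size=n_cells)  # generated independently of the other two

names = ["elevation", "temperature", "ndvi"]
X = np.column_stack([elevation, temperature, ndvi])

vif = variance_inflation_factors(X)
VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cut-off
flagged = [name for name, value in zip(names, vif) if value > VIF_THRESHOLD]

for name, value in zip(names, vif):
    print(f"{name}: VIF = {value:.1f}")
print("Candidates for removal or merging:", flagged)
```

In this synthetic example the two strongly correlated predictors are flagged, while the independent one is retained; which member of a correlated pair to drop remains an ecological judgment rather than a purely statistical one.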
To assemble our pool of candidate predictors, we considered three factors: ecological response, assembly mechanisms, and spatial scale(s) of influence. In general, we sought predictors deemed important for shaping variations in the distribution or prominence of those ecosystem features (Table S1) which scale up to drive spatial ecosystem patterns. This approach is analogous to the basis of joint species distribution modelling, which assumes species respond mutually to both their physical environment and to one another (D’Amen et al 2017, Ovaskainen and Abrego 2020). We extended this idea to encompass reciprocal influences among biotic and abiotic variables, and sought covariates (Table S3) to predict singular and joint biotic-abiotic responses.
Many ecological assembly drivers (e.g., climate, topography, surficial geology) affect both biotic and abiotic ecosystem constituents and facilitate interactions between them. In our study region, a challenge we encountered is that some predictors are only available at relatively coarse grains (e.g., climate at approximately 1 km resolution), precluding their use in our compilation of finer-grained predictors. Our initial approach was to seek predictors with spatial grains similar to our survey grain (between 25 and 400 m², depending on the ecosystem). Ecosystem surveys were structured around ecologically conspicuous breaks in topographically controlled gradients (e.g., local vegetation), so we employed direct (e.g., remotely sensed vegetation indices) and indirect predictors (e.g., terrain) of those gradients to compile our pool of fine-grained (50 m resolution) candidate predictors (Table S3). Predictor data can subsequently be upscaled to coarser spatial resolutions (e.g., 100, 250, 500, 1000 m) to test the influence of varying spatial grains on predictive model performance. König et al (2021), for instance, have demonstrated how joint species distribution model performance varies at different spatial grains. Lastly, investigating the effect of scope, defined as the ratio of spatial extent to grain (Frazier 2022), could provide insights into scaling relationships and their role in predictions of ecosystem pattern.
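The upscaling and scope calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our processing workflow: the raster is synthetic, the 10 km square extent is hypothetical, aggregation is by simple block averaging (other resampling rules are possible), and scope is computed from linear extent and grain, which is one possible reading of the extent-to-grain ratio.

```python
import numpy as np


def upscale_mean(raster, factor):
    """Aggregate a 2-D raster to a coarser grain by averaging factor x factor blocks."""
    rows, cols = raster.shape
    # Crop trailing cells so both dimensions divide evenly by the factor.
    rows_c = rows - rows % factor
    cols_c = cols - cols % factor
    blocks = raster[:rows_c, :cols_c].reshape(
        rows_c // factor, factor, cols_c // factor, factor
    )
    return blocks.mean(axis=(1, 3))


def scope(extent_m, grain_m):
    """Ratio of spatial extent to grain, both given as linear distances in metres."""
    return extent_m / grain_m


BASE_GRAIN_M = 50      # fine-grained predictor resolution from the text
EXTENT_M = 10_000      # hypothetical 10 km x 10 km study window

# Synthetic predictor surface: 200 x 200 cells at 50 m covers the 10 km window.
rng = np.random.default_rng(0)
predictor = rng.random((EXTENT_M // BASE_GRAIN_M, EXTENT_M // BASE_GRAIN_M))

target_grains_m = (100, 250, 500, 1000)
upscaled = {
    grain: upscale_mean(predictor, grain // BASE_GRAIN_M) for grain in target_grains_m
}

for grain, raster in upscaled.items():
    print(f"{grain} m grain: shape {raster.shape}, scope {scope(EXTENT_M, grain):.0f}")
```

Each upscaled layer could then be supplied to the same model in turn, so that any change in predictive performance can be attributed to grain rather than to a change in the predictor set.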
The combination of predictor data employed in a spatial biodiversity model can be finalized (Figure 1C) before or during model fitting. Where model fitting is used to assist with data selection, selection criteria can be conceptual, theoretical, and/or statistical (Guisan et al 2017). Several ecologists (e.g., Araújo and Guisan 2006) have suggested that many biodiversity distribution models do not adequately consider relevant theory to guide predictor data selection (e.g., Poggiato et al 2021). Araújo and Guisan (2006) further suggest that greater attention needs to be given to the explanatory value of each predictor, and to its relevance as a causal determinant of ecological pattern. We employed an iterative approach based on model performance, and drew on theoretical and conceptual criteria, to finalize the predictor variables employed in our case study (Table S3).