Predictor Data
While training data employed in spatial biodiversity modelling are
typically obtained from field surveys, predictor (sometimes called
environmental, explanatory, or covariate) data are derived from remotely
sensed, or modelled, spatial data extending completely across a study
region (Bryn et al 2021). These data are required to model ecosystems
continuously in space (Figure 1B). Very few studies have evaluated the
suitability of predictor data for this type of predictive modelling
(Dor-Haim et al 2019, Halvorsen 2020, Simensen 2020). We suggest that
suitable predictors should encompass spatial data on biotic and abiotic
variables that are independent of survey records (i.e., training
data). They may also include spatial data on functional ecosystem
properties (e.g., productivity), physiognomic structure (e.g., dominant
growth form), and on regular or episodic disturbance (e.g., flooding,
forestry, fire). Potential predictor datasets must be refined to address
the same limitations (e.g., correlation, collinearity, and variance
inflation) encountered when modelling species and communities; we do not
review these refinements (see Guisan et al 2017 for guidance). Here we
focus on the factors considered to finalize the selection of predictors
(Figure 1C) employed in our case study, and the implications of these
factors for modelling ecosystem pattern.
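
As an illustration of the collinearity screening mentioned above, the following minimal sketch computes a variance inflation factor (VIF) for each candidate predictor by regressing it on the others. It is not the procedure used in our case study; the predictor names, the synthetic values, and the helper `variance_inflation_factors` are assumptions made only for this example.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = sites, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on all other predictors plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # VIF = 1 / (1 - R^2), which equals ss_tot / ss_res.
        vifs[j] = np.inf if ss_res == 0 else ss_tot / ss_res
    return vifs

# Synthetic, hypothetical predictors: temperature is built to track elevation.
rng = np.random.default_rng(0)
elevation = rng.normal(500.0, 200.0, size=500)
slope = rng.normal(10.0, 3.0, size=500)
temperature = 15.0 - 0.0065 * elevation + rng.normal(0.0, 0.1, size=500)

X = np.column_stack([elevation, slope, temperature])
vif = variance_inflation_factors(X)
for name, value in zip(["elevation", "slope", "temperature"], vif):
    print(f"{name}: VIF = {value:.1f}")
```

In this synthetic case elevation and temperature receive large VIF values while slope stays near 1, so one member of the correlated pair would be a candidate for removal. Any threshold for exclusion is a modelling decision that this sketch does not prescribe.
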
To assemble our pool of candidate predictors, we considered three
factors: ecological response, assembly mechanisms, and spatial
scale(s) of influence. In general, we sought predictors deemed important
for shaping variation in the distribution or prominence of those
ecosystem features (Table S1) that scale up to drive spatial ecosystem
patterns. This approach is analogous to the basis of joint species
distribution modelling, which assumes that species respond jointly to
both their physical environment and one another (D’Amen et al 2017,
Ovaskainen and Abrego 2020). We extended this idea to encompass
reciprocal influences among biotic and abiotic variables, and sought
covariates (Table S3) to predict singular and joint biotic-abiotic
responses.
Many ecological assembly drivers (e.g., climate, topography, surficial
geology) affect both biotic and abiotic ecosystem constituents and
facilitate interactions between them. In our study region, one challenge
was that some predictors are available only at relatively coarse grains
(e.g., climate, at approximately 1 km resolution), precluding their use
in our compilation of finer-grained predictors. Our initial approach was
to seek predictors with spatial grains similar to our survey grain
(between 25 and 400 m², depending on the ecosystem). Ecosystem surveys
were structured around ecologically
conspicuous breaks in topographically controlled gradients (e.g., local
vegetation), so we employed direct (e.g., remotely sensed vegetation
indices) and indirect predictors (e.g., terrain) of those gradients to
compile our pool of fine-grained (50 m resolution) candidate predictors
(Table S3). Predictor data can subsequently be upscaled to coarser
spatial resolutions (e.g., 100, 250, 500, 1000 m) to test the influence
of spatial grain on predictive model performance. König et al
(2021), for instance, demonstrated that joint species distribution
model performance varies with spatial grain. Lastly,
investigating the effect of scope, defined as the ratio of spatial
extent to grain (Frazier 2022), could provide insights into scaling
relationships and their role in predictions of ecosystem pattern.
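
The upscaling step described above can be sketched as a simple block average, in which each coarse cell takes the mean of the fine cells it covers. This is a minimal illustration under stated assumptions, not our processing chain: the 40 x 40 synthetic raster, the variable names, and the helper `upscale_mean` are hypothetical, and mean aggregation is only one of several possible resampling rules (it would not suit categorical predictors).

```python
import numpy as np

def upscale_mean(grid, factor):
    """Aggregate a 2-D raster by averaging non-overlapping factor x factor blocks."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    # Trim edge cells that do not fill a complete block.
    rows_t = rows - rows % factor
    cols_t = cols - cols % factor
    trimmed = grid[:rows_t, :cols_t]
    # Split each axis into (block index, position within block), then average
    # over the two within-block axes.
    blocks = trimmed.reshape(rows_t // factor, factor, cols_t // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical fine-grained predictor: 40 x 40 cells at 50 m resolution.
base_grain = 50
rng = np.random.default_rng(1)
ndvi = rng.random((40, 40))

# Coarser grains named in the text; each is an integer multiple of 50 m.
coarse = {
    grain: upscale_mean(ndvi, grain // base_grain)
    for grain in (100, 250, 500, 1000)
}
for grain, layer in coarse.items():
    print(f"{grain} m grain: {layer.shape[0]} x {layer.shape[1]} cells")
```

Each aggregated layer can then be supplied to the same model in turn so that predictive performance is compared across grains while the spatial extent is held constant.
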
The combination of predictor data employed in a spatial biodiversity
model can be finalized (Figure 1C) before or during model fitting. Where
model fitting is used to assist with data selection, selection criteria
can be conceptual, theoretical, and/or statistical (Guisan et al 2017).
Several ecologists (e.g., Araújo and Guisan 2006) have suggested that
many biodiversity distribution models do not adequately consider
relevant theory to guide predictor data selection (e.g., Poggiato et al
2021). Araújo and Guisan (2006) further suggest that greater attention
needs to be given to the explanatory value of each predictor, and to its
relevance as a causal determinant of ecological pattern. We employed an
iterative approach based on model performance, and drew on theoretical
and conceptual criteria, to finalize the predictor variables used in our
case study (Table S3).
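
One way to picture a performance-based, iterative selection is greedy forward selection scored by cross-validated error, sketched below. This is an assumption-laden illustration and not the procedure behind Table S3: it substitutes an ordinary least-squares model for a biodiversity model, uses synthetic data with hypothetical predictor names, and covers only the statistical criterion. The theoretical and conceptual criteria discussed above would still need to be applied by the analyst.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5):
    """Cross-validated RMSE of a least-squares fit with an intercept."""
    n = len(y)
    # Contiguous folds; adequate here because the synthetic rows are unordered.
    folds = np.array_split(np.arange(n), n_folds)
    fold_mse = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        fold_mse.append(np.mean((y[test_idx] - A_test @ coef) ** 2))
    return float(np.sqrt(np.mean(fold_mse)))

def forward_select(X, y, names, min_gain=0.05):
    """Add predictors one at a time while relative RMSE improves by min_gain."""
    selected, remaining = [], list(range(X.shape[1]))
    best = cv_rmse(np.empty((len(y), 0)), y)  # intercept-only baseline
    while remaining:
        scores = {j: cv_rmse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < min_gain * best:
            break  # no candidate improves performance enough
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return [names[j] for j in selected], best

# Synthetic example: only the first two hypothetical predictors drive the response.
rng = np.random.default_rng(2)
names = ["ndvi", "slope", "wetness", "noise"]
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.3, size=200)

chosen, final_rmse = forward_select(X, y, names)
print("selected predictors:", chosen)
print("cross-validated RMSE:", round(final_rmse, 3))
```

The `min_gain` threshold is an arbitrary choice for this sketch. A predictor retained or dropped on statistical grounds alone may still warrant the opposite decision once its ecological relevance is considered.
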