The GRRIEn Analysis Framework: effective supervised learning of earth
surface processes at large spatial scales.
Abstract
Globally available, georeferenced data from earth observing satellites
and coupled earth systems models provide new opportunities to use
limited field observations to infer trends in environmental processes
across unsampled locations and over time. Building statistical models
that are intended to generalize earth surface processes across large
spatial scales represents a new frontier in supervised statistical
inference, one that poses distinct challenges. For example, care must be
taken to collect unbiased samples
given the complexity of the earth system: surface features can appear
vastly different from the perspective of multispectral and synthetic
aperture radar (SAR) imagery
in different atmospheric/landscape contexts. Environmental processes
occur at variable spatial and temporal scales, and sampling resolution can
drastically alter the appearance of patterns and lead to spatial and
temporal autocorrelation, which can bias model weights and/or parameter
estimates. Multicollinearity in multivariate datasets, which is also
scale dependent, can inflate the variance of parameter/weight estimates.
Together, these issues can undermine the robustness of model predictions
in out-of-sample contexts. To address them, the GRRIEn (Generalizable,
Reproducible, Robust, and Interpretable Environmental) analysis
framework is introduced as a standard method of training and validating
supervised data-driven models at large spatial scales. The method is
explained and then demonstrated with a case study that detects surface
water at the scale of the conterminous United States (CONUS) using SAR
and multispectral imagery.