Toward data-driven generation and evaluation of model structure for
integrated representations of human behavior in water resources systems
Abstract
Simulations of human behavior in water resources systems are challenged
by uncertainty in model structure and parameters. The increasing
availability of observations describing these systems provides the
opportunity to infer a set of plausible model structures using
data-driven approaches. This study develops a three-phase approach to
inferring model structures and parameterizations from data: problem
definition, model generation, and model evaluation. We illustrate the
approach on a case study of land use decisions in the Tulare Basin,
California.
We encode the generalized decision problem as an arbitrary mapping from
a high-dimensional data space to the action of interest and use
multi-objective genetic programming to search over a family of functions
that perform this mapping for both regression and classification tasks.
To facilitate the discovery of models that are both realistic and
interpretable, the algorithm selects model structures based on
multi-objective optimization of (1) their performance on a training set
and (2) complexity, measured by the number of variables, constants, and
operations composing the model. After training, optimal model structures
are further evaluated according to their ability to generalize to
held-out test data and clustered based on their performance, complexity,
and generalization properties. Finally, we diagnose the causes of good
and poor generalization by performing sensitivity analysis across model
inputs and within model clusters. This study serves as a template to
inform and automate the problem-dependent task of constructing robust
data-driven model structures to describe human behavior in water
resources systems.
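
As a rough illustration of the two selection objectives described above,
the following minimal sketch counts the variables, constants, and
operations in a candidate expression and then keeps only the candidates
that are non-dominated in (training error, complexity). It is not the
study's implementation: the tuple-based expression encoding, the variable
names ("price", "water"), the function names, and the error values are
all hypothetical placeholders chosen for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical encoding (an assumption, not the study's): a node is a
# variable name (str), a numeric constant, or a tuple (operator, child, ...).
Expr = Union[str, float, int, tuple]


def complexity(expr: Expr) -> int:
    """Count the variables, constants, and operations in an expression."""
    if isinstance(expr, tuple):
        # One operation, plus the complexity of each of its operands.
        return 1 + sum(complexity(child) for child in expr[1:])
    return 1  # leaf node: a variable or a constant


def pareto_front(scores: List[Tuple[float, int]]) -> List[int]:
    """Indices of non-dominated (error, complexity) pairs; both minimized."""
    front = []
    for i, (err_i, cx_i) in enumerate(scores):
        dominated = any(
            err_j <= err_i and cx_j <= cx_i and (err_j < err_i or cx_j < cx_i)
            for j, (err_j, cx_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Three illustrative candidate structures with placeholder training errors.
models = [
    ("add", "price", 2.0),
    ("mul", ("add", "price", "water"), 0.5),
    ("mul", ("add", "price", ("sub", "water", 1.0)), 0.5),
]
errors = [0.40, 0.25, 0.30]  # made-up values, not results from the study

scores = [(err, complexity(m)) for err, m in zip(errors, models)]
front = pareto_front(scores)
print(scores)
print(front)
```

In this toy example the third candidate is both more complex and less
accurate than the second, so it is dropped, while the first and second
remain as alternative trade-offs between accuracy and interpretability.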