Hoshin Gupta - ESS Open Archive

Hoshin Gupta

Public Documents 4

On Robustness of the Explanatory Power of Machine Learning Models

Banamali Panigrahi

and 5 more

March 15, 2024

Machine learning (ML) is increasingly considered the solution to environmental problems where only limited or no physico-chemical process understanding is available. But when there is a need to provide support for high-stake decisions, where the ability to explain possible solutions is key to their acceptability and legitimacy, ML can come short. Here, we develop a method, rooted in formal sensitivity analysis (SA), that can detect the primary controls on the outputs of ML models. Unlike many common methods for explainable artificial intelligence (XAI), this method can account for complex multi-variate distributional properties of the input-output data, commonly observed with environmental systems. We apply this approach to a suite of ML models that are developed to predict various water quality variables in a pilot-scale experimental pit lake. A critical finding is that subtle alterations in the design of an ML model (such as variations in random seed for initialization, functional class, hyperparameters, or data splitting) can lead to entirely different representational interpretations of the dependence of the outputs on explanatory inputs. Further, models based on different ML families (decision trees, connectionists, or kernels) seem to focus on different aspects of the information provided by data, although displaying similar levels of predictive power. Overall, this underscores the importance of employing ensembles of ML models when explanatory power is sought. Not doing so may compromise the ability of the analysis to deliver robust and reliable predictions, especially when generalizing to conditions beyond the training data.

Deep Learning Models in Hydrology Have Not Yet Achieved Entity Awareness

Benedikt Heudorfer

and 2 more

October 18, 2024

Hydrology is shifting from process-based to deep learning models. Entity-aware (EA) deep learning models with static features (predominantly physiographic proxies) merged to dynamic forcing features show significant performance improvements. However, recent studies challenge the notion that combining dynamic forcings with static attributes make such models entity aware, suggesting static features are not effectively leveraged for generalization. We examine entity awareness using state-of-the-art Long-Sort Term Memory (LSTM) networks with the CAMELS-US dataset. We compare EA models provided with physiographic static features with ablated variants not provided with static inputs. Findings indicate that the superior performance of EA models is largely due to information provided by meteorological data, with minimal contributions by physiographic static features, particularly when tested out-of-sample. These results challenge previously held assumptions regarding how physiographic proxies are used to achieve generalization ability in EA Models, highlighting the need for new approaches for robust generalization in deep learning models.

Convergent and transdisciplinary integration: On the future of integrated modeling of...

Saman Razavi

and 35 more

June 13, 2024

The notion of convergent and transdisciplinary integration, which is about braiding together different knowledge systems, is becoming the mantra of numerous initiatives aimed at tackling pressing water challenges. Yet, the transition from rhetoric to actual implementation is impeded by incongruence in semantics, methodologies, and discourse among disciplinary scientists and societal actors. This paper confronts these disciplinary barriers by advocating a synthesis of existing and missing links across the frontiers distinguishing hydrology from engineering, the social sciences and economics, Indigenous and place-based knowledge, and studies of other interconnected natural systems such as the atmosphere, cryosphere, and ecosphere. Specifically, we embrace ‘integrated modeling’, in both quantitative and qualitative senses, as a vital exploratory instrument to advance such integration, providing a means to navigate complexity and manage the uncertainty associated with understanding, diagnosing, predicting, and governing human-water systems. While there are, arguably, no bounds to the pursuit of inclusivity in representing the spectrum of natural and human processes around water resources, we advocate that integrated modeling can provide a focused approach to delineating the scope of integration, through the lens of three fundamental questions: a) What is the modeling ‘purpose’? b) What constitutes a sound ‘boundary judgment’? and c) What are the ‘critical uncertainties’ and how do they propagate through interconnected subsystems? More broadly, we call for investigating what constitutes warranted ‘systems complexity’, as opposed to unjustified ‘computational complexity’ when representing complex natural and human-natural systems, with particular attention to interdependencies and feedbacks, nonlinear dynamics and thresholds, hysteresis, time lags, and legacy effects.

On strictly enforced mass conservation constraints for modeling the rainfall-runoff p...

Jonathan Frame

and 4 more

January 20, 2022

It has been proposed that conservation laws might not be beneficial for accurate hydrological modeling due to errors in input (precipitation) and target (streamflow) data (particularly at the event time scale), and this might explain why deep learning models (which are not based on enforcing closure) can out-perform catchment-scale conceptual and process-based models at predicting streamflow. We test this hypothesis at the event and multi-year time scale using physics-informed (mass conserving) machine learning and find that: (1) enforcing closure in the rainfall-runoff mass balance does appear to harm the overall skill of hydrological models, (2) deep learning models learn to account for spatiotemporally variable biases in data (3) however this “closure” effect accounts for only a small fraction of the difference in predictive skill between deep learning and conceptual models.