
Prediction of Distributed River Sediment Respiration Rates using Community-Generated Data and Machine Learning
  • Stefan F. Gary,
  • Timothy D. Scheibe,
  • Em Rexer,
  • Alvaro Vidal Torreira,
  • Vanessa A Garayburu-Caruso,
  • Amy E. Goldman,
  • James C Stegen
Stefan F. Gary
Parallel Works, Inc.
Timothy D. Scheibe
Pacific Northwest National Laboratory (DOE)

Corresponding Author: [email protected]

Em Rexer
Pacific Northwest National Laboratory
Alvaro Vidal Torreira
Parallel Works, Inc.
Vanessa A Garayburu-Caruso
Pacific Northwest National Laboratory
Amy E. Goldman
Pacific Northwest National Laboratory (DOE)
James C Stegen
Pacific Northwest National Laboratory (DOE)

Abstract

River sediment microbial respiration is a key indicator of ecosystem functioning, and biogeochemical fluxes across this critical zone link surface and subsurface waters. As such, there is tremendous interest in measuring and mapping these respiration rates. Because respiration observations are expensive and labor intensive, limited data are available to the community. An open-science, collaborative initiative is collecting samples for respiration rate analysis together with multi-scale metadata; this evolving data set is being used to make machine learning (ML) predictions at unsampled sites to help inform continued community engagement. However, it is a challenge to find an optimal configuration for ML models to work with this feature-rich (i.e., 100+ possible input variables) data set. Here, we present results from a two-tiered approach to managing the analysis of this complex data set: 1) a stacked ensemble of models that automatically optimizes hyperparameters and manages the training of many models, and 2) feature permutation importance to detect the most important features in the models. The major elements of this workflow are modular, portable, open, and cloud-based, making this implementation a potential template for other applications. The models developed here indicate that sediment organic matter chemistry is one of the most important features for predicting sediment respiration rate. Other important, larger-scale features fall into the categories of climatic, ecological, geological, and fluvial settings. Leveraging these larger-scale features to generate data-driven estimates of river sediment respiration rates reveals spatially consistent but heterogeneous patterns across the river network of the Columbia River Basin.
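
To make the second tier concrete, the following is a minimal sketch of feature permutation importance, not the authors' workflow. It assumes synthetic data with three hypothetical features (x0, x1, x2) and uses a toy k-nearest-neighbors regressor as a stand-in for the stacked ensemble. The idea is to measure how much the held-out error grows when one feature column is shuffled, which breaks its relationship with the target; larger increases suggest more important features.

```python
import random


def knn_predict(train_X, train_y, row, k=5):
    """Predict by averaging the targets of the k nearest training rows."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, row)), y)
        for x, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(model, X, y):
    """Mean squared error of a callable that maps one row to a prediction."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, shuffled, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def make_data(n, rng):
    """Synthetic (assumed) data: y depends strongly on x0, weakly on x1, not on x2."""
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [5.0 * x[0] + 1.0 * x[1] + rng.gauss(0.0, 0.1) for x in X]
    return X, y


rng = random.Random(42)
train_X, train_y = make_data(200, rng)
test_X, test_y = make_data(100, rng)  # importance is scored on held-out rows


def model(row):
    """Toy stand-in for the trained ensemble."""
    return knn_predict(train_X, train_y, row)


importances = permutation_importance(model, test_X, test_y)
for name, value in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: mean MSE increase = {value:.3f}")
```

Because the function only needs a callable that maps a row to a prediction, the same scoring could in principle be applied to any trained model, including a stacked ensemble. With 100+ candidate inputs, correlated features can share or mask importance, so rankings from a sketch like this should be read with care.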
21 Mar 2024: Submitted to ESS Open Archive
25 Mar 2024: Published in ESS Open Archive