Stefan F. Gary

and 6 more

River sediment microbial respiration is a key indicator of ecosystem functioning, and the biogeochemical fluxes across this critical zone link surface and subsurface waters. As such, there is tremendous interest in measuring and mapping these respiration rates. Because respiration observations are expensive and labor intensive, only limited data are available to the community. An open science, collaborative initiative is collecting samples for respiration rate analysis and multi-scale metadata; this evolving data set is being used to make machine learning (ML) predictions at unsampled sites to help inform continued community engagement. However, it is a challenge to find an optimum configuration for ML models to work with this feature-rich (i.e., 100+ possible input variables) data set. Here, we present results from a two-tiered approach to managing the analysis of this complex data set: 1) a stacked ensemble of models that automatically optimizes hyperparameters and manages the training of many models and 2) feature permutation importance to detect the most important features in the models. The major elements of this workflow are modular, portable, open, and cloud-based, making this implementation a potential template for other applications. The models developed here predict that sediment organic matter chemistry is one of the most important features for predicting sediment respiration rate. Other larger-scale, important features fall into the categories of climatic, ecological, geological, and fluvial settings. Leveraging these larger-scale features to generate data-driven estimates of river sediment respiration rates reveals spatially consistent but heterogeneous patterns across the river network of the Columbia River Basin.
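The second tier described above, feature permutation importance, can be illustrated with a minimal sketch. This is not the authors' implementation: the `toy_model` function stands in for a trained stacked ensemble, the data are synthetic, and the function names and the choice of mean squared error as the score are assumptions made for illustration only.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it relies strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


# Synthetic data (not from the study): 200 samples, 3 features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) + data_rng.gauss(0, 0.1) for row in X]

scores = permutation_importance(toy_model, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("features ranked by importance:", ranking)
```

A feature the model ignores scores zero because shuffling it leaves every prediction unchanged, which is what makes this measure useful for screening a data set with 100+ candidate inputs.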

Yunxiang Chen

and 16 more

Streambed grain sizes and hydro-biogeochemistry (HBGC) control river functions. However, measuring their quantities, distributions, and uncertainties is challenging due to the diversity and heterogeneity of natural streams. This work presents a photo-driven, artificial intelligence (AI)-enabled, and theory-based workflow for extracting the quantities, distributions, and uncertainties of streambed grain sizes and HBGC parameters from photos. Specifically, we first trained You Only Look Once (YOLO), an object detection AI, using 11,977 grain labels from 36 photos collected from 9 different stream environments. We demonstrated its accuracy with a coefficient of determination of 0.98, a Nash–Sutcliffe efficiency of 0.98, and a mean absolute relative error of 6.65% in predicting the median grain size of 20 testing photos. The AI is then used to extract the grain size distributions and determine their characteristic grain sizes, including the 5th, 50th, and 84th percentiles, for 1,999 photos taken at 66 sites. With these percentiles, the quantities, distributions, and uncertainties of HBGC parameters are further derived using existing empirical formulas and our new uncertainty equations. From the data, the median grain size and the HBGC parameters (Manning’s coefficient, Darcy-Weisbach friction factor, interstitial velocity magnitude, and nitrate uptake velocity) are found to follow log-normal, normal, positively skewed, near log-normal, and negatively skewed distributions, respectively. Their most likely values are 6.63 cm, 0.0339 s·m^(-1/3), 0.18, 0.07 m/day, and 1.2 m/day, respectively. Their average uncertainties are 7.33%, 1.85%, 15.65%, 24.06%, and 13.88%, respectively. Major uncertainty sources in grain sizes and their subsequent impact on HBGC are further studied.
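The three accuracy metrics quoted above can be sketched as follows. The abstract does not give their formulas, so the definitions here are assumptions: the coefficient of determination is taken as the squared Pearson correlation, the Nash–Sutcliffe efficiency uses its standard form, and the mean absolute relative error is expressed as a percentage of the observed value. The sample values are made up for illustration and are not the study's data.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def nash_sutcliffe(obs, pred):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mo = mean(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread


def mean_abs_relative_error(obs, pred):
    """Mean of |pred - obs| / obs, as a percentage (obs must be nonzero)."""
    return 100.0 * mean([abs(p - o) / o for o, p in zip(obs, pred)])


# Hypothetical median grain sizes in cm: manual labels vs. model output.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.2, 3.8, 6.3, 7.6]

print("R^2 :", round(r_squared(observed, predicted), 3))
print("NSE :", round(nash_sutcliffe(observed, predicted), 3))
print("MARE:", round(mean_abs_relative_error(observed, predicted), 2), "%")
```

Under these definitions, the squared correlation ignores systematic bias while the Nash–Sutcliffe efficiency penalizes it, so reporting both, as the abstract does, gives complementary information.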

Amy E. Goldman

and 4 more

The sciences struggle to integrate across disciplines, coordinate across data generation and modeling activities, produce connected open data, and build strong networks to engage stakeholders within and beyond the scientific community. The American Geophysical Union (AGU) is divided into 25 sections intended to encompass the breadth of the geosciences. Here, we introduce a special collection of commentary articles spanning 19 AGU sections on challenges and opportunities associated with the use of ICON science principles. These principles focus on research intentionally designed to be Integrated, Coordinated, Open, and Networked (ICON) with the goal of maximizing mutual benefit (among stakeholders) and cross-system transferability of science outcomes. This article 1) summarizes the ICON principles; 2) discusses the crowdsourced approach to creating the collection; 3) explores insights from across the articles; and 4) proposes steps forward. There were common themes among the commentary articles, including broad agreement that the benefits of using ICON principles outweigh the costs, but that using ICON principles has important risks that need to be understood and mitigated. It was also clear that the ICON principles are not monolithic or static, but should instead be considered a heuristic tool that can and should be modified to meet changing needs. As a whole, the collection is intended as a resource for scientists pursuing ICON science and represents an important inflection point in which the geosciences community has come together to offer insights into ICON principles as a unified approach for improving how science is done across the geosciences and beyond.

Timothy Scheibe

and 18 more

River corridors, the spatial domains around rivers in which river water interacts with surrounding sediment and rock, are important components of watersheds. They comprise extremely complex ecosystems: heterogeneous at all spatial scales with strong temporal dynamics, coupled biological, geochemical, and hydrologic processes, and ubiquitous human impacts. We present several ways that our project, centered on the 75 km Hanford Reach of the Columbia River but with multiple connections to other systems, is addressing this challenge. These include 1) deployment of intensive, automated sensor networks, supplemented by data from the Hanford Environmental Information System (HEIS), for hyporheic zone monitoring, 2) assimilation of these and other data into models using joint hydrologic and geophysical inversion, 3) integration of MASS2 model outputs and bathymetry data using machine learning to classify hydromorphologic features, 4) a community-based effort to develop broad understanding of organic carbon biogeochemistry and microbiomes in diverse river systems, and 5) use of multi-‘omics data to develop new biogeochemical reaction networks. These underpin the incorporation of process understanding and diverse data into high-resolution mechanistic models, and the use of those models to develop reduced-order models that can be applied at large scales while retaining the effects of local features and processes. In so doing, we are contributing to the reduction of uncertainties associated with major Earth system biogeochemical fluxes, thus improving predictions of environmental and human impacts on water quality and riverine ecosystems and supporting environmentally responsible management of linked energy-water systems.