Abstract
Nitrous oxide (N2O) is an important greenhouse gas (GHG), with a
global warming potential 265 times that of carbon dioxide (CO2). About
60% of anthropogenic N2O emissions come from agricultural
production. To date, estimating N2O emissions from
cropland remains challenging because the underlying microbial
processes (e.g., incomplete nitrification and denitrification) are
controlled by diverse climate, soil, plant, and human-activity factors.
In this study, we developed a machine learning (ML) model that
incorporates physical and biogeochemical domain knowledge, namely
knowledge-guided machine learning (KGML), to simulate daily N2O fluxes
from agricultural ecosystems. The Gated Recurrent Unit (GRU) was used
as the basis of the model structure. Several strategies were
implemented to optimize model performance: 1) a hierarchical
structure based on causal relations among variables, 2) intermediate
variable (IMV) prediction and transfer, 3) providing initial IMV values
as constraints, 4) model pretraining and retraining, and 5) multitask
learning. The
developed KGML was pretrained on millions of synthetic samples generated
by an advanced process-based (PB) model, ecosys, and then retrained on
observations from six mesocosm chambers over three growing seasons.
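As a rough illustration of strategies 1), 2), and 5) listed above, the NumPy sketch below assumes a two-stage hierarchy: a first GRU maps daily drivers to IMVs, and a second GRU receives the drivers plus the predicted IMVs and outputs daily N2O flux, with a multitask loss that penalizes errors in both. The class and function names, layer sizes, loss weight, and random weights are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell (forward pass only, random untrained weights)."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.uniform(-scale, scale, (3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * n + z * h


def run_gru(cell, head, xs, n_hidden):
    """Unroll a GRU over a (T, n_in) daily series; apply a linear head per step."""
    h = np.zeros(n_hidden)
    outputs = []
    for x in xs:
        h = cell.step(x, h)
        outputs.append(head @ h)
    return np.array(outputs)


class HierarchicalKGMLSketch:
    """Hypothetical two-stage hierarchy: drivers -> IMVs -> daily N2O flux."""

    def __init__(self, n_drivers, n_imv, n_hidden, rng):
        self.n_hidden = n_hidden
        # Stage 1 predicts intermediate variables (IMVs) from the drivers.
        self.imv_cell = GRUCell(n_drivers, n_hidden, rng)
        self.imv_head = rng.normal(0.0, 0.1, (n_imv, n_hidden))
        # Stage 2 receives the drivers plus the predicted IMVs (IMV transfer).
        self.n2o_cell = GRUCell(n_drivers + n_imv, n_hidden, rng)
        self.n2o_head = rng.normal(0.0, 0.1, (1, n_hidden))

    def forward(self, drivers):
        imv = run_gru(self.imv_cell, self.imv_head, drivers, self.n_hidden)
        stage2_inputs = np.concatenate([drivers, imv], axis=1)
        n2o = run_gru(self.n2o_cell, self.n2o_head, stage2_inputs, self.n_hidden)
        return imv, n2o[:, 0]


def multitask_loss(imv_pred, imv_obs, n2o_pred, n2o_obs, w_imv=0.5):
    """Weighted sum of N2O and IMV mean squared errors (weight is hypothetical)."""
    mse_n2o = np.mean((n2o_pred - n2o_obs) ** 2)
    mse_imv = np.mean((imv_pred - imv_obs) ** 2)
    return mse_n2o + w_imv * mse_imv


rng = np.random.default_rng(0)
T, n_drivers, n_imv = 30, 5, 3  # 30 days, 5 drivers, 3 IMVs (illustrative sizes)
model = HierarchicalKGMLSketch(n_drivers, n_imv, n_hidden=16, rng=rng)
drivers = rng.normal(size=(T, n_drivers))  # synthetic stand-in for daily forcings
imv_pred, n2o_pred = model.forward(drivers)
print(imv_pred.shape, n2o_pred.shape)  # (30, 3) (30,)
```

In a full workflow, a gradient-based optimizer would minimize a loss of this form, first on synthetic ecosys output and then on chamber observations. That training loop, and the IMV initial-value constraints of strategy 3), are omitted from this sketch.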
Six pure ML models were developed using the same mesocosm chamber
data to serve as benchmarks for the KGML model. The results show that
KGML consistently outperformed the PB model in computational efficiency
and the pure ML models in accuracy when capturing the magnitude and
dynamics of N2O fluxes. In addition, reasonable predictions of the IMVs
improve the interpretability of KGML. We believe the KGML development
in this study will stimulate a new body of research on
interpretable machine learning for biogeochemistry and other related
geoscience processes.