Geophysical Bias Correction of Trace Greenhouse Gas Satellite
Retrievals Using Explainable Machine Learning Methods
Abstract
OCO-2, launched in 2014, uses reflected solar spectra and other
retrieved geophysical variables to estimate (“retrieve”) the
column-averaged dry-air mole fraction of CO2, termed XCO2. A critical
issue in satellite estimates of trace greenhouse gases, and in remote
sensing at large, is the error distribution of an estimated target
variable, which arises from instrument artifacts as well as the
under-determined nature of the retrieval. A large portion of the
error is often incurred when the retrieved physical variables are
inferred from the measurements. These residual errors are typically corrected using
ground truth observations of the target variable or some other truth
proxy. Previous studies used multilinear regression to model the error
distribution with a few covariates from the retrieved state vector,
sometimes termed “features.” This presentation will cover the bias
correction of XCO2 error attributed to retrieved covariates with a novel
approach that applies explainable machine learning (XAI) methods to
simulated sounding retrievals from GeoCarb. Non-linear models (Zhou and
Grassotti 2020) and models that capture non-linearity implicitly
(Lorente et al. 2021) have been shown to improve on the linear methods
used operationally. Our approach uses a gradient-boosted decision tree
ensemble method, XGBoost, that captures non-linear relations between
input features and the target variable. XGBoost also incorporates
regularization to limit overfitting while remaining resilient to noise
and large outliers, a feature missing from other decision tree ensemble
methods. Decision-tree-based models provide inherent feature importance
measures that aid interpretability. We also carry out post-training
analysis with model-agnostic XAI methods. XAI methods
allow for rigorous insight into the causes of a model’s decision (Gilpin
et al. 2018). By applying these techniques, we will demonstrate that our
approach reduces residual errors relative to the operational method and
yields an uncertainty estimate for the bias-corrected XCO2, which is
currently not treated separately from the posterior uncertainty
estimate derived from the retrieval algorithm.
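
As a rough illustration of the comparison described above, the following
sketch contrasts a multilinear regression with a gradient-boosted
ensemble of depth-1 trees on a synthetic error signal, then applies
permutation importance as one example of a model-agnostic explanation.
It is not the authors' pipeline and does not use XGBoost itself: the
boosting loop is a minimal hand-written version with shrinkage only, and
the data, the two covariates, and all function names are hypothetical
stand-ins chosen for this example.

```python
import math
import random

rng = random.Random(0)

# Synthetic stand-in data, NOT real OCO-2 or GeoCarb retrievals: two
# hypothetical covariates; the "XCO2 error" depends non-linearly on the
# first covariate only.
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
y = [1.5 * abs(x[0]) + rng.gauss(0, 0.1) for x in X]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

def rmse(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def fit_linear(X, y):
    """Multilinear regression (with intercept) via the normal equations."""
    A = [[1.0] + row for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict_linear(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def fit_stump(X, r):
    """Best single-split regression tree for residuals r (squared error)."""
    n, total, best = len(r), sum(r), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += r[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue
            # Maximizing this score is equivalent to minimizing squared error.
            score = left_sum ** 2 / pos + (total - left_sum) ** 2 / (n - pos)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2,
                        left_sum / pos, (total - left_sum) / (n - pos))
    return best[1:]

def fit_boosted(X, y, rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(rounds):
        resid = [t - p for t, p in zip(y, pred)]
        j, thr, left, right = fit_stump(X, resid)
        stumps.append((j, thr, lr * left, lr * right))
        pred = [p + (lr * left if x[j] <= thr else lr * right)
                for p, x in zip(pred, X)]
    return base, stumps

def predict_boosted(model, X):
    base, stumps = model
    return [base + sum(l if x[j] <= thr else r for j, thr, l, r in stumps)
            for x in X]

def permutation_importance(model, X, y, j, rng):
    """Increase in RMSE when covariate j is shuffled (model-agnostic)."""
    baseline = rmse(predict_boosted(model, X), y)
    col = [row[j] for row in X]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return rmse(predict_boosted(model, X_perm), y) - baseline

w = fit_linear(X_train, y_train)
model = fit_boosted(X_train, y_train)
rmse_linear = rmse(predict_linear(w, X_test), y_test)
rmse_boosted = rmse(predict_boosted(model, X_test), y_test)
importance = [permutation_importance(model, X_test, y_test, j, rng)
              for j in range(2)]
print("held-out RMSE, linear: ", rmse_linear)
print("held-out RMSE, boosted:", rmse_boosted)
print("permutation importance per covariate:", importance)
```

Because the synthetic error depends on the absolute value of the first
covariate, a linear fit cannot represent it, while the boosted stumps
can approximate it piecewise. In practice one would more likely use the
XGBoost library and real or simulated retrieval covariates rather than
this toy loop.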