Abstract
The Global Carbon Project estimates that the terrestrial biosphere has
absorbed about one-third of anthropogenic CO2 emissions
during the 1959-2019 period. This sink estimate is produced by an
ensemble of terrestrial biosphere models collectively referred to as the
TRENDY ensemble and is consistent with the land uptake inferred from the
residual of emissions and ocean uptake. The purpose of our study is to
understand how well TRENDY models reproduce the processes that drive the
terrestrial carbon sink. One challenge is to decide what level of
agreement between model output and observation-based reference data is
adequate considering that reference data are prone to uncertainties. To
define such a level of agreement, we compute benchmark scores that
quantify the similarity between independently derived reference datasets
using multiple statistical metrics. Models are considered to perform
well if their scores reach the benchmark scores. Our results show that
reference data can differ considerably, causing benchmark scores to be
low. Model scores are often of similar magnitude to benchmark scores,
implying that model performance is reasonable given how different
reference data are. While model performance is encouraging, ample
potential for improvement remains, including a reduction in a positive
leaf area index bias, improved representations of processes that govern
soil organic carbon in high latitudes, and an assessment of causes that
drive the inter-model spread of gross primary productivity in boreal
regions and humid tropics. The success of future model development will
increasingly depend on our capacity to reduce and account for
observational uncertainties.
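
The benchmarking logic described above can be illustrated with a minimal sketch. It assumes a single, hypothetical similarity metric (an exponentially scaled RMSE); the study uses multiple statistical metrics that are not specified here, and all function names and sample values below are illustrative only.

```python
import numpy as np


def similarity_score(candidate, reference):
    """Hypothetical score in (0, 1]; 1 means the two fields are identical.

    Assumption: RMSE normalized by the reference standard deviation and
    mapped through exp(-x). This is not necessarily a metric used in the study.
    """
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    spread = np.std(reference)
    if spread == 0.0:
        raise ValueError("reference data must not be constant")
    rmse = np.sqrt(np.mean((candidate - reference) ** 2))
    return float(np.exp(-rmse / spread))


def reaches_benchmark(model, ref_a, ref_b):
    """Compare a model score with the score between two reference datasets."""
    # Model score: similarity between model output and reference A.
    model_score = similarity_score(model, ref_a)
    # Benchmark score: similarity between two independent references.
    benchmark_score = similarity_score(ref_b, ref_a)
    return model_score, benchmark_score, bool(model_score >= benchmark_score)


# Made-up sample values, for illustration only.
ref_a = np.array([1.0, 2.0, 3.0, 4.0])   # reference dataset A
ref_b = ref_a + 0.5                      # independent reference dataset B
model = ref_a + 0.2                      # model output

model_score, benchmark_score, reached = reaches_benchmark(model, ref_a, ref_b)
print(model_score, benchmark_score, reached)
```

In this toy case the model differs from reference A less than reference B does, so its score reaches the benchmark score. The larger the disagreement between the two references, the lower the benchmark score and the easier it is for a model to reach it.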