Annual 30 m soybean yield mapping in Brazil using long-term satellite
observations, climate data and machine learning
Abstract
Long-term spatially explicit information on crop yield is essential for
understanding food security in a changing climate. Here we present a
study that combines twenty years of Landsat and MODIS data, climate and
weather records, municipality-level crop yield statistics, random
forests and linear regression models for mapping crop yield in a
multi-temporal, multi-scale modeling framework. The study was conducted
for soybean in Brazil, the world’s largest producer and exporter of this
commodity crop. Using a recently developed 30 m resolution, annual
(2001-2019) soybean classification map product, we aggregated
multi-temporal phenological metrics derived from Landsat and MODIS data
over soybean pixels to the municipality scale. We combined phenological
metrics with topographic features, long-term climate data, in-season
weather data and soil variables as inputs to machine learning models. We
trained a multi-year random forests model using yield statistics as
reference and subsequently applied linear regression to adjust the
biases in the direct output of the random forests model. This model
combination achieved the best performance, with a root-mean-square error
(RMSE) of 344 kg/ha (12% of the long-term mean yield) and an r² of
0.69 on the 20% of data withheld for testing. The RMSE of the
leave-one-year-out assessment ranged from 259 kg/ha to 816 kg/ha. To
eliminate the artifacts caused by the coarse-resolution climate and
weather data, we developed multiple models with different categories of
input variables. Using the per-pixel uncertainty estimates of the
different models, we produced the final soybean yield maps through
per-pixel model composition. We applied the models trained on 2001-2019
data to 2020 data and produced a soybean yield map for 2020,
demonstrating the predictive capability of trained machine learning
models for operational yield mapping in future years. Our research
shows that combining satellite, climate and weather data with machine
learning can effectively map crop yield at high resolution, providing
critical information for understanding yield growth, yield anomalies and
food security.
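
The two post-processing steps described above, the linear adjustment of
the random forests output and the per-pixel model composition, can be
illustrated with a minimal sketch. It is not the authors' implementation.
The function names, the toy yield values, and the composition rule (keep
the prediction of the model with the lowest uncertainty at each pixel)
are assumptions made for illustration; the abstract does not state the
exact rule used.

```python
from math import sqrt
from statistics import mean


def fit_linear_adjustment(predicted, reference):
    """Ordinary least squares fit of reference = intercept + slope * predicted."""
    p_mean, r_mean = mean(predicted), mean(reference)
    sxx = sum((p - p_mean) ** 2 for p in predicted)
    sxy = sum((p - p_mean) * (r - r_mean) for p, r in zip(predicted, reference))
    slope = sxy / sxx
    intercept = r_mean - slope * p_mean
    return intercept, slope


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    return sqrt(mean((p - r) ** 2 for p, r in zip(predicted, reference)))


def compose_per_pixel(predictions, uncertainties):
    """Per pixel, keep the prediction of the model with the lowest uncertainty.

    Both arguments are lists with one inner list per model, and each inner
    list holds one value per pixel. The selection rule is an assumption.
    """
    composite = []
    for pixel_preds, pixel_uncs in zip(zip(*predictions), zip(*uncertainties)):
        best = min(range(len(pixel_uncs)), key=pixel_uncs.__getitem__)
        composite.append(pixel_preds[best])
    return composite


# Hypothetical municipality-level yields in kg/ha (not data from the study).
# The raw predictions are compressed toward the mean relative to the reference.
rf_predicted = [2600, 2800, 3000, 3200, 3400]
reference = [2200, 2600, 3000, 3400, 3800]

intercept, slope = fit_linear_adjustment(rf_predicted, reference)
adjusted = [intercept + slope * p for p in rf_predicted]

print("RMSE before adjustment:", round(rmse(rf_predicted, reference), 1))
print("RMSE after adjustment:", round(rmse(adjusted, reference), 1))

# Two hypothetical models predicting yield for three pixels.
model_predictions = [[3000, 3100, 2800], [2900, 3300, 2750]]
model_uncertainties = [[50, 200, 90], [80, 120, 60]]
print("Composite:", compose_per_pixel(model_predictions, model_uncertainties))
```

In this toy example the adjustment is fitted and evaluated on the same
values, so the improvement is optimistic by construction. In a real
workflow the intercept and slope would be fitted on training data and
the RMSE reported on withheld data, as in the 20% test split described
above.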