Annual 30 m soybean yield mapping in Brazil using long-term satellite observations, climate data and machine learning
  • Xiao-Peng Song,
  • Haijun Li,
  • Peter Potapov,
  • Matthew C Hansen
Xiao-Peng Song
Texas Tech University

Corresponding Author:[email protected]

Haijun Li
Texas Tech University
Peter Potapov
University of Maryland
Matthew C Hansen
University of Maryland

Abstract

Long-term, spatially explicit information on crop yield is essential for understanding food security in a changing climate. Here we present a study that combines twenty years of Landsat and MODIS data, climate and weather records, municipality-level crop yield statistics, random forests and linear regression models to map crop yield in a multi-temporal, multi-scale modeling framework. The study was conducted for soybean in Brazil, the world's largest producer and exporter of this commodity crop. Using a recently developed 30 m resolution, annual (2001-2019) soybean classification map product, we aggregated multi-temporal phenological metrics derived from Landsat and MODIS data over soybean pixels to the municipality scale. We combined the phenological metrics with topographic features, long-term climate data, in-season weather data and soil variables as inputs to machine learning models. We trained a multi-year random forests model using yield statistics as reference and subsequently applied linear regression to adjust the biases in the direct output of the random forests model. This model combination achieved the best performance, with a root-mean-square error (RMSE) of 344 kg/ha (12% relative to the long-term mean yield) and an r² of 0.69 on 20% withheld test data. The RMSE of the leave-one-year-out assessment ranged from 259 kg/ha to 816 kg/ha. To eliminate artifacts caused by the coarse-resolution climate and weather data, we developed multiple models with different categories of input variables. Using the per-pixel uncertainty estimates of the different models, we produced the final soybean yield maps through per-pixel model composition. We applied the models trained on 2001-2019 data to 2020 data and produced a soybean yield map for 2020, demonstrating the predictive capability of trained machine learning models for operational yield mapping in future years.
Our research showed that combining satellite, climate and weather data with machine learning can effectively map crop yield at high resolution, providing critical information for understanding yield growth, yield anomalies and food security.
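
The two post-processing steps named in the abstract, linear adjustment of the random forests output and per-pixel model composition, can be illustrated with a minimal Python sketch. The abstract does not specify either step in detail, so the sketch rests on assumptions: the adjustment is taken to be an ordinary least squares fit of reference yield against the raw model prediction, and the composition rule is taken to be "keep the estimate from the model with the lowest uncertainty at each pixel". All function names and numbers below are hypothetical and synthetic; they are not the study's data or code.

```python
from math import sqrt


def fit_linear_adjustment(predicted, reference):
    """Least squares fit of reference = slope * predicted + intercept.

    Assumption: the bias adjustment regresses reference yield on the
    raw model output; the abstract does not state the exact form.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    slope = cov / var_p
    intercept = mean_r - slope * mean_p
    return slope, intercept


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return sqrt(squared / len(predicted))


def compose_per_pixel(yield_maps, uncertainty_maps):
    """Per pixel, keep the yield from the model with the smallest uncertainty.

    Each map is a flat list of pixel values; maps are listed per model.
    Assumption: minimum uncertainty is the selection rule.
    """
    composite = []
    for yields, uncertainties in zip(zip(*yield_maps), zip(*uncertainty_maps)):
        best = min(range(len(uncertainties)), key=uncertainties.__getitem__)
        composite.append(yields[best])
    return composite


# Synthetic municipality-level yields in kg/ha (illustrative only).
# The raw predictions are compressed toward the mean, a typical bias
# pattern that a linear adjustment can stretch back out.
reference = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
raw = [2500.0, 2750.0, 3000.0, 3250.0, 3500.0]

slope, intercept = fit_linear_adjustment(raw, reference)
adjusted = [slope * p + intercept for p in raw]

# Two hypothetical models, three pixels each: yields and uncertainties.
yield_maps = [[3000.0, 3100.0, 2900.0], [3200.0, 3050.0, 2800.0]]
uncertainty_maps = [[200.0, 400.0, 150.0], [300.0, 250.0, 100.0]]
composite = compose_per_pixel(yield_maps, uncertainty_maps)

print(rmse(raw, reference), rmse(adjusted, reference), composite)
```

In this toy case the adjustment removes the compression entirely because the synthetic bias is exactly linear; with real data the fit would be estimated on training data and evaluated on withheld data, as the abstract describes for the 20% test split.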