DISTRICT-LEVEL WHEAT YIELD PREDICTION FROM COARSE-RESOLUTION SATELLITE
DATA USING MACHINE LEARNING TECHNIQUES
Abstract
Regional crop production estimates are important to both the public and
private sectors: they help ensure an adequate food supply, help
policymakers and farmers manage harvest, storage, import/export, and
transportation, and help them anticipate market fluctuations. Food
security will be increasingly challenged by population growth and
climate change.
Thus, accurate prediction of regional crop yield is essential for
national food security and the sustainable development of Indian
agriculture. This study focuses on Punjab, the Indian state with the
highest wheat yield. District-wise wheat yield data were available for
2000–2019. We used six covariates: crop-health indicators, namely the
normalized difference vegetation index (NDVI), leaf area index (LAI),
and fraction of absorbed photosynthetically active radiation (fAPAR);
meteorological indicators, namely land surface temperature (LST) and
evapotranspiration (ET); and a surface characteristic, the protrusion
coefficient (PC). These indicators were
generated at 250 m spatial resolution from MODIS data using Google
Earth Engine. The years were randomly assigned to a training set
(2000–2009, 2011, 2013, 2014, and 2016–2019) and a test set (2010,
2012, and 2015). We used the random forest
(RF) regression method to build the wheat yield prediction model. Tests
with several combinations of covariates showed that fAPAR and ET are
highly correlated with NDVI and add little to the model’s prediction
accuracy; hence, only four of the six covariates (NDVI, LAI, PC, and
LST) were used to train the final model. The coefficients of
determination (R²) between district-level yield and NDVI, LAI, PC, and
LST were 0.37, 0.31, 0.15, and 0.13, respectively. Hyperparameters were
tuned using both randomized-search and grid-search cross-validation.
We used
mean absolute error (MAE) and accuracy as evaluation metrics. The
training MAE was 0.1870 t/ha (95.81% accuracy), and the test MAE was
0.4293 t/ha (90.02% accuracy). These errors are within the acceptable
limits reported in published studies. Overall, this study demonstrates
that covariates derived from coarse-resolution satellite data can
predict district-level crop yield with reasonable accuracy.
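The year-based hold-out split described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each district-year observation is stored as a dictionary with a `year` key (a hypothetical layout, since the data structure is not described). Only the test years (2010, 2012, 2015) come from the abstract. Holding out whole years keeps every district of a test season out of training, so the test error reflects prediction for an unseen season.

```python
# Test years reported in the abstract; every other year in 2000-2019 is used for training.
TEST_YEARS = frozenset({2010, 2012, 2015})


def split_by_year(records, test_years=TEST_YEARS):
    """Split district-year records into (train, test) lists by year.

    Each record is assumed (hypothetically) to be a dict with a "year" key.
    """
    train = [r for r in records if r["year"] not in test_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test


# Usage with placeholder records: one hypothetical district, no real yields.
records = [{"district": "district_a", "year": y} for y in range(2000, 2020)]
train, test = split_by_year(records)
print(len(train), len(test))  # 17 3
```

The 17 remaining years are exactly the training years listed in the abstract (2000–2009, 2011, 2013, 2014, 2016–2019).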
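The covariate screening and the single-covariate R² values reported in the abstract can be illustrated in plain Python. This sketch assumes each R² comes from a simple least-squares line of yield on one covariate, in which case it equals the squared Pearson correlation. It also flags covariates whose absolute correlation with NDVI exceeds a threshold. The 0.9 threshold and the toy series are hypothetical. The abstract says fAPAR and ET were dropped after comparing covariate combinations, not after applying a fixed cutoff.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R^2 of a simple least-squares line y ~ a + b*x, i.e. Pearson r squared."""
    return pearson_r(x, y) ** 2


def redundant_with(reference, candidates, threshold=0.9):
    """Names of covariates whose |r| with `reference` exceeds `threshold`.

    The default threshold is a hypothetical choice, not a value from the study.
    """
    return [name for name, values in candidates.items()
            if abs(pearson_r(reference, values)) > threshold]


# Toy series (made up, not study data): fAPAR tracks NDVI closely, LST does not.
ndvi = [0.2, 0.4, 0.6, 0.8]
fapar = [0.25, 0.45, 0.66, 0.84]
lst = [30.0, 28.0, 31.0, 29.0]
print(redundant_with(ndvi, {"fAPAR": fapar, "LST": lst}))  # ['fAPAR']
```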
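The abstract mentions both randomized-search and grid-search cross-validation for tuning the random forest. The sketch below shows only how the two strategies generate candidate settings: grid search tries every combination, while randomized search draws a fixed number of them at random. Scoring is delegated to a caller-supplied cross-validation function. The hyperparameter names and value lists are hypothetical, since the abstract does not report the search space. Real randomized searches often sample from continuous distributions; plain lists are used here for simplicity.

```python
import itertools
import random

# Hypothetical random forest search space: the abstract does not say which
# hyperparameters were tuned or over which values.
SPACE = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}


def grid_candidates(space):
    """Grid search: every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]


def random_candidates(space, n_iter, seed=0):
    """Randomized search: n_iter distinct combinations sampled uniformly."""
    grid = grid_candidates(space)
    return random.Random(seed).sample(grid, min(n_iter, len(grid)))


def best_candidate(candidates, cv_score):
    """Return the candidate with the highest cross-validated score.

    cv_score is a caller-supplied (hypothetical) function that trains the model
    with the given settings on each CV fold and returns the mean fold score.
    """
    return max(candidates, key=cv_score)


# Usage with a toy stand-in for a real cross-validated score.
def toy_score(params):
    return params["n_estimators"] - params["min_samples_leaf"]


print(len(grid_candidates(SPACE)), len(random_candidates(SPACE, 5)))  # 27 5
print(best_candidate(grid_candidates(SPACE), toy_score))
```

A common workflow runs the cheaper randomized search first and then a finer grid around the best settings. The abstract does not say whether the study combined the two searches in this way.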
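The two evaluation metrics can be written out as shown below. MAE follows its standard definition. The abstract does not define "accuracy". This sketch assumes a convention used in some crop-yield studies: 100 minus the mean absolute percentage error (MAPE), which requires nonzero observed yields. The sample yields are made up for illustration.

```python
def _check(observed, predicted):
    """Reject empty or mismatched inputs."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(observed, predicted):
    """Mean absolute error, in the units of the inputs (here t/ha)."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def accuracy_pct(observed, predicted):
    """Accuracy in percent, ASSUMED here to be 100 - MAPE (not defined in the abstract)."""
    _check(observed, predicted)
    if any(o == 0 for o in observed):
        raise ValueError("observed yields must be nonzero for MAPE")
    mape = 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 - mape


observed = [4.0, 5.0]   # made-up district yields, t/ha
predicted = [4.2, 4.5]
print(round(mae(observed, predicted), 4), round(accuracy_pct(observed, predicted), 2))  # 0.35 92.5
```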