A Novel Machine Learning Approach for Cotton Yield Prediction

Alakananda Mitra; Sahila Beegum; David Fleisher; Vangimalla R Reddy; Wenguang Sun; Chittaranjan Ray; Dennis Timlin; Arindam Malakar

doi:10.22541/essoar.170365314.45002421/v1

Abstract

Cotton production in the United States has declined since 1950. The U.S. cotton industry is committed to sustainable cotton production practices that reduce water, land, and energy usage and soil loss while improving soil health and cotton yield. Various climate-smart agriculture strategies that boost yields and may lower operating costs have been planned. Cultivars, soil types, management strategies, pests and diseases, climate, and weather patterns affect crops in complex and nonlinear ways, which makes crop yield prediction difficult. This is where machine learning (ML) comes in. In this work (Fig. 1), we aim to accurately predict cotton yield using an ML method that incorporates the effects of climate change, soil variety, cultivars, and the nitrogen from NH₄, NO₃, and urea. Two types of cotton yield data were used: field data and synthetic data. The field data were collected in the 1980s and early 1990s across the southern cotton belt and cover different soil types, cultivars, and amounts of nitrogen. However, these data do not reflect the effects of climate change over the past few years. To address this issue, six years of cotton yield data were generated using the process-based cotton model GOSSYM. This synthetic dataset lets the ML algorithm learn the effects of recent climate change and predict cotton yield more precisely. We concentrated on three southern states: Texas, Mississippi, and Georgia. For each state, three locations in the cotton-producing counties were chosen. Weather data from 2017 to 2022 for each location were obtained using the POWER Data Access Viewer web interface. The same planting and harvest dates were selected for all cases. For each case, the accumulated heat unit (AHU) was calculated from the weather data and used as one of the inputs to the ML model. Using AHU instead of time-series weather data simplified the inputs and reduced the number of computations.
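The abstract does not give the AHU formula. As a minimal sketch, assuming the common degree-day definition (the daily mean of maximum and minimum temperature minus a base temperature, floored at zero and summed from planting to harvest), the calculation could look like the following. The function name, the input layout, and the base temperature of 15.6 °C are illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, Tuple


def accumulated_heat_units(
    daily_temps_c: Iterable[Tuple[float, float]],
    base_temp_c: float = 15.6,  # assumed base temperature, not from the paper
) -> float:
    """Sum daily heat units over a season.

    daily_temps_c holds one (t_max, t_min) pair in degrees Celsius for each
    day between planting and harvest. Days whose mean temperature falls
    below the base temperature contribute zero.
    """
    total = 0.0
    for t_max, t_min in daily_temps_c:
        daily_mean = (t_max + t_min) / 2.0
        total += max(0.0, daily_mean - base_temp_c)
    return total


# Hypothetical three-day record, for illustration only.
sample_days = [(30.0, 20.0), (20.0, 10.0), (25.6, 15.6)]
sample_ahu = accumulated_heat_units(sample_days)
```

Collapsing a whole season of daily weather into one number like this is what allows a tabular model to take weather as a single input column instead of a time series.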
Soil types, cultivars, and amounts of nitrogen were varied to create combinations of inputs likely to correspond to the range of farm input factors currently experienced in these locations. We then developed a Random Forest Regressor to predict the yield. The results show that this method is highly accurate (~89%), with an average R² of about 0.82 and a root mean square error of 117.03 kg/ha.
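For readers unfamiliar with the two reported metrics, the sketch below shows how R² and root mean square error are conventionally computed from observed and predicted yields. It uses the standard textbook definitions; the function names and the sample values are hypothetical and do not come from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same units as the yields (e.g. kg/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical yields in kg/ha, for illustration only.
observed_yield = [1000.0, 1200.0, 1400.0, 1600.0]
predicted_yield = [1100.0, 1150.0, 1450.0, 1500.0]

example_rmse = rmse(observed_yield, predicted_yield)
example_r2 = r_squared(observed_yield, predicted_yield)
```

RMSE is expressed in yield units, so it can be compared directly with typical field yields, while R² is unitless and describes the share of yield variance the model explains.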

15 Dec 2023: Submitted to ESS Open Archive

27 Dec 2023: Published in ESS Open Archive
