Predicting West Nile Virus Mosquito Positivity Rates and Abundance: A Comparative Evaluation of Machine Learning Methods for Epidemiological Applications
  • Julianna Schneider,
  • Alessandro Greco,
  • Jillian Chang,
  • Maria Molchanova,
  • Luke Shao
Julianna Schneider
NASA SEES (STEM Enhancement in the Earth Sciences)

Corresponding Author: [email protected]

Alessandro Greco
NASA SEES (STEM Enhancement in Earth Sciences)
Jillian Chang
NASA SEES (STEM Enhancement in Earth Sciences)
Maria Molchanova
NASA SEES (STEM Enhancement in Earth Sciences)
Luke Shao
NASA SEES (STEM Enhancement in Earth Sciences)

Abstract

Mosquitoes are major vectors of disease and thus a key public health concern. Some cities have programs to track mosquito abundance and vector competence, but such fieldwork is expensive, time-consuming, and retrospective. We present a comparative analysis of two machine-learning-based regression techniques for forecasting, three weeks in advance, the rate at which mosquito abundance changes and the rate at which mosquitoes test positive for West Nile Virus (WNV) in our area of interest (AOI), the City of Chicago. We selected an initial pool of climatic inputs based on the findings of prior work. Ordinary least squares regression was run on each input individually and then in various groups, and a p-value cutoff of 0.05 was used to determine which inputs were best suited for predicting the derivatives of mosquito abundance and WNV positivity rate. Using these inputs, we trained four machine learning models using two types of regression: a Random Forest Regressor (RFR) and Backward Elimination Linear Regression (BELR). We optimized our RFR’s hyperparameters using Randomized Search Cross Validation and further reduced our BELR inputs using a p-value cutoff of 0.05. The enhanced vegetation index (EVI) and temperature, described in various metrics, emerged as common inputs across the four models. In three of the four models, the respective temperature metric was the most important feature, while the importance rank of EVI varied between second and last place. Our root mean square error (RMSE) was mostly on the order of hundredths or smaller, but spiked at novel week-to-week extremes in the testing data. Our methodology and results indicate valuable directions for future research into forecasting mosquito population abundance and vector competence. This work is particularly applicable to public health programs: because our models use open-source remote sensing data to predict, three weeks in advance, how quickly the mosquito population and its vector competence will change, they can streamline disease monitoring and prevention.
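
The two regression approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state which libraries were used, so NumPy, SciPy, and scikit-learn are assumptions here, as are the helper names (`ols_p_values`, `backward_eliminate`, `tune_rfr`), the hyperparameter grid, the plain 3-fold cross-validation, and the synthetic data standing in for the climatic inputs. The sketch shows backward elimination with a p-value cutoff of 0.05 and a randomized hyperparameter search for a random forest regressor.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV


def ols_p_values(X, y):
    """Two-sided t-test p-values for OLS coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return 2 * stats.t.sf(np.abs(beta / se), dof)


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant input until every p-value is <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)[1:]  # ignore the intercept's p-value
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


def tune_rfr(X, y, seed=0):
    """Randomized search over a small, hypothetical hyperparameter grid."""
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions={
            "n_estimators": [25, 50, 100],
            "max_depth": [None, 3, 5, 10],
            "min_samples_leaf": [1, 2, 4],
        },
        n_iter=5,
        cv=3,
        scoring="neg_root_mean_squared_error",
        random_state=seed,
    )
    return search.fit(X, y)


# Synthetic stand-in data (assumption): two informative inputs and one noise input.
rng = np.random.default_rng(0)
names = ["temperature", "evi", "noise"]
X = rng.normal(size=(200, 3))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

selected = backward_eliminate(X, y, names)
search = tune_rfr(X, y)
importances = dict(zip(names, search.best_estimator_.feature_importances_))
```

Here `selected` lists the inputs that survive backward elimination, `-search.best_score_` is the cross-validated RMSE of the best forest, and `importances` gives the kind of per-feature ranking the abstract refers to. For real weekly data, a time-aware splitter such as scikit-learn's `TimeSeriesSplit` could be passed as `cv` instead of a plain fold count.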