Predicting West Nile Virus Mosquito Positivity Rates and Abundance: A
Comparative Evaluation of Machine Learning Methods for Epidemiological
Applications
Abstract
Mosquitoes are major vectors of disease and thus a key public health
concern. Some cities have programs to track mosquito abundance and
vector competence, but such fieldwork is expensive, time-consuming, and
retrospective. We present a comparative analysis of two
machine-learning-based regression techniques for forecasting, three
weeks in advance, the rate at which mosquito abundance changes and the
rate at which mosquitoes test positive for West Nile Virus (WNV) in our
area of interest, the City of Chicago. We selected an initial pool of climatic inputs
based on the findings of prior work. We ran ordinary least squares
regression on each input individually and then in various groups, using
a p-value cutoff of 0.05 to determine which inputs were best suited for
predicting the derivatives of mosquito abundance and WNV positivity
rate. With these inputs, we trained four machine learning models using
two types of regression: a Random Forest Regressor (RFR) and Backward
Elimination Linear Regression (BELR). We optimized our RFR’s
hyperparameters using Randomized Search Cross Validation and further
reduced our BELR inputs using a p-value cutoff of 0.05. The enhanced vegetation
index (EVI) and temperature, expressed through various metrics, emerged
as common inputs across the four models. In three of the four models,
the respective temperature metric was the most important feature, while
EVI ranked anywhere from second to last. Our root mean square error was
largely in the hundredths place or lower, but spiked at novel
week-to-week extremes in the testing data. Our methodology and results
indicate valuable directions for future research into forecasting
mosquito population abundance and vector competence. This work is
particularly applicable to public health programs: by using open-source
remote sensing data to predict, three weeks in advance, how quickly the
mosquito population and its vector competence will change, our models
can streamline disease monitoring and prevention.
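
To make the forecasting target concrete, the following is a minimal sketch of how a weekly rate of change, a three-week lead, and the root mean square error could be computed. It is an illustration under stated assumptions rather than the procedure used in this work: the derivative is approximated here by a simple week-to-week difference, and the function names (weekly_rate_of_change, make_lagged_pairs, rmse) and all numbers are hypothetical.

```python
import math


def weekly_rate_of_change(series):
    """Approximate the derivative of a weekly series by first differences.

    Element i of the result is series[i + 1] - series[i], so the result
    is one element shorter than the input. (Assumption: a plain first
    difference stands in for the derivative.)
    """
    return [later - earlier for earlier, later in zip(series, series[1:])]


def make_lagged_pairs(features, targets, lead_weeks=3):
    """Pair the feature from week t with the target from week t + lead_weeks.

    Both lists must cover the same weeks in the same order.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    return [
        (features[t], targets[t + lead_weeks])
        for t in range(len(targets) - lead_weeks)
    ]


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical weekly trap counts and mean temperatures (made-up values).
abundance = [10, 14, 13, 20, 26, 25, 30]
temperature = [18.0, 19.5, 21.0, 22.5, 24.0, 25.5, 27.0]

# The rate of change starts at the second week, so drop the first
# temperature reading to keep both series aligned on the same weeks.
rate = weekly_rate_of_change(abundance)
training_pairs = make_lagged_pairs(temperature[1:], rate, lead_weeks=3)
```

Each element of training_pairs couples a climatic input with the rate of change observed three weeks later, which is the shape of data a regressor such as an RFR or BELR model would be fit on. The rmse function can then score the model's forecasts against the held-out rates.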