Predicting nitrate exposure from groundwater wells using machine
learning and meteorological conditions
Abstract
Private groundwater wells are a potentially unmonitored source of
contaminants that can harm the health of millions of people
throughout the United States. Developing models that predict potential
exposure to contaminants, such as nitrate, could guide sampling efforts
and allow residents to take action to reduce their risk. Machine
learning models have been successful in predicting nitrate contamination
using geospatial information such as proximity to nitrate sources or
soil type, but previous models have not considered meteorological
factors that change temporally. In this study, we test random forest
(regression and classification) and linear regression models to predict
nitrate contamination of wells using rainfall and temperature records
over the previous 180 days. We trained and tested models for (1) all of
North Carolina, (2) each geographic region in North Carolina, (3) a
three-county region with a high density of animal agriculture, and (4) a
three-county region with a low density of animal agriculture. All
regression models had poor predictive performance (R² = 0.04) for all
areas tested. The random forest classification model for the coastal
plain region showed fair agreement (Cohen’s kappa = 0.23) when trying to
predict whether contamination occurred. All other classification models
had slight or poor predictive performance. Our results show that
temporal changes in rainfall and temperature alone are not enough to
predict nitrate contamination in most areas of North Carolina, although
they show potential in the coastal plain region.
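
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for a binary "contaminated / not contaminated" classifier. It is not the study's code. The function names and the toy labels are hypothetical, and the verbal descriptors (poor, slight, fair, and so on) are assumed to follow the commonly used Landis and Koch scale, which matches the wording of the abstract but is not confirmed by it.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Observed agreement: share of wells where prediction matches measurement.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[k] * pred_counts[k] for k in obs_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both sequences contain a single shared label;
        # returning 0.0 here is an assumption of this sketch.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def agreement_label(kappa):
    """Verbal descriptor, assuming the Landis and Koch cut-offs."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: 1 = nitrate contamination detected, 0 = not detected.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(observed, predicted)
label = agreement_label(kappa)
print(f"kappa = {kappa:.2f} ({label})")
```

In this toy case the classifier is correct for 7 of 10 wells, yet kappa is only about 0.21, because most of that agreement would be expected by chance when uncontaminated wells dominate the sample. This is why a kappa of 0.23 is described as only fair agreement.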