Statistical and Machine Learning Methods for Evaluating Trends in Air
Quality under Changing Meteorological Conditions
Abstract
Evaluating the influence of anthropogenic emissions changes on air
quality requires accounting for the influence of meteorological
variability. Statistical methods such as multiple linear regression
(MLR) models with basic meteorological variables are often used to
remove meteorological variability and estimate trends in measured
pollutant concentrations attributable to emissions changes. However, the
ability of these widely used statistical approaches to correct for
meteorological variability remains unknown, limiting their usefulness in
real-world policy evaluations. Here, we quantify the performance of
MLR and other quantitative methods using two scenarios simulated by a
chemical transport model, GEOS-Chem, as a synthetic dataset. Focusing on
the impacts of anthropogenic emissions changes in the US (2011 to 2017)
and China (2013 to 2017) on PM2.5 and O3, we show that widely used
regression methods do not perform well in correcting for meteorological
variability and identifying long-term trends in ambient pollution
related to changes in emissions. The estimation errors, defined as
the differences between meteorology-corrected trends and emission-driven
trends under constant-meteorology scenarios, can be reduced by 30% to 42%
using a random forest model that incorporates both local- and
regional-scale meteorological features. We further design a correction
method based on GEOS-Chem simulations with constant emissions inputs and
quantify the degree to which emissions and meteorological influences are
inseparable because of their process-based interactions. We conclude by
providing recommendations for evaluating the effectiveness of emissions
reduction policies using statistical approaches.
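
To make the estimation error described above concrete, the following is a minimal sketch, not the authors' implementation. It uses a toy synthetic series in place of GEOS-Chem output, and every name, coefficient, and variable in it (make_synthetic, temp, wind, the slopes) is a hypothetical choice made for illustration. It fits one common MLR variant, in which a time term is regressed alongside the meteorological variables, and compares the resulting trend with the trend of an emissions-only series. Because the toy data are linear and additive by construction, MLR recovers the emission-driven trend here; the abstract's finding is that this does not carry over to the nonlinear, interacting responses in the chemical transport model.

```python
import numpy as np


def linear_trend(t, y):
    """Least-squares slope of y against t (units of y per year)."""
    slope, _intercept = np.polyfit(t, y, 1)
    return slope


def mlr_corrected_trend(t, conc, met):
    """Fit conc ~ 1 + t + met by ordinary least squares.

    Returns the coefficient on t, i.e. the trend after controlling
    linearly for the meteorological columns in `met`.
    """
    X = np.column_stack([np.ones_like(t), t, met])
    coef, *_ = np.linalg.lstsq(X, conc, rcond=None)
    return coef[1]


def make_synthetic(seed=0, n_years=5):
    """Hypothetical daily series; all coefficients are illustrative."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_years * 365) / 365.0              # time in years
    temp = 0.8 * t + rng.normal(0.0, 3.0, t.size)     # drifting temperature
    wind = rng.normal(0.0, 1.0, t.size)               # stationary wind
    emis_only = 50.0 - 2.0 * t                        # constant-meteorology analogue
    observed = emis_only + 1.5 * temp - 2.0 * wind + rng.normal(0.0, 1.0, t.size)
    return t, observed, emis_only, np.column_stack([temp, wind])


t, observed, emis_only, met = make_synthetic()

true_trend = linear_trend(t, emis_only)          # emission-driven trend
raw_trend = linear_trend(t, observed)            # no meteorological correction
corrected_trend = mlr_corrected_trend(t, observed, met)

# Estimation error: corrected (or raw) trend minus the emission-driven trend.
raw_error = abs(raw_trend - true_trend)
mlr_error = abs(corrected_trend - true_trend)

print(f"emission-driven trend: {true_trend:.2f}")
print(f"raw trend error:       {raw_error:.2f}")
print(f"MLR trend error:       {mlr_error:.2f}")
```

The same error metric could be applied to any other correction method, such as a random forest, by replacing mlr_corrected_trend with a function that returns that method's meteorology-corrected trend.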