Statistical and Machine Learning Methods for Evaluating Trends in Air
Quality under Changing Meteorological Conditions
Abstract
Evaluating the influence of anthropogenic emissions changes on air
quality requires accounting for meteorological variability.
Statistical methods such as multiple linear regression
(MLR) models with basic meteorological variables are often used to
remove meteorological variability and estimate trends in measured
pollutant concentrations attributable to emissions changes. However, the
ability of these widely used statistical approaches to correct for
meteorological variability remains unknown, limiting their usefulness in
real-world policy evaluations. Here, we quantify the performance of
MLR and other quantitative methods using two scenarios simulated by a
chemical transport model, GEOS-Chem, as a synthetic dataset. Focusing on
the impacts of anthropogenic emissions changes in the US (2011 to 2017)
and China (2013 to 2017) on PM2.5 and O3, we show that
widely used regression methods do not
perform well in correcting for meteorological variability and
identifying long-term trends in ambient pollution related to changes in
emissions. The estimation errors, characterized as the differences
between meteorology-corrected trends and emission-driven trends under
constant meteorology scenarios, can be reduced by 30%-42% using a
random forest model that incorporates both local and regional scale
meteorological features. We further design a correction method based on
GEOS-Chem simulations with constant emission input and quantify the
degree to which emissions and meteorological influences are inseparable
because of their process-based interactions. We conclude by providing
recommendations for evaluating the effectiveness of emissions reduction
policies using statistical approaches.
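
To make the error metric above concrete, the following is a minimal sketch of one common MLR variant, in which a linear time term is fitted jointly with meteorological predictors and the coefficient on time is taken as the meteorology-corrected trend. It is not the configuration used in this study: the function names, the two predictors (temperature and wind speed), and every coefficient in the synthetic series are assumptions chosen for illustration, and the data are random numbers rather than GEOS-Chem output. The estimation error is computed, as in the abstract, as the difference between the corrected trend and the known emission-driven trend.

```python
import numpy as np


def linear_trend(t, y):
    """Ordinary least-squares slope of y against t (no meteorological correction)."""
    return np.polyfit(t, y, 1)[0]


def met_corrected_trend(t, conc, met):
    """Slope on time from an MLR that also includes meteorological predictors.

    Fitting time and meteorology jointly keeps a drift in meteorology from
    being attributed to the time term.
    """
    X = np.column_stack([np.ones_like(t), t, met])
    coef, *_ = np.linalg.lstsq(X, conc, rcond=None)
    return coef[1]


# Synthetic daily series over five years (all values are illustrative assumptions).
rng = np.random.default_rng(0)
t = np.arange(5 * 365) / 365.0                      # time in years
temp = 15 + 10 * np.cos(2 * np.pi * t) + 0.5 * t + rng.normal(0, 2, t.size)
wind = 3 + rng.normal(0, 1, t.size)

true_emission_trend = -2.0                          # concentration units per year
conc = (40 + true_emission_trend * t                # emission-driven component
        + 0.8 * temp - 1.5 * wind                   # meteorological component
        + rng.normal(0, 1, t.size))                 # unexplained noise

raw_error = linear_trend(t, conc) - true_emission_trend
corrected_error = (met_corrected_trend(t, conc, np.column_stack([temp, wind]))
                   - true_emission_trend)

print(f"estimation error without correction: {raw_error:+.3f}")
print(f"estimation error with MLR correction: {corrected_error:+.3f}")
```

In this toy case the pollutant responds linearly and additively to meteorology, so the MLR recovers the emission-driven trend almost exactly. The abstract's finding concerns the realistic case where that assumption fails, which is where nonlinear models and regional-scale features become relevant.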