A Comparison of Regression Methods for Inferring Near-Surface NO2 with
Satellite Data
Abstract
Nitrogen dioxide (NO2) is emitted during high-temperature combustion
from anthropogenic and natural sources. Human exposure to high NO2
concentrations causes cardiovascular and respiratory illnesses. The EPA
operates ground monitors across the U.S. that take hourly measurements
of NO2 concentrations. These monitors provide precise data for assessing
human pollution exposure, but they are sparsely distributed in space.
Satellite-based instruments capture NO2 amounts through the atmospheric
column with global coverage at regular spatial resolution, but do not
directly measure surface NO2. This study compares regression methods
using satellite NO2 data from the TROPospheric Ozone Monitoring
Instrument (TROPOMI) to estimate annual surface NO2 concentrations in
varying geographic and land-use settings across the continental U.S. We
then apply the best-performing regression models to estimate surface NO2
at 0.01° by 0.01° resolution, and we term this estimate quasi-NO2
(qNO2). qNO2 agrees best with measurements at suburban sites
(cross-validation (CV) R2 = 0.72) and away from major roads (CV R2 =
0.75). Among U.S. regions, qNO2 agrees best with measurements in the
Midwest (CV R2 = 0.89) and agrees least in the Southwest (CV R2 = 0.65).
To account for the non-Gaussian distribution of TROPOMI NO2, we apply
data transforms, with the Anscombe transform yielding the highest agreement
across the continental U.S. (CV R2 = 0.78). The interpretability,
minimal computational cost, and health relevance of qNO2 facilitate the use
of satellite data in a wide range of air quality applications.
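
As a rough illustration of the transform-then-regress idea described above, the following Python sketch applies the standard Anscombe transform, A(x) = 2 * sqrt(x + 3/8), to a skewed predictor and scores a one-predictor linear fit with k-fold cross-validated R2. It is a minimal sketch under stated assumptions, not the study's implementation: the function names (anscombe, inverse_anscombe, cv_r2), the synthetic data, the fold count, and the scaling of the column values are all hypothetical, and the regression models actually compared in the study may differ.

```python
import numpy as np


def anscombe(x):
    """Standard Anscombe variance-stabilizing transform: 2 * sqrt(x + 3/8)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 0.375)


def inverse_anscombe(y):
    """Simple algebraic inverse (not the bias-corrected inverse)."""
    return (np.asarray(y, dtype=float) / 2.0) ** 2 - 0.375


def cv_r2(x, y, k=5, seed=0):
    """k-fold cross-validated R^2 for a one-predictor linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    pred = np.empty_like(y)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every sample not in the held-out fold
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred[fold] = slope * x[fold] + intercept  # out-of-fold predictions only
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic, skewed stand-in for column NO2 (hypothetical values and units).
rng = np.random.default_rng(42)
column = rng.gamma(shape=2.0, scale=1.5, size=200)
# Hypothetical surface values with a nonlinear dependence on the column.
surface = 3.0 * np.sqrt(column) + rng.normal(0.0, 0.3, size=200)

print("CV R2, raw predictor:     ", cv_r2(column, surface))
print("CV R2, Anscombe predictor:", cv_r2(anscombe(column), surface))
```

The Anscombe transform was designed for Poisson-distributed counts, so treating column amounts as count-like after an appropriate scaling is an assumption of this sketch.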