Hourly and Daily PM2.5 Estimations using MERRA-2: A Machine Learning Approach
  • Alqamah Sayeed,
  • Paul Lin,
  • Pawan Gupta,
  • Nhu Nguyen Minh Tran,
  • Virginie Buchard,
  • Sundar A Christopher
Alqamah Sayeed
Universities Space Research Association/NASA-MSFC

Corresponding Author:[email protected]

Paul Lin
Universities Space Research Association (USRA), Huntsville, AL, USA
Pawan Gupta
GESTAR/USRA/NASA
Nhu Nguyen Minh Tran
University of Alabama in Huntsville
Virginie Buchard
NASA Goddard Modeling and Assimilation Office
Sundar A Christopher
University of Alabama in Huntsville, USA

Abstract

Health and environmental hazards related to high pollutant concentrations have become a serious concern for public policy and human health. The objective of this research is to improve grid-wise estimates of PM2.5, a criteria pollutant, by using machine learning (ML) to reduce the systematic bias that arises when PM2.5 is estimated empirically from the aerosol speciation provided by the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2). We present a unique application of ML for estimating hourly PM2.5 concentrations at MERRA-2 grid points. The model was trained using meteorological parameters and aerosol species simulated by MERRA-2, together with ground measurements from Environmental Protection Agency (EPA) Air Quality System (AQS) monitors. The ML approach significantly improved performance and reduced mean bias in the 0-10 µg m⁻³ range. We trained a Random Forest model for each EPA region using one year of collocated data. The resulting regional models were validated, and the aggregate dataset has a Pearson correlation of 0.88 (RMSE = 4.8 µg m⁻³) for training and 0.82 (RMSE = 5.8 µg m⁻³) for testing. For daily, monthly, and yearly averages, the correlation rose to 0.89, 0.95, and 0.94, and the RMSE fell to 4.0, 1.6, and 1.1 µg m⁻³, respectively. Results from an initial global implementation of the ML model are encouraging, but more research and development is needed to overcome challenges associated with data gaps in many parts of the world.
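
The two metrics quoted in the abstract, Pearson correlation and RMSE, and the effect of averaging hourly values to daily means can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The data are synthetic, the error model (independent hourly noise added to the observations) is an assumption made only for illustration, and the helper names `pearson_r`, `rmse`, and `block_means` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(observed, estimated):
    """Root-mean-square error between observations and estimates."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def block_means(values, block):
    """Average consecutive, non-overlapping blocks (24 hourly values -> 1 daily mean)."""
    return [
        sum(values[i:i + block]) / block
        for i in range(0, len(values) - block + 1, block)
    ]


# Synthetic stand-ins for collocated data (assumption, not real AQS or MERRA-2 values).
rng = random.Random(0)
n_days = 60
day_level = [rng.uniform(2.0, 20.0) for _ in range(n_days)]  # daily background level
observed = [day_level[h // 24] + rng.gauss(0.0, 1.0) for h in range(24 * n_days)]
estimated = [o + rng.gauss(0.0, 4.0) for o in observed]  # estimate = observation + noise

hourly_r = pearson_r(observed, estimated)
hourly_rmse = rmse(observed, estimated)

daily_obs = block_means(observed, 24)
daily_est = block_means(estimated, 24)
daily_r = pearson_r(daily_obs, daily_est)
daily_rmse = rmse(daily_obs, daily_est)

print(f"hourly: r = {hourly_r:.2f}, RMSE = {hourly_rmse:.2f}")
print(f"daily:  r = {daily_r:.2f}, RMSE = {daily_rmse:.2f}")
```

Because the synthetic errors are independent from hour to hour, averaging cancels much of the noise, so the daily correlation is higher and the daily RMSE lower than the hourly values. Errors in real estimates are usually correlated in time, so the improvement from averaging would be expected to be smaller than in this idealized case.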