Towards emulating an explicit organic chemistry mechanism with random forest models
  • Camille Mouchel-Vallon,
  • Alma Hodzic
Camille Mouchel-Vallon
Université Paul Sabatier

Corresponding Author:[email protected]

Alma Hodzic
National Center for Atmospheric Research (UCAR)

Abstract

Predicting secondary organic aerosol (SOA) formation relies either on extremely detailed, numerically expensive models that account for the condensation of individual species, or on extremely simplified, numerically affordable models that parameterize SOA formation for large-scale simulations. In this work, we explore the possibility of creating a random forest to reproduce the behavior of a detailed atmospheric organic chemistry model at a fraction of the numerical cost. A comprehensive dataset was created from thousands of individual detailed simulations, randomly initialized to account for the variety of atmospheric chemical environments. Recurrent random forests were trained to predict organic matter formation from dodecane and toluene precursors, as well as the partitioning between gas and particle phases. Validation tests show that the random forests perform well, without any divergence over 10 days of simulation. The distribution of errors shows that the sampling of initial conditions for the training simulations needs to focus on chemical regimes where SOA production is most sensitive. Sensitivity tests show that specializing multiple random forests, each for a specific chemical regime, is not more efficient than training a single general random forest on the entire dataset. The most important predictors are those providing information about the chemical regime, oxidant levels and existing organic mass. The choice of predictors is crucial, as using too many unimportant predictors reduces the performance of the random forests.
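
To make the word "recurrent" concrete, the following is a minimal sketch of how a one-step emulator can be applied recurrently, with each prediction fed back as input to the next step. It is an illustration under stated assumptions, not the authors' implementation: the names `rollout`, `toy_step_model`, `gas`, `particle` and `oxidant` are hypothetical, and the toy step function is only a stand-in for a trained regressor (for instance, a wrapper around the `predict` method of a fitted random forest).

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, float]


def rollout(
    step_model: Callable[[State], State],
    initial_state: State,
    forcings: Sequence[State],
) -> List[State]:
    """Apply a one-step emulator recurrently.

    At every step the current predicted state is merged with the external
    predictors for that step (the "forcings") and passed to the model; the
    model output then becomes the state for the next step.
    """
    state = dict(initial_state)
    trajectory = [dict(state)]
    for forcing in forcings:
        features = {**state, **forcing}  # predicted state + external predictors
        state = step_model(features)     # the prediction is fed back next step
        trajectory.append(dict(state))
    return trajectory


def toy_step_model(features: State) -> State:
    """Hypothetical stand-in for a trained one-step emulator.

    Moves a fraction of the gas-phase mass to the particle phase,
    proportionally to an assumed oxidant level. Not the paper's chemistry.
    """
    transferred = 0.1 * features["oxidant"] * features["gas"]
    return {
        "gas": features["gas"] - transferred,
        "particle": features["particle"] + transferred,
    }


if __name__ == "__main__":
    # Five steps with a constant, made-up oxidant level.
    steps = [{"oxidant": 1.0}] * 5
    for i, s in enumerate(rollout(toy_step_model, {"gas": 1.0, "particle": 0.0}, steps)):
        print(i, s)
```

Because each step consumes the previous prediction, errors can accumulate over a long rollout, which is why the absence of divergence over a 10-day simulation is a meaningful validation criterion.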