Towards emulating an explicit organic chemistry mechanism with random forest models
Abstract
Predicting secondary organic aerosol (SOA) formation relies either on
extremely detailed, numerically expensive models accounting for the
condensation of individual species or on extremely simplified,
numerically affordable models parameterizing SOA formation for
large-scale simulations. In this work, we explore the possibility of
creating a random forest to reproduce the behavior of a detailed
atmospheric organic chemistry model at a fraction of the numerical cost.
A comprehensive dataset was created based on thousands of individual
detailed simulations, randomly initialized to account for the variety of
atmospheric chemical environments. Recurrent random forests were trained
to predict the formation of organic matter from dodecane and toluene
precursors and its partitioning between the gas and particle phases.
Validation tests show that the random forests perform well and do not
diverge over 10-day simulations. The distribution of errors shows
that the sampling of initial conditions for the training simulations
needs to focus on chemical regimes where SOA production is most
sensitive. Sensitivity tests show that specializing multiple random
forests, each for a specific chemical regime, is not more efficient than
training a single general random forest on the entire dataset. The most
important predictors are those providing information about the chemical
regime, oxidant levels, and existing organic mass. The choice of
predictors is crucial, as including too many unimportant predictors
degrades the performance of the random forests.
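Reading "recurrent" as step-by-step application, where each predicted state is fed back as input for the next time step, the emulation loop can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a fitted regressor exposing `predict` (for example scikit-learn's `RandomForestRegressor`, which accepts multi-output targets) that maps the current state plus external predictors to the next state. The function name `rollout` and the state/driver layout are hypothetical.

```python
import numpy as np

def rollout(model, state0, drivers, n_steps):
    """Advance an emulated state n_steps times, feeding each prediction back in.

    Hypothetical interface (illustration only):
    model   -- object with predict(X) returning the next state for each row of X
    state0  -- initial state vector, e.g. gas- and particle-phase organic mass
    drivers -- array of shape (n_steps, n_drivers), external predictors per step
               (e.g. oxidant levels); assumed known, not predicted by the model
    Returns an array of shape (n_steps + 1, n_state): the initial state
    followed by every predicted state.
    """
    drivers = np.asarray(drivers, dtype=float)
    state = np.asarray(state0, dtype=float)
    trajectory = [state]
    for t in range(n_steps):
        # Build a one-row feature matrix: current state followed by this step's drivers.
        x = np.concatenate([state, drivers[t]])[None, :]
        # The prediction becomes the input state for the next step (the recurrent part).
        state = np.asarray(model.predict(x), dtype=float).reshape(-1)
        trajectory.append(state)
    return np.stack(trajectory)
```

With scikit-learn, one would fit the forest on pairs of consecutive states taken from the training simulations, e.g. `RandomForestRegressor().fit(X, Y)`, and pass it as `model`. Because each error is fed back into later steps, a multi-day run of this loop is exactly where any divergence would appear.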
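The abstract does not state how predictor importance was measured, so the sketch below illustrates just one common option, permutation importance: shuffle one predictor column at a time and record how much the mean squared error increases. The function name `permutation_importance_mse` and its arguments are hypothetical; scikit-learn provides a maintained equivalent as `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def permutation_importance_mse(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each predictor column is randomly shuffled.

    predict -- callable mapping an (n_samples, n_features) array to predictions
    X, y    -- evaluation predictors and targets
    Returns one score per column; a score near zero means the model barely uses it.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between column j and the target, keep all other columns.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((predict(X_perm) - y) ** 2) - baseline
    return scores / n_repeats
```

A predictor whose shuffling leaves the error unchanged contributes nothing to the prediction. This is the kind of unimportant predictor the abstract cautions against including.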