Neural network emulation of the formation of organic aerosols based on
the explicit GECKO-A chemistry model
Abstract
Secondary organic aerosols (SOA) are formed from oxidation of hundreds
of volatile organic compounds (VOCs) emitted from anthropogenic and
natural sources. Accurate predictions of this chemistry are key for air
quality and climate studies due to the large contribution of organic
aerosols to submicron aerosol mass. Currently, only explicit models,
such as the Generator for Explicit Chemistry and Kinetics of Organics in
the Atmosphere (GECKO-A), can fully represent the chemical processing of
thousands of organic species. However, their extreme computational cost
prohibits their use in current chemistry-climate models, which rely on
simplified empirical parameterizations to predict SOA concentrations.
Recent studies that use machine learning (ML) to emulate the simpler
chemical mechanisms of tropospheric ozone have shown that such emulators
can produce realistic predictions at a significantly reduced
computational cost. This study demonstrates that ML can accurately
emulate SOA formation from an explicit chemistry model for several
precursors, with a 100- to 100,000-fold speedup over GECKO-A, making it
computationally feasible for use in a chemistry-climate model. To train
the ML emulator, we generated thousands of GECKO-A box simulations
sampled from a broad range of initial environmental conditions, and
focused on the chemistry of three representative SOA precursors: the OH
oxidation of two anthropogenic VOCs (toluene and dodecane) and one
biogenic VOC (alpha-pinene). We compare fully-connected and recurrent
neural network methods and use an ensemble approach to quantify their
underlying uncertainty and robustness. The SOA predictions generally
remain stable over a 5-day simulation period, with an approximate
error of 2-8%.
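The ensemble approach summarized above can be sketched as follows. This is a minimal illustration that assumes each trained emulator acts as a step function mapping the current state to the state one time step later, applied autoregressively by feeding each prediction back in as the next input. The toy members built by the hypothetical `make_toy_member` (with a hypothetical per-step oxidation fraction `k_ox` and SOA mass yield `y_soa`) are stand-ins for trained neural networks and do not represent GECKO-A chemistry. The ensemble mean is taken as the prediction, and the across-member standard deviation as a rough uncertainty estimate. The hourly time step (120 steps for 5 days) is also an assumption.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

# One ensemble member: maps the state at time t to the state at time t + 1.
Step = Callable[[List[float]], List[float]]


def make_toy_member(k_ox: float, y_soa: float) -> Step:
    """Hypothetical stand-in for one trained emulator (NOT GECKO-A chemistry).

    State = [precursor, soa]. Each step oxidizes a fraction k_ox of the
    remaining precursor and converts a mass yield y_soa of it into SOA.
    """
    def step(state: List[float]) -> List[float]:
        precursor, soa = state
        reacted = k_ox * precursor
        return [precursor - reacted, soa + y_soa * reacted]
    return step


def rollout(step: Step, x0: Sequence[float], n_steps: int) -> List[List[float]]:
    """Autoregressive rollout: each prediction becomes the next input."""
    traj = [list(x0)]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj


def ensemble_rollout(members: Sequence[Step], x0: Sequence[float],
                     n_steps: int, index: int) -> Tuple[List[float], List[float]]:
    """Ensemble mean and standard deviation of state variable `index` over time."""
    trajs = [rollout(m, x0, n_steps) for m in members]
    means, spreads = [], []
    for t in range(n_steps + 1):
        values = [traj[t][index] for traj in trajs]
        means.append(mean(values))
        spreads.append(stdev(values))  # requires at least two members
    return means, spreads


# Usage: three hypothetical members, 5 days at an assumed hourly step.
members = [make_toy_member(k, y) for k, y in [(0.05, 0.10), (0.06, 0.12), (0.04, 0.11)]]
soa_mean, soa_spread = ensemble_rollout(members, x0=[10.0, 0.0], n_steps=120, index=1)
```

Averaging several independently trained members, instead of relying on a single network, is what lets the spread expose where the emulators disagree. For an autoregressive rollout, this disagreement can grow over the simulation as errors feed back into later steps.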