A machine learning based approach to clinopyroxene thermobarometry:
model optimisation and distribution for use in Earth Sciences
Abstract
Thermobarometry is a fundamental tool to quantitatively interrogate
magma plumbing systems and broaden our appreciation of volcanic
processes. Developments in random forest-based machine learning lend
themselves to a more data-driven approach to clinopyroxene
thermobarometry. Such an approach allows users to access and filter
large experimental datasets and to tailor them to individual
applications in Earth Sciences. Here we present a methodological
assessment of random forest thermobarometry, using the R freeware
package “extraTrees”, by investigating the model performance, tuning
hyperparameters, and evaluating different methods for calculating
uncertainties. We determine that deviating from the default
hyperparameters used in the “extraTrees” package results in little
difference in overall model performance (<0.2 kbar and
<3 °C difference in mean standard error of estimate, SEE). However, accuracy is greatly
affected by how the final pressure or temperature (PT) value from the
voting distribution of trees in the random forest is selected (mean,
median or mode). This choice has not previously been examined in machine learning
thermobarometry. Using the mean value leads to a higher residual between
experimental and predicted PT, whereas using median values produces
smaller residuals. Additionally, this work provides two comprehensive R
scripts for users to apply the random forest methodology to natural
datasets. The first script permits modification and filtering of the
model calibration dataset. The second script contains pre-made models into
which users can rapidly input their data to recover pressure and
temperature estimates. These scripts are open source and can be accessed
at https://github.com/corinjorgenson/RandomForest-cpx-thermobarometer.
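
The choice between mean, median and mode described above can be illustrated with a minimal sketch. It is written in Python for illustration only. It does not use the "extraTrees" package and is not taken from the distributed R scripts. The function name `aggregate_tree_votes`, the `bin_width` parameter used to define a mode for continuous votes, and the vote values are all hypothetical.

```python
import statistics
from collections import Counter


def aggregate_tree_votes(votes, bin_width=1.0):
    """Summarise per-tree predictions for one sample as mean, median and binned mode."""
    if not votes:
        raise ValueError("votes must contain at least one tree prediction")
    # Continuous votes rarely repeat exactly, so they are binned before taking a mode.
    # bin_width is an assumed, user-chosen resolution (same units as the votes).
    bins = Counter(round(v / bin_width) for v in votes)
    # Pick the most populated bin; ties are broken in favour of the lowest bin.
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return {
        "mean": statistics.fmean(votes),
        "median": statistics.median(votes),
        "mode": top_bin * bin_width,
    }


# Hypothetical pressure votes (kbar) from seven trees, two of which are high outliers.
votes_kbar = [2.1, 2.4, 2.0, 2.3, 2.2, 9.5, 11.0]
summary = aggregate_tree_votes(votes_kbar, bin_width=1.0)
print(summary)
```

In this made-up case the two outlying votes pull the mean well above the cluster of votes near 2 kbar, while the median and the binned mode stay within that cluster. The sketch only shows how the three summaries can diverge for a skewed voting distribution; it makes no claim about the behaviour of any real calibration dataset.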