Leveraging Machine Learning Approaches to Predict Organic Carbon Abundance in Mars-Analog Hypersaline Lake Sediments
Abstract
Recent advances in laboratory and instrumental techniques in astrobiology have improved our life detection capabilities, both on Earth and beyond. These advances have also increased the complexity of the resulting data, often producing datasets characterized by complex, non-linear relationships. Machine learning methods remain underutilized in astrobiology, yet they are highly effective at revealing structure and patterns in complex datasets when the algorithm is well matched to the data. Here, we employ a series of classification and regression algorithms to predict the abundance of organic carbon (OC) from X-ray fluorescence (XRF) data collected from dynamic Mars-analog hypersaline lake sediments. Specifically, we constructed models using the random forest (RF), k-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR) algorithms. Overall, our trained models performed well at predicting OC abundance, with accuracies ranging from 80% to 94%. These results show how applying predictive models to astrobiology datasets can support life detection efforts. Machine learning approaches such as classification and regression algorithms provide agnostic insight into complex data, ultimately enabling a more efficient search for OC. We then applied our trained model to XRF data from Martian soil, drawn from the PIXL and Odyssey datasets, to produce probability predictions of OC abundance. Our predictions indicate a high probability that OC abundance is low, which is consistent with OC data from recently landed missions. These results highlight the potential for machine learning models to be trained on data from analog environments on Earth and then transferred to extraterrestrial targets (transfer learning).
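To make the idea of a classifier that outputs OC abundance probabilities from XRF features more concrete, the following is a minimal sketch of one of the four algorithms named above, k-nearest neighbors, written in plain Python. It assumes a binary "low"/"high" OC label and three hypothetical XRF-derived features; the data, feature choice, k = 3, and all function names are illustrative assumptions, not the features, thresholds, or models used in this study. The class probability is simply the fraction of the k nearest training samples that carry each label, and accuracy is estimated by leave-one-out cross-validation.

```python
import math
from collections import Counter

# HYPOTHETICAL training data: each row stands in for XRF-derived element
# abundances of one sediment sample (three illustrative features).
# Values are made up for demonstration and are not study data.
TRAIN_X = [
    [0.12, 0.30, 0.05],
    [0.10, 0.28, 0.06],
    [0.11, 0.33, 0.04],
    [0.25, 0.10, 0.20],
    [0.27, 0.12, 0.22],
    [0.24, 0.09, 0.19],
]
# HYPOTHETICAL OC abundance class for each sample above.
TRAIN_Y = ["high", "high", "high", "low", "low", "low"]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict_proba(x, train_x, train_y, k=3):
    """Fraction of the k nearest training samples belonging to each class."""
    # Rank training samples by distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda p: euclidean(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes.get(label, 0) / k for label in sorted(set(train_y))}


def knn_predict(x, train_x, train_y, k=3):
    """Class with the highest neighbor-vote probability."""
    proba = knn_predict_proba(x, train_x, train_y, k)
    return max(proba, key=proba.get)


def leave_one_out_accuracy(train_x, train_y, k=3):
    """Predict each sample from all the others and return the hit rate."""
    correct = 0
    for i, (x, y) in enumerate(zip(train_x, train_y)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        correct += knn_predict(x, rest_x, rest_y, k) == y
    return correct / len(train_x)


if __name__ == "__main__":
    print(leave_one_out_accuracy(TRAIN_X, TRAIN_Y))
    # A HYPOTHETICAL new sample, analogous to applying the model to new XRF data.
    print(knn_predict_proba([0.26, 0.11, 0.21], TRAIN_X, TRAIN_Y))
```

On this toy data the leave-one-out accuracy is 1.0 and the query sample is assigned {"high": 0.0, "low": 1.0}. With real XRF data, features would normally be standardized before computing distances, and k would be tuned by cross-validation; both steps are omitted here for brevity.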