
Using Machine Learning Models and Logistic Regression Analyses to Develop a Comprehensive Understanding of Extinction Risk For Marine Animal Phyla Across the Paleozoic
  • Adarsh Ambati,
  • Theo Chiang,
  • Anya Sengupta,
  • Pedro Monarrez,
  • Michael Pimentel-Galvan,
  • Noel Heim,
  • Jonathan Payne
Adarsh Ambati
Stanford University

Corresponding Author: [email protected]

Theo Chiang
Stanford University
Anya Sengupta
Stanford University
Pedro Monarrez
Stanford University
Michael Pimentel-Galvan
Stanford University
Noel Heim
Tufts University
Jonathan Payne
Stanford University

Abstract

Extinction within the Paleozoic era has been studied before, but a comprehensive understanding of how extinction risk changed throughout the era is still lacking. Our research project aims to bridge this gap by exploring extinction risk in relation to major Paleozoic phyla and ecological characteristics. Using R, we analyzed the Stanford Earth Body Size dataset, which includes extensive data (n=8816) on Paleozoic marine animals. In Step 1, we estimated regression coefficients indicating whether membership in each of the 6 phyla conferred greater or lesser extinction risk in each period of the Paleozoic era. In Step 2, the ecological characteristics examined included ocean acidification resilience, feeding patterns, body volume, length, surface area, motility, tiering, circulatory systems, and respiratory organ type. In Step 3, we built 6 binomial machine learning models from the Step 2 traits to determine whether an individual genus went extinct in a particular period. Our Step 1 results confirm that, within these timeframes, certain phyla had greater extinction risk than others and that extinction risk was not uniform across these groups. Our Step 2 results show that certain traits conferred advantages or disadvantages with respect to an organism's extinction risk. Notably, the only consistently non-significant traits were body length, surface area, and volume. As in Step 1, the extinction risk associated with each ecological characteristic varied across the Paleozoic. Finally, the Step 3 models were largely successful: most of the six had an accuracy above 80%, with the highest, 92%, in the Cambrian. The areas under the precision-recall and receiver operating characteristic curves were all in the acceptable (>0.6) range, indicating that the models have low false positive and false negative rates and can distinguish the traits that indicate extinction from those that indicate survival in each period.
Our research project identified phyla at risk of extinction in each period of the Paleozoic, determined which natural traits conferred greater extinction risk, and demonstrated that machine learning models trained on fossil descriptors can predict when an individual genus became extinct. Our results confirm that extinction risk neither depends consistently on a single factor nor remains constant across the periods of the Paleozoic era.
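
To make the quantities named in the abstract concrete, the following is a minimal sketch of how a logistic regression coefficient, an accuracy, and an area under the ROC curve might be computed for a single predictor. It is not the authors' code: the abstract states that the analysis was done in R, whereas this sketch uses plain Python for illustration. The toy data, the hypothetical phylum "A", and every function and variable name are assumptions invented here, and the fitting procedure (simple gradient ascent) is a stand-in for whatever routine the authors used.

```python
import math

# Toy data, invented for illustration only (NOT from the Stanford Earth Body Size dataset).
# in_phylum[i] = 1 if genus i belongs to a hypothetical phylum "A", else 0.
# extinct[i]   = 1 if genus i went extinct in the period of interest, else 0.
in_phylum = [1] * 8 + [0] * 8
extinct = [1, 1, 1, 1, 1, 1, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


b0, b1 = fit_logistic(in_phylum, extinct)
scores = [sigmoid(b0 + b1 * x) for x in in_phylum]

# Classify a genus as "extinct" when its predicted probability is at least 0.5.
predicted = [1 if s >= 0.5 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(predicted, extinct)) / len(extinct)
auc = roc_auc(scores, extinct)

print(f"b1 = {b1:.3f} (a positive value means higher extinction odds inside phylum A)")
print(f"accuracy = {accuracy:.2f}, ROC AUC = {auc:.2f}")
```

In this toy data the odds of extinction are 6/2 inside the hypothetical phylum and 2/6 outside it, an odds ratio of 9, so the fitted coefficient `b1` should approach ln 9. A positive coefficient corresponds to greater extinction risk for members of the group, and a negative coefficient to lesser risk.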