Using Machine Learning Models and Logistic Regression Analyses to
Develop a Comprehensive Understanding of Extinction Risk For Marine
Animal Phyla Across the Paleozoic
Abstract
Extinction within the Paleozoic era has been studied in the past, but a
comprehensive understanding of how extinction risk changed across the era
is still lacking. Our research project aims to bridge this gap by
exploring extinction risk in relation to major Paleozoic phyla and
ecological characteristics. Using R, we analyzed the Stanford Earth Body
Size dataset, which includes extensive data (n=8816) on Paleozoic marine
animals. In Step 1, logistic regression coefficients were estimated to
indicate whether membership in each of the 6 phyla conferred greater or
lesser extinction risk in each period of the Paleozoic era. In Step 2, the examined
ecological characteristics included ocean acidification resilience,
feeding patterns, body volume, length, surface area, motility, tiering,
circulatory systems, and respiratory organ type. In Step 3, 6 binomial
machine learning models were created using the traits from Step 2 to
determine whether an individual genus went extinct in a particular
period. Our Step 1 results confirm that certain phyla had greater
extinction risk within each period and that extinction risk was not
uniform across these groups. Our Step 2 results show that certain traits
raised an organism's extinction risk while others lowered it.
One interesting pattern was that the only consistently non-significant
traits were body length, surface area, and volume. As in Step 1,
extinction risk for each ecological characteristic varied across the
Paleozoic. Finally, the Step 3 models were largely successful. Most
of the six models had an accuracy above 80%, with the highest being 92%
in the Cambrian. The areas under the Precision-Recall and the Receiver
Operating Characteristic curves were all in the acceptable
(>0.6) range, suggesting that the models have low false-positive and
false-negative rates and are able to distinguish which traits
indicate extinction or survival in each period. Our research project
identified phyla at risk of extinction in each period of the Paleozoic,
determined which traits conferred greater extinction risk, and
demonstrated that machine learning models trained on fossil descriptors
can predict whether an individual genus went extinct in a given period.
Our results confirm that extinction risk neither depends consistently on
any single factor nor remains constant across the periods of the Paleozoic era.
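
As a rough illustration of the Step 3 evaluation metrics, the sketch below computes accuracy and the area under the Receiver Operating Characteristic curve for one binary extinction classifier. It is not the pipeline used in this study, which was carried out in R; it is written in Python with the standard library only, the function names `accuracy` and `roc_auc` are illustrative, and the labels and scores are hypothetical toy values rather than data from the Stanford Earth Body Size dataset.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of genera whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the probability that a randomly
    chosen extinct genus (label 1) receives a higher score than a randomly
    chosen surviving genus (label 0). Tied scores count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = went extinct in the period, 0 = survived.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted extinction probabilities from some classifier.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]

print("accuracy:", accuracy(labels, scores))
print("ROC AUC:", roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a threshold such as 0.6 can serve as a lower bound for an acceptable model. Accuracy depends on the chosen probability cutoff (0.5 here, an assumption), whereas AUC does not.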