
MsPASS: A Parallel Processing Framework for Seismology
Yinzhi Wang
University of Texas at Austin

Corresponding Author: [email protected]

Gary Pavlis
Indiana University Bloomington

Abstract

Over the past decade, the success of large-scale projects such as the USArray component of EarthScope has produced a massive increase in the volume of data available to the seismology community. We assert that the field's software infrastructure has not kept pace with parallel developments in ‘big data’ sciences. As a step toward making research at extreme scale accessible to more of the seismology community, we are developing a new framework for seismic data processing and management that we call the Massive Parallel Analysis System for Seismologists (MsPASS). MsPASS leverages several existing technologies: (1) Spark as the scalable parallel processing framework, (2) MongoDB as the flexible database system, and (3) Docker and Singularity as the containerized virtual environment. The core of the system builds on a rewrite of the SEISPP package that implements wrappers around the widely accepted ObsPy toolkit. The wrappers automate many database operations and automatically save the processing history, which provides a mechanism for reproducibility. Together, these components provide the flexibility to adapt to a wide range of data processing workflows. The use of containers enables deployment on a wide range of computing platforms without requiring intervention by system administrators. We evaluate the effectiveness of the system with a deconvolution processing workflow applied to USArray data. Through extensive documentation and examples, we aim to make this system a sustainable, open-source framework for the community.
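The idea of wrapping processing functions so that each step is recorded automatically can be illustrated with a minimal Python sketch. This is not the MsPASS API. The names `SeismicRecord`, `record_history`, `scale`, and `demean` are hypothetical. The sketch keeps the history in memory rather than saving it to MongoDB, and it uses plain lists of samples instead of ObsPy traces.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SeismicRecord:
    """Hypothetical stand-in for a seismic data object: samples plus a history log."""
    samples: List[float]
    history: List[Dict[str, Any]] = field(default_factory=list)


def record_history(func: Callable[..., List[float]]) -> Callable[..., SeismicRecord]:
    """Hypothetical wrapper: turn a samples -> samples function into one that
    operates on SeismicRecord objects and logs each call in the record's history."""
    @functools.wraps(func)
    def wrapper(record: SeismicRecord, **kwargs: Any) -> SeismicRecord:
        new_samples = func(record.samples, **kwargs)
        # Build a new history list (the input record is left unchanged) and
        # append this step's function name and keyword parameters.
        new_history = record.history + [
            {"algorithm": func.__name__, "parameters": dict(kwargs)}
        ]
        return SeismicRecord(samples=new_samples, history=new_history)
    return wrapper


@record_history
def scale(samples: List[float], factor: float = 1.0) -> List[float]:
    """Multiply every sample by a constant factor."""
    return [s * factor for s in samples]


@record_history
def demean(samples: List[float]) -> List[float]:
    """Remove the mean value from the samples."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


if __name__ == "__main__":
    rec = SeismicRecord(samples=[1.0, 2.0, 3.0])
    out = demean(scale(rec, factor=2.0))
    print(out.samples)  # [-2.0, 0.0, 2.0]
    print(out.history)
```

In this sketch, the history travels with the data object through each processing step. Persisting that list alongside the data would be enough to reconstruct the sequence of algorithms and parameters that produced a result.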