Abstract
For science to reliably support new discoveries, its results must be
reproducible. This has proven to be a challenge in many fields, including
those that rely on computational methods to support new discoveries.
Reproducibility in computational studies is particularly difficult
because it requires open, documented sharing of data and models and
careful control of underlying hardware and software dependencies, so that
the computational procedures executed by the original researcher are
portable and produce consistent results when run on different hardware
or software. Despite recent advances in making scientific work
more findable, accessible, interoperable and reusable (FAIR),
fundamental questions in the conduct of reproducible computational
studies remain: Can published results be repeated in different computing
environments? If so, how similar are the repeated results to the originals? Can we
further verify and build on the results by using additional data or
changing computational methods? Can these changes be automatically and
systematically tracked? This presentation will describe our EarthCube
project to advance computational reproducibility and make it easier and
more efficient for geoscientists to preserve, share, repeat and
replicate scientific computations. Our approach is based on Sciunit,
software developed by prior EarthCube projects that encapsulates an
application's dependencies (system binaries, code, data, environment
and application provenance) so that the resulting computational
research object can be shared and re-executed on different
platforms. We have deployed Sciunit within the HydroShare JupyterHub
platform operated by the Consortium of Universities for the Advancement
of Hydrologic Science, Inc. (CUAHSI) for the hydrology research community
and will present use cases that demonstrate how to preserve, share,
repeat and replicate scientific results from the field of hydrologic
modeling. Although illustrated in the context of hydrology, the methods
and tools developed in this project have the potential to be extended to
other geoscience domains and to inform the reproducibility evaluation
process currently undertaken by journals and publishers.