Data Reuse and Reproducibility in Earth System Science: A Survey of
Current Practices, Barriers, and Expectations
Abstract
As Earth System Science (ESS) becomes more data-intensive,
collaborative, and interdisciplinary, it is important to understand how
best to support and advance data reuse. We conducted an online survey of
active ESS researchers from 126 U.S. universities and research centers,
representing a wide variety of scientific fields. Of the 207
respondents, 51.7% had more than 20 years of research experience.
Results indicated that respondents currently reuse data primarily to
conduct new analyses (87%) and to compare results (70.4%), with only
18.5% reusing data to reproduce published studies. As
expected, data hosted by federally funded data centers were reused most
frequently, with open government data and data provided directly from
other researchers also widely used. Reuse of data from other types of
repositories lags far behind, due in part to a range of service
limitations. At the same time, data sharing by respondents is
strong: 96.6% actively release their data, primarily as supplements to
published papers, with moderate use of open access repositories. Of the
45.9% who had attempted to reproduce research, 73.7% failed at least
once, often due to the limited detail provided in published papers.
Still, 92.3% believe it is the researcher’s responsibility to ensure
their work is reproducible. The majority favored traditional modes of
documenting research (word processors, text editors, and code
commenting) over electronic notebooks or workflow systems. Interestingly,
59.9% continue to use hand-written notebooks. Challenges to data reuse
and reproducibility specific to ESS included the complex nature of Earth
systems, increasingly complicated models, lack of data management
resources, and limited emphasis on reproducibility in the field.
Open-ended responses raised questions about whether “exact
replication” is necessary or possible for ESS. Most researchers agreed
that data and code should be considered important research products and
that outlets are needed for publishing negative results. Taken together,
the results suggest a strong data sharing culture in ESS with high
levels of reuse and commitment to open science. The research community
would benefit greatly from better documentation and sharing of methods
and research processes, as well as targeted improvements in data
services and tools.