Abstract
The core tools of science (data, software, and computers) are undergoing
a rapid and historic evolution, changing what questions scientists ask
and how they find answers. Earth science data are being transformed into
new formats optimized for cloud storage that enable rapid analysis of
multi-petabyte datasets. Datasets are moving from archive centers to
vast cloud data storage, adjacent to massive server farms. Open-source,
cloud-based data science platforms, accessed through a web browser,
are enabling advanced, collaborative, interdisciplinary science
to be performed wherever scientists can connect to the internet.
Specialized software and hardware for artificial intelligence and
machine learning (AI/ML) are being integrated into data science platforms,
making them more accessible to non-specialists. Increasing amounts of
data and computational power in the cloud are unlocking new approaches
for data-driven discovery. For the first time, it is truly feasible for
scientists to bring their analysis to data in the cloud without
specialized cloud computing knowledge. This paradigm shift has the
potential to lower the barrier to entry, expand the science
community, and increase opportunities for collaboration while promoting
scientific innovation, transparency, and reproducibility. Yet we have
all seen promising new tools that seemed harmless and beneficial at
the outset become damaging or limiting. What do we need to consider as
this new way of doing science evolves?