Whyjay Zheng and 9 more authors
Supplemental material (SM; also known as supplementary information) accompanies a research article and provides study details such as metadata, additional figures and text, multimedia, and code. Well-designed SM helps readers fully understand the underlying scientific analysis, reproduce the work, and even reuse the workflows for exploratory ideas. The concept of FAIR (Findable, Accessible, Interoperable, and Reusable), originally developed as a set of data-sharing guidelines, therefore captures these core qualities of SM as well.

We evaluate the SM-preparation practices commonly found in Earth science journal articles, classifying them into five tiers based on the FAIR principles and narrative structure. We show that Jupyter Book-based SM belongs to the top tier and outperforms the other practices, despite being less popular than the others as of 2022.

We identify the advantages of Jupyter Book-based SM as follows. Jupyter Book uses a narrative structure to combine the different elements of SM into a single scholarly object, increasing readability. Its direct support for HTML publishing lets users host the SM on the web using services such as GitHub Pages, improving web-indexing ranks and increasing the exposure of both the research article and the SM. The entire SM is also eligible to be archived in a data repository and to receive a Digital Object Identifier (DOI) that can be used for citations. In addition, if the content is available on a code-hosting platform (e.g., GitHub), Jupyter Book-based SM lowers the barrier to reproducing and reusing the work through an interactive cloud computing service (e.g., MyBinder.org) with all data and code imported.

These features embody the core values of SM from the perspective of open science. We encourage researchers to adopt these good practices and urge journal publishers to be open to receiving such supplements for maximum effectiveness.
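As a concrete illustration of the publishing workflow described above, the following is a minimal sketch of a Jupyter Book `_config.yml` that enables HTML output suitable for GitHub Pages, a MyBinder launch button, and a link back to the source repository. The book title and repository URL are hypothetical placeholders, not taken from the article.

```yaml
# _config.yml -- minimal sketch; title and repository URL are hypothetical
title: Supplemental Material for Our Study
author: The Authors

# Point the book at the code-hosting repository so launch and
# repository buttons know where the source lives.
repository:
  url: https://github.com/example-user/example-sm   # hypothetical repo
  branch: main

# Add a "launch on Binder" button to pages built from notebooks,
# so readers can re-run the analysis interactively in the cloud.
launch_buttons:
  binderhub_url: https://mybinder.org

html:
  use_repository_button: true   # show a link back to the repository
```

With this configuration, `jupyter-book build .` produces the HTML site, and pushing the `_build/html` directory to a `gh-pages` branch (for example with the `ghp-import` tool) publishes the SM as a website; archiving a tagged release of the same repository in a data repository such as Zenodo then yields a citable DOI.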
The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage adjacent to massive server farms. Open-source, cloud-based data science platforms, accessed through a web browser, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (ML/AI) are being integrated into data science platforms, making them more accessible to the average scientist. Growing volumes of data and computational power in the cloud are unlocking new approaches to data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud-computing knowledge. This paradigm shift has the potential to lower the barrier to entry, expand the science community, and increase opportunities for collaboration, while promoting scientific innovation, transparency, and reproducibility. Yet we have all watched promising new tools that seemed harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science evolves?