NCAR Datasets Published in the Cloud
  • Jeff de La Beaujardière,
  • Brian Bonnlander,
  • Seth McGinnis,
  • Maxwell Grover,
  • Anderson Banihirwe,
  • Kevin Raeder,
  • Gary Strand
Jeff de La Beaujardière
National Center for Atmospheric Research

Corresponding Author

Brian Bonnlander
National Center for Atmospheric Research
Seth McGinnis
National Center for Atmospheric Research
Maxwell Grover
National Center for Atmospheric Research
Anderson Banihirwe
National Center for Atmospheric Research
Kevin Raeder
National Center for Atmospheric Research
Gary Strand
National Center for Atmospheric Research

Abstract

The US National Center for Atmospheric Research (NCAR) has published several large datasets in the Amazon Web Services (AWS) cloud, with support from the NCAR “Science at Scale” project, the AWS Open Data Sponsorship program, and the Amazon Sustainability Data Initiative. In each case we selected a subset comprising the most useful variables from the original data and converted that subset from netCDF to Zarr before publication. The Zarr format supports the same data model as netCDF and is well suited to object storage and distributed computing in the cloud using the Pangeo libraries in Python. Each dataset has an accompanying Intake-ESM catalog to facilitate data discovery and reading via Xarray, and a sample Jupyter Notebook that illustrates how to access and analyze the data. Egress for these data is free, but users are encouraged to bring their compute to the data. The datasets currently published are:

  • Community Earth System Model Large Ensemble (CESM LENS): https://doi.org/10.26024/wt24-5j82
  • North American Coordinated Regional Downscaling Experiment (NA-CORDEX): https://doi.org/10.26024/9xkm-fp8
  • CESM version 2 Large Ensemble (CESM2-LE): https://doi.org/10.26024/y48t-q717
  • Data Assimilation Research Testbed (DART) Reanalysis: https://doi.org/10.26024/sprq-2d04

This paper will provide information about the datasets and summarize lessons learned from the data conversion and publication.
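
As a rough illustration of why a chunked format such as Zarr suits object storage, the sketch below computes which chunk objects a reader would need to fetch for a given index range. It assumes the Zarr version 2 default layout, in which each chunk of an array is stored under a key formed from the array name and the chunk indices joined by dots. The function name `chunk_keys`, the array name `temperature`, and the chunk sizes are hypothetical and are not taken from the published datasets.

```python
from itertools import product


def chunk_keys(array_name, chunks, selection):
    """Return the object keys of the chunks that overlap a selection.

    chunks: chunk length along each dimension.
    selection: one (start, stop) index pair per dimension, stop exclusive.
    Assumes the Zarr v2 default key layout "<array>/<i>.<j>.<k>".
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunks):
        first = start // size          # index of the first chunk touched
        last = (stop - 1) // size      # index of the last chunk touched
        per_dim.append(range(first, last + 1))
    return [
        f"{array_name}/" + ".".join(str(i) for i in idx)
        for idx in product(*per_dim)
    ]


# Hypothetical array with dimensions (time, lat, lon), stored in chunks
# of 365 x 96 x 144 values. The selection covers two years of time steps,
# all latitudes in the first chunk, and a band of longitudes.
keys = chunk_keys(
    "temperature",
    chunks=(365, 96, 144),
    selection=[(0, 730), (0, 96), (100, 200)],
)
print(keys)
```

Under these assumptions only four objects are read for this selection, and because each chunk is a separate object, the reads can be issued in parallel by different workers.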