
Managing a Community Data Collection with Open Source Software
  • Roland Schweitzer, Self Employed
  • Ethan Davis, UCAR Unidata
  • Sean Arms, University Corporation for Atmospheric Research
  • Robert Simons, NOAA/NMFS/SWFSC
  • Kevin O'Brien, NOAA/PMEL, UW Joint Institute for the Study of the Atmosphere and Ocean
  • David Neufeld, CIRES

Corresponding Author: Roland Schweitzer, [email protected]

Abstract

The Unified Access Framework (UAF) project of the NOAA Global Earth Observation - Integrated Data Environment (GEO-IDE) is an ongoing effort to provide access to NOAA-wide data in a way that is FAIR and meets PARR requirements. The first priority of UAF is to copy success. We recognize that data following the Climate and Forecast (CF) netCDF convention is readily used by working scientists; that THREDDS Data Servers and ERDDAP servers are popular ways to serve such data; that these servers can be interrogated by software to determine whether the data follow the conventions; and that the servers can be federated.

To build the collection, we construct a master “raw” catalog of candidate data sets from THREDDS servers around NOAA and other organizations. Custom software examines the raw catalog to eliminate large data collections that are not aggregated in time and organizes the results into a “clean” catalog. ERDDAP then examines the clean catalog to provide ERDDAP GridDAP access and to verify that the data sources follow the CF convention. The gridded data sets are merged with a collection of TableDAP (netCDF Discrete Sampling Geometry) data sources. Currently the UAF ERDDAP server is home to 10,712 data sets. After the UAF ERDDAP server has examined the data collection, a Live Access Server (LAS) is configured to offer data analysis and visualization access to all the data sets.

The final piece of the puzzle is making the data FAIR and achieving PARR compliance, which requires tools that have been adapted and developed for this purpose. We resurrected the ncISO tool, which can examine the contents of CF netCDF data sources, create ISO metadata, and score the data according to the Unidata Attribute Convention for Data Discovery (ACDD). We can help the centers hosting the data meet their PARR requirements by properly integrating the resulting metadata from ncISO into NOAA’s central data catalog.
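The “raw catalog” crawl described above starts from THREDDS catalog XML. As a minimal sketch of that first step, the snippet below parses a catalog document and collects the datasets it advertises; the sample catalog is hand-made for illustration, and a real crawl would fetch each server’s catalog.xml over HTTP and follow its catalogRef links recursively.

```python
# Sketch: list the datasets advertised in a THREDDS catalog.xml.
# Only datasets with a urlPath are directly accessible; container
# datasets (grouping elements) have none and are skipped.
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

def list_datasets(catalog_xml: str) -> list:
    """Return name/urlPath pairs for the accessible datasets in a catalog."""
    root = ET.fromstring(catalog_xml)
    datasets = []
    for ds in root.iter("{%s}dataset" % THREDDS_NS):
        url_path = ds.get("urlPath")
        if url_path:  # skip container datasets without a urlPath
            datasets.append({"name": ds.get("name"), "urlPath": url_path})
    return datasets

# Hand-made example catalog (names and paths are illustrative).
SAMPLE = """<catalog xmlns="%s" name="Example">
  <dataset name="SST aggregation" urlPath="agg/sst_daily"/>
  <dataset name="Winds">
    <dataset name="Wind aggregation" urlPath="agg/wind_6hr"/>
  </dataset>
</catalog>""" % THREDDS_NS

print(list_datasets(SAMPLE))
```

A full crawler would apply the cleaning rules on top of this listing, for example dropping per-file datasets that are not aggregated in time before writing the “clean” catalog.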
We have recently updated the templates used to generate the metadata to ensure they meet the latest ISO and ACDD specifications. Work is underway at NOAA and Unidata to integrate the ncISO code back into the GitHub repository for the THREDDS Data Server, which will bring together two disparate ncISO implementations. UAF is a few people working a few hours a month to maintain a large and useful data collection, and in this talk we’ll tell you how we do it.
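To illustrate the kind of check behind the ACDD scoring mentioned above, the sketch below scores a dataset’s global attributes against a small, illustrative subset of ACDD attribute names; the attribute lists and the scoring formula here are assumptions for demonstration, not the actual rubric ncISO implements.

```python
# Sketch: score a dataset's global attributes against a subset of
# ACDD attribute names. The lists below are a hypothetical sample,
# not the full ACDD specification.
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords")
ACDD_RECOMMENDED = ("license", "creator_name",
                    "time_coverage_start", "time_coverage_end",
                    "geospatial_lat_min", "geospatial_lat_max")

def acdd_score(global_attrs: dict) -> float:
    """Fraction of the listed ACDD attributes that are present and non-empty."""
    checked = ACDD_HIGHLY_RECOMMENDED + ACDD_RECOMMENDED
    present = sum(1 for name in checked
                  if str(global_attrs.get(name, "")).strip())
    return present / len(checked)

# Example global attributes, as read from a netCDF file's header.
attrs = {"title": "Daily SST", "summary": "Blended daily SST analysis",
         "keywords": "oceans, sea surface temperature",
         "license": "CC0", "creator_name": "NOAA"}
print("ACDD score: %.0f%%" % (100 * acdd_score(attrs)))
```

A score like this gives data providers a quick target for improving discoverability before their metadata is harvested into a central catalog.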