Abstract
The Unified Access Framework (UAF) project of the NOAA Global Earth
Observation - Integrated Data Environment (GEO-IDE) is an ongoing
effort to provide access to NOAA-wide data in a way that is FAIR
(findable, accessible, interoperable, reusable) and meets Public Access
to Research Results (PARR) requirements. The first priority of UAF is to
copy success.
We recognize: data that follows the Climate and Forecast netCDF
convention is readily used by working scientists; THREDDS Data Servers
and ERDDAP servers are popular ways to serve such data; these servers
can be interrogated by software to determine that the data follows the
conventions; and the servers can be federated. To make the collection, we
construct a master “raw” catalog of candidate data sets from THREDDS
servers around NOAA and other organizations. The raw catalog is examined
by custom software to eliminate large data collections that are not
aggregated in time and to organize the results into a “clean” catalog.
The catalog is then examined by ERDDAP to provide ERDDAP GridDAP access
and to verify that the data sources follow the CF convention. The
gridded data sets are merged into a collection of TableDAP (netCDF
Discrete Sampling Geometry) data sources. Currently the UAF ERDDAP
server is home to 10,712 data sets. After the UAF ERDDAP server has
examined the data collection, a Live Access Server (LAS) is configured
to offer data analysis and visualization access to all the data sets.
The final piece of the puzzle is to make the data FAIR and to achieve
PARR compliance. This requires some tools that have been adapted and
developed for this purpose. We resurrected the ncISO tool, which can
examine the contents of CF netCDF data sources, create ISO metadata,
and score the data according to the Unidata Attribute Convention for
Data Discovery (ACDD). We can help the centers hosting the data meet their PARR
requirements by properly integrating the resulting metadata from ncISO
into NOAA’s central data catalog. We have recently updated the templates
which are used to generate the metadata to ensure they meet the
latest ISO and ACDD specifications. Work is underway at NOAA and Unidata
to integrate the ncISO code back into the GitHub repository for the
THREDDS Data Server. This will bring together two disparate ncISO
implementations. UAF is a few people working a few hours a month to
maintain a large and useful data collection, and in this talk we’ll
tell you how we do it.
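
As a rough illustration of the raw-to-clean catalog step described above, the following Python sketch filters an in-memory list of catalog entries. It is not the UAF cleaning software: the `CatalogEntry` fields, the `claims_cf` and `clean_catalog` helpers, the example URLs, and the file-count threshold are all hypothetical names and values chosen for the example, and a real implementation would read THREDDS catalogs and inspect the data sources rather than a hand-built list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical, simplified stand-in for one data set in the raw catalog."""
    name: str
    url: str
    conventions: str          # value of the global "Conventions" attribute
    file_count: int           # number of files behind this entry
    aggregated_in_time: bool  # True if served as a single time aggregation


def claims_cf(entry):
    """Return True if the Conventions string mentions CF (e.g. "CF-1.6")."""
    return "CF-" in entry.conventions.upper()


def clean_catalog(raw, max_unaggregated_files=100):
    """Drop large collections that are not aggregated in time.

    The threshold is an illustrative assumption, not a UAF setting.
    """
    kept = []
    for entry in raw:
        too_large = entry.file_count > max_unaggregated_files
        if too_large and not entry.aggregated_in_time:
            continue  # large and unaggregated: leave it out of the clean catalog
        kept.append(entry)
    return kept


# Made-up entries; the URLs are placeholders, not real servers.
RAW = [
    CatalogEntry("sst_monthly", "https://example.invalid/sst", "CF-1.6", 480, True),
    CatalogEntry("model_runs", "https://example.invalid/runs", "CF-1.4", 9000, False),
    CatalogEntry("small_grid", "https://example.invalid/grid", "COARDS", 3, False),
]

CLEAN = clean_catalog(RAW)
CF_NAMES = [entry.name for entry in CLEAN if claims_cf(entry)]
```

Here `CLEAN` keeps the aggregated and the small collections and drops the large unaggregated one, while `CF_NAMES` lists only the surviving entries whose `Conventions` attribute claims CF. Checking the attribute string is a much weaker test than the verification ERDDAP performs when it loads a data source.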