
Karen Stocks

and 11 more

The Rolling Deck to Repository (R2R; www.rvdata.us) program is entering its second decade of managing underway data from US-operated academic research vessels to ensure preservation of, and access to, these national oceanographic research assets. Reflecting on the move from decentralized data submission by chief scientists to an operational centralized facility has brought insights that may inform other communities with distributed networks of data acquisition providers with diverse practices and resources. 4,000 cruises and 100+ TB of data later, here are lessons R2R has learned.
- Managing data via a central aggregating system, where both curation and domain data expertise can be optimally leveraged, promotes more complete and efficient data preservation.
- Identifying key organizing elements for the data, and implementing persistent identifiers and metadata for those elements, facilitates management and usability. R2R developed authoritative DOIs and standard metadata for cruises to organize R2R data for discoverability and access, and to facilitate reciprocal linking to related data in external repositories.
- When data submissions from diverse providers are heterogeneous, standardizing data at ingest supports data aggregation and synthesis that promote broad data re-use.
- Providing tools and expertise to assist with standardization, such as recommended data structures and best practice guidance for data acquisition, reduces heterogeneous practices over time even when compliance is voluntary.
- Developing organized and persistent communication mechanisms with all main stakeholders is central to success. R2R has annual community-level meetings, as well as more frequent individual interactions, with vessel operators/technicians, the NOAA National Centers for Environmental Information staff, and oceanographic research scientists. These communications have been critical to informing high-level priorities, overall approaches, and specific technical details and decisions.

Edward Armstrong

and 16 more

Before complex analysis of oceanographic or other earth science data can occur, the data must be placed in an environment with the proper computing and software resources. In the past, this was nearly always the scientist’s personal computer or institutional servers. The problem with this approach is that the data products must be brought directly to these compute resources, which leads to large data transfers and storage requirements, especially for high-volume satellite or model datasets. In this presentation, we describe a new technological solution under development and implementation at the NASA Jet Propulsion Laboratory for conducting oceanographic and related research based on satellite data and other sources. Fundamentally, our approach for satellite resources is to tile (partition) the data inputs into cloud-optimized, computation-friendly databases that allow distributed computing resources to perform on-demand, server-side computation and data analytics. This technology, known as NEXUS, has already been implemented in several existing NASA data portals to support oceanographic, sea-level, and gravity data time series analysis, with capabilities to output time-average maps, correlation maps, Hovmöller plots, climatological averages, and more. A further extension of this technology will integrate ocean in situ observations, event-based data discovery (e.g., natural disasters), data quality screening, and additional capabilities. This particular activity is an open source project known as the Apache Science Data Analytics Platform (SDAP) (https://sdap.apache.org), and colloquially as OceanWorks, and is funded by the NASA AIST program. It harmonizes data, tools, and computational resources for researchers, allowing them to focus on research results and hypothesis testing rather than on security, data preparation, and management.
We will present a few oceanographic and interdisciplinary use cases demonstrating the capabilities for characterizing regional sea-level rise, sea surface temperature anomalies, and ocean hurricane responses.
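
The tiling idea described above can be illustrated with a minimal sketch. It is not the NEXUS or SDAP implementation or API; the function names (`tile_grid`, `partial_sum`, `area_mean`, `area_mean_series`), the in-memory list-of-lists grid, and the index-based bounding box are all assumptions made for illustration. The sketch partitions each gridded time step into tiles, computes a partial sum and count per tile (work that could in principle be spread across workers), and combines the partial results into an area-averaged time series.

```python
# Illustrative sketch only: hypothetical names, not the NEXUS/SDAP API.

def tile_grid(grid, tile_rows, tile_cols):
    """Partition a 2-D grid (list of rows) into tiles that remember their origin."""
    tiles = []
    for r0 in range(0, len(grid), tile_rows):
        for c0 in range(0, len(grid[0]), tile_cols):
            block = [row[c0:c0 + tile_cols] for row in grid[r0:r0 + tile_rows]]
            tiles.append({"row0": r0, "col0": c0, "values": block})
    return tiles


def partial_sum(tile, rows, cols):
    """Return (sum, count) of the tile's cells inside half-open index ranges.

    rows = (row_min, row_max) and cols = (col_min, col_max) describe the
    requested bounding box in grid index space (an assumption of this sketch).
    """
    total, count = 0.0, 0
    for i, row in enumerate(tile["values"]):
        for j, value in enumerate(row):
            r, c = tile["row0"] + i, tile["col0"] + j
            if rows[0] <= r < rows[1] and cols[0] <= c < cols[1]:
                total += value
                count += 1
    return total, count


def area_mean(tiles, rows, cols):
    """Combine per-tile partial results into one area average."""
    parts = [partial_sum(tile, rows, cols) for tile in tiles]  # independent per tile
    total = sum(p[0] for p in parts)
    count = sum(p[1] for p in parts)
    return total / count if count else None


def area_mean_series(grids, tile_rows, tile_cols, rows, cols):
    """Area-averaged time series over a sequence of gridded time steps."""
    return [
        area_mean(tile_grid(grid, tile_rows, tile_cols), rows, cols)
        for grid in grids
    ]


if __name__ == "__main__":
    # Two synthetic 4x4 "time steps"; the second is the first plus 1.0 everywhere.
    step0 = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    step1 = [[v + 1.0 for v in row] for row in step0]
    print(area_mean_series([step0, step1], 2, 2, rows=(0, 2), cols=(0, 2)))
```

Because each tile yields only a (sum, count) pair, the combining step moves very little data regardless of tile size, which is the property that makes server-side computation attractive for high-volume datasets.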