Carlo Lacagnina

and 9 more

Knowledge of data quality and of the quality of the associated information, including metadata, is critical for data use and reuse. Assessing data and metadata quality is key to ensuring that the available information is credible, establishing a foundation of trust between the data provider and various downstream users, and demonstrating compliance with requirements established by funders and federal policies. Data quality information should be consistently curated, traceable, and adequately documented so that it provides sufficient evidence to guide users in addressing their specific needs. Quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR). Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily comparable datasets and quality metrics. To enable interoperability across systems and tools, data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help improve the practices of the various stakeholders who contribute to the collection, curation, and dissemination of data. This presentation introduces international community guidelines for curating data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product. Supporting case studies demonstrate the applicability of the proposed guidelines.
Formal international standards, as well as the promotion of community or recommended practices, have their place in ensuring the “FAIRness” of data. Data management in NASA’s Earth Observing System Data and Information System (EOSDIS) has benefited significantly from both of these avenues. The purpose of this paper is to present one example of each, both of which promote (re)usability. The first is an ISO standard for specifying preservation content from Earth observation missions. Work on this started in 2011, informally within the Earth Science Information Partners (ESIP) in the US, while the European Space Agency (ESA) was leading an effort on Long-Term Data Preservation (LTDP). The ESIP discussions resulted in NASA’s Preservation Content Specification (PCS), which was applied in 2012 as a requirement for NASA’s new missions. ESA’s Preserved Data Set Content (PDSC) was codified into a document adopted by the Committee on Earth Observation Satellites (CEOS). It was recognized that combining the PCS and PDSC into an ISO standard would help ensure consistency in data preservation on a broader international scale. This standard, numbered ISO 19165-2, has been under development since mid-2017. The second is an example of developing recommendations for “best practices” within more limited (though still fairly broad) communities. A Data Product Developers’ Guide (DPDG) is currently being developed by one of NASA’s Earth Science Data System Working Groups (ESDSWGs). It is intended for developers of products derived from Earth observation data and aims to improve product (re)usability. One of the challenges in developing the guide is that many applicable standards and guides already exist. The relevant information needs to be selected and expressed succinctly, with appropriate pointers to references.
The DPDG aims to compile the most applicable parts of earlier guides into a single document outlining the typical development process for Earth Science data products. It emphasizes standards and best practices formally endorsed by the Earth Science Data and Information System (ESDIS) Standards Office (ESO), outputs from ESDSWGs (e.g., the Dataset Interoperability Working Group and the Data Quality Working Group), and recommendations from Distributed Active Archive Centers and data producers.