Thematic harmonization of environmental data: Facilitating interoperability of data within and among repositories in support of data reuse and scientific synthesis

Margaret O'Brien; Colin Smith; Corinna Gries

doi:10.1002/essoar.10501268.1

Abstract

Data repositories and research networks worldwide are publishing a diverse array of long-term and experimental data for meaningful reuse, repurposing, and integration. However, in synthesis research the largest time investment is still in discovering, cleaning, and combining primary datasets until all are completely understood and converted to a usable format. To accelerate this process, we have developed an approach to define flexible, domain-specific data models and convert primary data to these models using a lightweight, distributed workflow framework. The approach is based on extensive experience in synthesis research workflows, takes into account the distributed nature of original data curation, satisfies the requirement for regular additions to the original data, and is not determined by a single synthesis research question. Furthermore, all data describing the sampling context are preserved, and the harmonization may be performed by data scientists who are not specialists in each specific research domain. Our harmonization process has three phases. First, a Design Phase captures essential attributes and considers existing standardization efforts and external vocabularies that disambiguate meaning. Second, an Implementation Phase publishes the data model and best-practice guides for reference, followed by conversion of relevant repository contents by data managers and creation of software for data discovery and exploration. Third, a Maintenance Phase implements programmatic workflows that use event notification services to run automatically when parent data are revised. In this presentation we demonstrate the harmonization process for ecological community survey data and highlight the unique challenges and lessons learned. Additionally, we demonstrate the maintenance workflow and the data exploration and aggregation tools that plug in to this data model.
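
To make the conversion and maintenance steps concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a hypothetical source table with one column per taxon and reshapes it into a flat, long-format list of observations that keeps the sampling context (site and date). The `Observation` fields, the column names `site` and `date`, and both function names are illustrative assumptions and are not taken from the published data model. The revision check stands in for the decision a maintenance workflow would make after an event notification reports a new parent revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One harmonized record (hypothetical field names)."""
    location_id: str
    date: str
    taxon_id: str
    variable_name: str
    value: float
    unit: str


def harmonize(rows, context_cols, variable_name, unit):
    """Reshape wide rows (one column per taxon) into long-format observations.

    Assumes every row has "site" and "date" keys; any column not listed in
    context_cols is treated as a taxon. Missing values (None) are skipped,
    while recorded zeros are kept.
    """
    observations = []
    for row in rows:
        for column, value in row.items():
            if column in context_cols or value is None:
                continue
            observations.append(
                Observation(
                    location_id=row["site"],
                    date=row["date"],
                    taxon_id=column,
                    variable_name=variable_name,
                    value=float(value),
                    unit=unit,
                )
            )
    return observations


def needs_reharmonization(parent_revision, harmonized_from_revision):
    """Return True when the parent dataset is newer than the revision
    the harmonized copy was derived from (integer revision numbers assumed)."""
    return parent_revision > harmonized_from_revision


# Made-up example data, for illustration only.
source_rows = [
    {"site": "A1", "date": "2019-06-01",
     "Macrocystis pyrifera": 12, "Pterygophora californica": 3},
    {"site": "B2", "date": "2019-06-01",
     "Macrocystis pyrifera": 0, "Pterygophora californica": None},
]

harmonized = harmonize(
    source_rows,
    context_cols={"site", "date"},
    variable_name="count",
    unit="number",
)
```

In this sketch the taxon names come straight from the column headers, so the same function can be reused for other wide tables without knowledge of the research domain; only the list of context columns changes.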