Planet Microbe: Toward the integration of oceanographic ’omics,
environmental and physicochemical data layers
Abstract
Oceanographic research cruises, very often large collaborative efforts,
produce abundant data using a wide range of methods and equipment. These
research endeavors span a broad array of disciplines and are critical to
investigating the interplay between biological, geological, and chemical
processes in ocean systems over space and time. The advent of
high-throughput sequencing and other ’omics technologies allows
researchers to analyze gene expression in a variety of environmental
settings and to measure the distribution and significance of metabolites
and lipids in organisms and the environment.
carefully curate and share their data with collaborators to advance
individual studies and publications, no systematic, unifying framework
currently exists to integrate ‘omics data with physical, geochemical,
and biological datasets commonly used by the broader geoscience
community. As a result, the moment each sample leaves the ship is often
the last time each data component appears together in a unified
collection. Typically, ‘omics datasets are submitted to nucleotide
sequence repositories, whereas contextual environmental data are
submitted and stored in specialized data-repositories, or only made
available within published papers. This makes it difficult to fully
reconnect in-situ data, therefore limiting their reuse in other studies.
Developing resources that facilitate the aggregation, publication, and
reuse of biological datasets together with their physicochemical
information is critical for studying marine microbes and the
biogeochemical processes they drive in the ocean. We present Planet
Microbe, a cyberinfrastructure resource enabling data discovery and open
data sharing for historical and ongoing oceanographic sequencing
efforts. Several historical oceanographic ’omics datasets, from the
Hawaii Ocean Time-series (HOT), the Bermuda Atlantic Time-series Study
(BATS), and the Global Ocean Sampling Expedition (GOS), have been
integrated into Planet Microbe along with more recent large-scale
oceanic datasets such as those from the Tara Expeditions and Ocean
Sampling Day (OSD). In Planet Microbe, these ’omics data have been
reintegrated with their in-situ environmental contextual data, including
biological and physicochemical measurements and information about
sampling events and sampling stations. Cruise tracks, protocols, and
instrumentation are also linked to these datasets to provide users with
a comprehensive view of the metadata.
Additionally, Planet Microbe integrates computational tools through
CyVerse, a cyberinfrastructure funded by the National Science Foundation
(NSF), and provides users with free access to large-scale computing
power to analyze and explore these datasets.