porousMedia4Foam is a package for solving flow and transport in porous media using OpenFOAM, a popular open-source numerical toolbox. We introduce and highlight the features of a new-generation open-source hydro-geochemical module implemented within porousMedia4Foam. It opens up a new dimension for investigating hydro-geochemical processes occurring at multiple scales, i.e. at the pore scale, the reservoir scale, and the hybrid scale, relying on the concepts of the micro-continuum approach. The package is designed such that only the chemistry part of the solver is handled by an external geochemical package, which is coupled to the flow and transport solver of OpenFOAM. The evolution of the porous medium and fluid properties, such as porosity, permeability, reactive surface area, or diffusivity of chemical species, is handled by various models that we implemented in the package. For the present work, PHREEQC was chosen as the geochemical solver. We conducted benchmarks across different scales to validate the accuracy of our simulator. We further looked at the evolution of mineral dissolution/precipitation in a hybrid system comprising a fracture and a reactive porous medium.
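As a rough illustration of the coupling concept described above (not the actual porousMedia4Foam implementation, which lives in OpenFOAM/C++), the following Python sketch shows a sequential operator-splitting loop in which transport is advanced, a stand-in geochemistry call updates concentrations and porosity, and permeability is recomputed. The `geochemical_step` function is a hypothetical placeholder for the call to the external solver (PHREEQC in the present work), and the Kozeny-Carman relation is only one commonly used porosity-permeability model, not necessarily the one implemented in the package.

```python
# Conceptual operator-splitting sketch; all functions below are illustrative stand-ins.
import numpy as np

def kozeny_carman(k0, phi0, phi):
    """Update permeability from porosity with a Kozeny-Carman-type relation."""
    return k0 * (phi / phi0) ** 3 * ((1.0 - phi0) / (1.0 - phi)) ** 2

def transport_step(c, phi, dt):
    """Placeholder for the advection-diffusion update (the OpenFOAM side)."""
    return c + dt * 0.0

def geochemical_step(c, phi, dt):
    """Hypothetical stand-in for the external geochemical solver call."""
    dissolution_rate = 1.0e-4                 # illustrative constant rate
    return c, phi + dissolution_rate * dt     # mineral dissolution opens porosity

phi0, k0 = 0.2, 1.0e-12
phi, k = phi0, k0
c = np.zeros(100)                             # species concentrations on a 1-D grid
dt = 1.0

for step in range(10):
    c = transport_step(c, phi, dt)            # flow and transport
    c, phi = geochemical_step(c, phi, dt)     # chemistry handled externally
    k = kozeny_carman(k0, phi0, phi)          # feed updated properties back to flow
```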
The Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has been disseminating information on data and samples obtained by JAMSTEC's research vessels and submersibles through several databases. The core database is the Data and Sample Research System for Whole Cruise Information (DARWIN, http://www.godac.jamstec.go.jp/darwin/e). DARWIN provides information on more than 2,000 cruises and 6,500 dives, together with the related observation data and geological samples. Each cruise or dive information page in DARWIN has links to related content such as biological samples, deep-sea images, cruise reports, and data books provided by other databases. In this poster, we will introduce some examples of efforts to improve the reusability of the data and samples described above.
- Assignment of Digital Object Identifiers (DOIs): DOIs have been assigned to the information pages of more than 2,000 research cruises from the early 1980s onward to promote data citation. At this time, the granularity of a DOI is a research cruise information page identified by its cruise ID.
- Specification of data policy: Through the research cruises of FY2018, DARWIN handled data and samples under the JAMSTEC data policy only. Since cooperative research cruises began in FY2019, the applicable policy now depends on the type of cruise. DARWIN has been updated to display the applicable policy clearly on each information page, and the function can accommodate additional policies in the future.
- Data rescue: A cruise report is a prompt report that describes the purpose of a cruise, what observations were made and how, and the results. Cruise reports for past research cruises have been found during laboratory relocations and submitted to the Data Management Group. These reports have been scanned and saved as individual PDFs for long-term storage and in preparation for publication. Those with the Principal Investigators' consent have been disseminated step by step.
Since its formation in 2015, the National Centers for Environmental Information (NCEI) has used disparate legacy systems spread across several IT networks of the National Environmental Satellite, Data, and Information Service (NESDIS) to fulfill its data-stewardship functions. As part of its modernization and consolidation of these functions, NCEI implemented Common Ingest as the functional component that ingests approximately 200 data streams every month into its enterprise archival information system. In parallel, NESDIS completed the Secure Ingest Gateway Project (SIGP), a pilot project to establish standard, enterprise-wide secure methods for NESDIS and the rest of the National Oceanic and Atmospheric Administration (NOAA) to receive data from external partners in a cloud environment. SIGP is now transitioning to operations as the Operational Secure Ingest Service (OSIS), which will be the on-ramp to NCEI's Common Ingest functionality when it too moves to the cloud. In addition, this ingest function will populate and use a cloud-based metadata catalog, which will be the beating heart of the NESDIS and NCEI information systems in the cloud environment. The vision is to scale the ingest of environmental data to keep pace with its ever-increasing volume, veracity, variety, and velocity. In this presentation to the ocean data community, the authors describe NESDIS and NCEI's challenges and successes with the implementation of the ingest function of their archival information system in a cloud environment.
Knowledge of soil properties is essential for risk assessment of vapor intrusion (VI). Data assimilation (DA) provides a valuable means to characterize contaminated sites by fusing the information contained in measurement data (such as concentrations of volatile organic chemicals). Nevertheless, the application of DA in risk assessment of VI is quite limited. Moreover, soil heterogeneity is often overlooked in VI-related research. To fill these knowledge gaps, we apply a state-of-the-art DA method based on deep learning (DL), namely ES(DL), to better characterize contaminated sites in VI risk assessment. The effectiveness of ES(DL) is demonstrated by three representative scenarios with increasing soil heterogeneity. The results clearly show that ignoring soil heterogeneity significantly undermines one's ability to make reasonable decisions in VI risk assessment. As a preliminary attempt at applying an advanced DA method in VI research, this work has implications for the potential of using DL and DA in complex problems that couple hydrological and environmental processes.
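For readers unfamiliar with ensemble-based DA, the sketch below shows a plain ensemble smoother update on synthetic data. It is only a conceptual stand-in: ES(DL) augments this kind of update with deep learning, which is not reproduced here, and the toy forward model, noise level, and dimensions are illustrative assumptions.

```python
# Minimal ensemble smoother sketch on synthetic "soil parameter" / "VOC observation" data.
import numpy as np

rng = np.random.default_rng(0)
n_ens, n_param, n_obs = 200, 10, 5

# Hypothetical linear forward model mapping soil parameters to observed concentrations.
H = rng.normal(size=(n_obs, n_param)) / np.sqrt(n_param)

def forward_model(m):
    return H @ m

m_true = rng.normal(size=n_param)
obs_err = 0.05
d_obs = forward_model(m_true) + rng.normal(0.0, obs_err, n_obs)

# Prior ensemble of parameter realizations (columns are ensemble members).
M = rng.normal(size=(n_param, n_ens))
D = np.column_stack([forward_model(M[:, j]) for j in range(n_ens)])

# Ensemble smoother update: M_post = M + C_md (C_dd + R)^-1 (d_obs + e - D).
dM = M - M.mean(axis=1, keepdims=True)
dD = D - D.mean(axis=1, keepdims=True)
C_md = dM @ dD.T / (n_ens - 1)
C_dd = dD @ dD.T / (n_ens - 1)
R = obs_err ** 2 * np.eye(n_obs)
perturbed_obs = d_obs[:, None] + rng.normal(0.0, obs_err, (n_obs, n_ens))
M_post = M + C_md @ np.linalg.solve(C_dd + R, perturbed_obs - D)

print("prior mean error:", np.abs(M.mean(axis=1) - m_true).mean())
print("posterior mean error:", np.abs(M_post.mean(axis=1) - m_true).mean())
```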
Martian caves have revived interest in the field of speleology because they are potential destinations for future human habitation and astrobiological research. Skylights are formed by the collapse of surface material into subsurface void spaces; hence, they are doors for accessing the subsurface caves. Signatures of life are plausible in subsurface caves on Mars because such caves can shield life from the harsh and dangerous radiation environment of the surface. A cave may also hold an abundance of minerals, fluids, and other key resources. Therefore, locating skylights is crucial for formulating plans for robotic and human exploration of Mars. We have used remote sensing data from MRO (Mars Reconnaissance Orbiter; NASA), MGS (Mars Global Surveyor; NASA), and Mars Odyssey (NASA) to identify, map, and classify skylights based on their morphology, morphometry, and thermal behavior. A total of thirty-two skylight candidates have been examined, including twenty-six newly discovered ones. Of these, seventeen have been classified as Atypical Pit Craters (APCs) and fifteen as Bowl-shaped Pit Craters (BPCs); twelve of the APCs are newly found. APCs are considered potential skylights associated with caves; however, given their formation and geological context, the fifteen BPCs, which display the requisite morphological and thermal behavior, have also been considered potential skylights.
Formal international standards, as well as the promotion of community or recommended practices, have their place in ensuring the "FAIRness" of data. Data management in NASA's Earth Observing System Data and Information System (EOSDIS) has benefited from both of these avenues to a significant extent. The purpose of this paper is to present one example of each, both of which promote (re)usability. The first is an ISO standard for specifying preservation content from Earth observation missions. Work on this started in 2011, informally within the Earth Science Information Partners (ESIP) in the US, while the European Space Agency (ESA) was leading an effort on Long-Term Data Preservation (LTDP). The ESIP discussions resulted in NASA's Preservation Content Specification (PCS), which was applied in 2012 as a requirement for NASA's new missions. ESA's Preserved Data Set Content (PDSC) document was codified into a document adopted by the Committee on Earth Observation Satellites (CEOS). It was recognized that it would be useful to combine the PCS and PDSC into an ISO standard to ensure consistency in data preservation on a broader international scale. This standard, numbered ISO 19165-2, has been under development since mid-2017. The second is an example of developing recommendations for "best practices" within more limited (though still fairly broad) communities. A Data Product Developers' Guide (DPDG) is currently being developed by one of NASA's Earth Science Data System Working Groups (ESDSWGs). It is intended for developers of products derived from Earth observation data, to improve product (re)usability. One of the challenges in developing the guide is that there are already many applicable standards and guides; the relevant information needs to be selected and expressed succinctly, with appropriate pointers to references. The DPDG aims to compile the most applicable parts of earlier guides into a single document outlining the typical development process for Earth Science data products. Standards and best practices formally endorsed by the Earth Science Data and Information System (ESDIS) Standards Office (ESO), outputs from ESDSWGs (e.g., the Dataset Interoperability Working Group and the Data Quality Working Group), and recommendations from Distributed Active Archive Centers and data producers are emphasized.
The Space Physics Data Facility (SPDF, https://spdf.gsfc.nasa.gov) and the Solar Data Analysis Center (SDAC, https://umbra.nascom.nasa.gov/), as the NASA Heliophysics active final archives, will be preserving and distributing the data from Parker Solar Probe. Working in cooperation with currently operating missions and the heliophysics community, SPDF ingests, preserves, and serves a wide range of past and current public science-quality data, from the ionosphere to the farthest reaches of deep-space exploration. SPDF has been working with the Parker Solar Probe mission in preparation for archiving and serving its in-situ data starting 12 November 2019, and also has arrangements to serve in-situ data from Solar Orbiter when those data become public. SPDF will facilitate scientific analysis of multi-instrument and multi-mission datasets to enhance the science return of the Parker Solar Probe mission. SPDF develops and maintains the Common Data Format (CDF) and the associated ISTP/SPDF metadata guidelines. SPDF services include CDAWeb, which supports both survey and burst mode data with graphics, listings, and data superset/subset functions. All public data held by SPDF are also available for direct file download via HTTPS or FTPS links from the SPDF home page (https://spdf.gsfc.nasa.gov). SPDF is currently receiving and serving data from missions including Helios, MMS, Van Allen Probes, THEMIS/ARTEMIS, GOLD, ACE, Cluster, Geotail, Polar, and Wind, among many others, as well as more than 120 ground-based investigations. SPDF recently added support for ARASE/ERG and MAVEN as supplementary access at the request of those missions. SPDF also operates the multi-mission orbit displays and query services of SSCWeb and the Java-based 4D Orbit Viewer, as well as the Heliophysics Data Portal (HDP) discipline-wide data inventory and access service and the OMNIWeb near-Earth solar wind plasma and magnetic field database.
The usage of the term Knowledge Graph (KG) has gained significant popularity since 2012, when Google introduced its own knowledge graph and used it to enhance its search and question-answering systems. While various definitions and interpretations of knowledge graphs have been presented, what remains consistent is that knowledge graphs are commonly used with reasoners to make inferences about data, based on assertions and axioms written by human experts. But knowledge graphs, which store complex, multi-dimensional data, contain hidden patterns and trends that cannot be explored using reasoners alone. In such cases it becomes necessary to extract parts of the knowledge graph (focusing on the instances related to one property at a time) and analyze them individually in order to conduct a focused but tractable exploration of the domain. In this presentation, we describe one way to gain insights from knowledge graphs using network science. To achieve this goal, we have formalised the partitioning of knowledge graphs into unipartite knowledge networks, and we present various ways to explore and analyse such knowledge networks to form scientific hypotheses, gain scientific insights, and make discoveries.
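As a toy illustration of the kind of partitioning described (not the authors' formalism), the following sketch projects a small set of subject-property-object triples onto a unipartite network in which two entities are linked whenever they share an object for one chosen property. The triples, property names, and entities are invented for the example.

```python
# Illustrative projection of a tiny knowledge graph onto a unipartite knowledge network.
import itertools
import networkx as nx

# Toy knowledge graph as (subject, property, object) triples.
triples = [
    ("SampleA", "containsMineral", "Olivine"),
    ("SampleB", "containsMineral", "Olivine"),
    ("SampleB", "containsMineral", "Pyroxene"),
    ("SampleC", "containsMineral", "Pyroxene"),
    ("SampleC", "collectedAt", "SiteX"),
]

def unipartite_projection(triples, prop):
    """Link subjects that share an object for the given property."""
    by_object = {}
    for s, p, o in triples:
        if p == prop:
            by_object.setdefault(o, set()).add(s)
    g = nx.Graph()
    for shared_object, subjects in by_object.items():
        for u, v in itertools.combinations(sorted(subjects), 2):
            g.add_edge(u, v, shared=shared_object)
    return g

network = unipartite_projection(triples, "containsMineral")
# Standard network-science measures can then drive hypothesis generation.
print(nx.degree_centrality(network))
```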
Global Sensitivity Analysis (GSA) has long been recognized as an indispensable tool for model analysis. GSA has been extensively used for model simplification, identifiability analysis, and diagnostic tests, among other purposes. Nevertheless, computationally efficient methodologies are sorely needed for GSA, not only to reduce the computational overhead but also to improve the quality and robustness of the results. This is especially the case for process-based hydrologic models, whose simulation times are often so long that a comprehensive GSA exceeds the available computational budget. We overcome this computational barrier by developing an efficient variance-based sensitivity analysis using copulas. Our data-driven method, called VISCOUS, approximates the joint probability density function of a given set of input-output pairs using a Gaussian mixture copula to provide a given-data estimation of the sensitivity indices. This enables our method to identify dominant hydrologic factors by recycling a pre-computed set of model evaluations or existing input-output data, thus avoiding additional computational cost. We used two hydrologic models of increasing complexity (HBV and VIC) to assess the performance of the proposed method. Our results confirm that VISCOUS and the original variance-based method detect similar important and unimportant factors. However, our method remains robust while substantially reducing the computational cost. The results are particularly significant for, though not limited to, process-based models with many uncertain parameters, large domain sizes, and high spatial and temporal resolution.
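A simplified given-data estimator in the spirit of VISCOUS can be sketched as follows. Note that this stand-in fits a Gaussian mixture directly in data space rather than using the Gaussian mixture copula of the actual method, and the toy model, mixture size, and sample size are illustrative only.

```python
# Given-data first-order sensitivity estimate from existing input-output pairs.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import norm

def first_order_index(xi, y, n_components=3, seed=0):
    """Approximate S_i = Var(E[Y|X_i]) / Var(Y) from (x_i, y) samples via a 2-D GMM."""
    data = np.column_stack([xi, y])
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(data)
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_

    def cond_mean(x):
        # Responsibility of each mixture component given X_i = x.
        resp = np.array([w * norm.pdf(x, m[0], np.sqrt(c[0, 0]))
                         for w, m, c in zip(weights, means, covs)])
        resp /= resp.sum()
        # Component-wise conditional means of Y given X_i = x.
        cm = np.array([m[1] + c[0, 1] / c[0, 0] * (x - m[0])
                       for m, c in zip(means, covs)])
        return float(resp @ cm)

    cond = np.array([cond_mean(x) for x in xi])
    return cond.var() / y.var()

# Illustrative use with a toy model y = x1 + 0.2 * x2 (expected indices ~0.96 and ~0.04).
rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 2))
y = x[:, 0] + 0.2 * x[:, 1]
print([first_order_index(x[:, i], y) for i in range(2)])
```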
Sparse observational data in developing regions lead to uncertainty about how hydro-climatic factors influence crop phases and productivity, knowledge of which is essential to mitigating food security threats induced by climate change. In this study, NASA Tropical Rainfall Measuring Mission (TRMM), Global Precipitation Measurement (GPM), and Global Land Data Assimilation System (GLDAS) data products bypass spatiotemporal limitations and drive machine learning algorithms developed to characterize hydro-climate-productivity interactions. Extensive feature engineering processes these products into nearly 4,000 metrics designed to decompose growing-season hydro-climate conditions. Dimensionality reduction with bidirectional stepwise regression, Multivariate Adaptive Regression Splines (MARS), and Random Forest algorithms is explored to determine the key temporal hydro-climate drivers of agricultural productivity, with each method recognizing unique linear and non-linear predictors. Finally, multivariate regression, MARS, and Random Forest models are trained on the drivers to predict seasonal crop yield. We apply this hydro-climate-productivity framework to investigate rabi wheat productivity on Pakistan's Potohar Plateau. Here, we identify six of wheat's ten phenological phases that display strong hydro-climate responses, with the shooting phase exhibiting sensitivity to precipitation intensity, minimum soil moisture, and sub-zero temperatures. In addition, the plateau's heterogeneous climate-productivity connections are captured well by the calibrated models, strengthening their application for studying broader climate change impacts. The integration of remote sensing products and machine learning offers an effective framework to bypass in-situ data limitations and decompose climate-crop productivity relationships, thus improving drought onset recognition and food security forecasting.
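The screening-then-prediction workflow can be illustrated with a small Random Forest sketch on synthetic data. The feature counts, data, and hyperparameters below are placeholders, and the sketch does not reproduce the study's stepwise-regression or MARS steps.

```python
# Illustrative driver screening and yield prediction with a Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_seasons, n_metrics = 300, 50            # stand-ins for the ~4,000 engineered metrics
X = rng.normal(size=(n_seasons, n_metrics))
yield_t = 2.0 + 0.8 * X[:, 3] - 0.5 * X[:, 17] + rng.normal(0, 0.3, n_seasons)

X_tr, X_te, y_tr, y_te = train_test_split(X, yield_t, random_state=0)

# Step 1: screen for dominant hydro-climate drivers via impurity-based importance.
screen = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
top = np.argsort(screen.feature_importances_)[::-1][:10]

# Step 2: fit the seasonal yield model on the retained drivers only.
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr[:, top], y_tr)
print("R2 on held-out seasons:", r2_score(y_te, model.predict(X_te[:, top])))
```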
Machine learning (ML) models that classify a sample as non-indicative or indicative of life can play an important role in planning life-detection missions. They are based on clearly defined and consistent algorithms, regardless of sample type or origin, and make their predictions from weighted combinations of multiple features rather than from any single feature. These weighted combinations can reveal the most informative measurements within the operational constraints of a life-detection mission. The Ladder of Life Detection (Neveu 2018) identifies the need to understand how combinations of multiple biosignatures affect overall confidence. The present work provides a starting point to answer this need, and future work will expand the data types to obtain even more predictive combinations of features. Elemental composition and isotope fractionation were chosen as the data types because they are available for both biogenic and abiogenic systems and are not unique to Earth biochemistry. Measurements of these data types across a wide range of unambiguously non-indicative or indicative samples were gathered from the published literature. The varied sample measurements were then integrated into twenty-one representative samples. The ML models only made binary classifications of non-indicative or indicative of life; nonetheless, the samples broadly fell into three categories: mixed, non-alive, and alive. Four classification algorithms were trained and tested with Monte Carlo simulations using a 70:30 train-to-validation ratio. Across the models, around 75% of the test samples were correctly classified, with variations in the sensitivity and specificity of the models. For elemental abundances predictive of a sample non-indicative of life, all models found Ti and Si to be strong predictors and Fe, Al, Mn, and Mg to be medium predictors. For predicting a sample indicative of life, all models found C, N, and carbon-13 to be strong predictors and K, H, P, and Ca to be medium predictors. A weighted combination of multiple biosignatures is shown to be a more effective approach to classifying sample data than relying on any individual biosignature or on an unweighted group of biosignatures. Different models also made different chronic misclassifications, suggesting that combining the outputs of multiple models may be more effective than relying on the output of a single model. Which type of model to use may depend on the application; e.g., higher-sensitivity models might be preferred in first-pass situations where false negatives are more costly than false positives. Lastly, the weighting of measurements within a model suggests how to combine biosignatures to affect the overall confidence of the classification. These results provide evidence of elemental biosignatures beyond the CHNOPS of Earth-based life and serve as a proof of concept for algorithmic biosignature classification.
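The Monte Carlo evaluation procedure can be sketched as follows with synthetic data and generic classifiers. The specific algorithms used in the study are not named here, so the three models below are illustrative choices only, as are the feature dimensions and sample size.

```python
# Monte Carlo 70:30 evaluation of several binary classifiers, scored by sensitivity/specificity.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))              # stand-in for elemental/isotopic features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.5, 120) > 0).astype(int)  # 1 = indicative

models = {"logreg": LogisticRegression(max_iter=1000),
          "rf": RandomForestClassifier(n_estimators=200),
          "svm": SVC()}

for name, model in models.items():
    sens, spec = [], []
    for trial in range(50):                # Monte Carlo resampling of the 70:30 split
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                                  stratify=y, random_state=trial)
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    print(f"{name}: sensitivity={np.mean(sens):.2f}, specificity={np.mean(spec):.2f}")
```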
Natural and non-natural factors have combined effects on the trajectory of the COVID-19 pandemic, but it is difficult to separate them. To address this problem, a two-step methodology is proposed. First, a compound natural factor (CNF) model is developed by assigning a weight to each of seven investigated natural factors (temperature, humidity, visibility, wind speed, barometric pressure, aerosol, and vegetation) in order to express their coupling relationship with the COVID-19 trajectory. Then, the empirical-distribution-based framework (EDBF) is employed to iteratively optimize the coupling relationship between the trajectory and the CNF so that it reflects the real interaction. In addition, the collected data are back-dated by about 23 days (a 14-day incubation period plus a 9-day human response time), because no prior information is available on the natural spread of the virus without any human intervention, and because weather changes and social interventions affect the observed trajectory with a lag governed by the COVID-19 incubation period. Second, the optimized CNF-plus-polynomial model is used to predict the future trajectory of COVID-19. Results revealed that aerosol and visibility show the highest contribution to transmission, wind speed to deaths, and humidity, followed by barometric pressure, dominates the recovery rate. Consequently, the average effect of environmental change on the COVID-19 trajectory in China is minor for all variables, i.e., about -0.3%, +0.3%, and +0.1%, respectively. In this research, analyzing the response of the COVID-19 trajectory to compound natural interactions presents a new perspective on the response of the global pandemic trajectory to environmental changes.
The CF (Climate and Forecast) Conventions are a community-developed metadata standard for storing and describing Earth system science data in the netCDF binary data format. Numerous existing FOSS (Free and Open Source Software) and commercial software tools can explore, analyze, and visualize data encoded using the CF Conventions. The CF community holds annual workshops to develop, refine, and review enhancements to the CF Conventions and to manage CF governance and processes. The EarthCube netCDF-CF project worked with the CF community on the development of extensions to netCDF-CF, several of which have been accepted into the CF Conventions. Work on these extensions involved broad participation by members of the existing netCDF-CF community as well as members of science domains not traditionally represented in that community. This presentation will provide an update on recent work and an overview of CF plans and future activities.
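As a small example of the kind of encoding the Conventions govern, the sketch below writes a CF-style variable with the netCDF4 Python library. The attribute values are illustrative, and standard_name choices should always be checked against the CF standard name table.

```python
# Minimal CF-style netCDF file: coordinate variables with units and standard_name,
# a data variable with CF attributes, and a global Conventions attribute.
from netCDF4 import Dataset
import numpy as np

with Dataset("example_cf.nc", "w", format="NETCDF4") as nc:
    nc.Conventions = "CF-1.8"
    nc.title = "Example CF-compliant surface air temperature field"

    nc.createDimension("time", None)
    nc.createDimension("lat", 3)
    nc.createDimension("lon", 4)

    time = nc.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "days since 2000-01-01 00:00:00"
    time.calendar = "gregorian"

    lat = nc.createVariable("lat", "f4", ("lat",))
    lat.standard_name = "latitude"
    lat.units = "degrees_north"

    lon = nc.createVariable("lon", "f4", ("lon",))
    lon.standard_name = "longitude"
    lon.units = "degrees_east"

    tas = nc.createVariable("tas", "f4", ("time", "lat", "lon"), fill_value=-9999.0)
    tas.standard_name = "air_temperature"
    tas.units = "K"
    tas.cell_methods = "time: mean"

    time[:] = [0.0]
    lat[:] = [-10.0, 0.0, 10.0]
    lon[:] = [0.0, 90.0, 180.0, 270.0]
    tas[0, :, :] = 288.0 + np.zeros((3, 4), dtype="f4")
```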
Oceanographic research cruises produce abundant data using a wide range of methods and equipment, very often through large collaborative efforts. These research endeavors span a broad array of disciplines and are critical to investigating the interplay between biological, geological, and chemical processes in ocean systems over space and time. The advent of genomic sequencing technologies allows gene expression to be analyzed in a variety of environmental settings and the distribution and significance of metabolites and lipids to be measured in organisms and the environment. Despite scientists' best efforts to carefully curate and share their data with collaborators to advance individual studies and publications, no systematic, unifying framework currently exists to integrate 'omics data with the physical, geochemical, and biological datasets commonly used by the broader geoscience community. As a result, the moment each sample leaves the ship is often the last time each data component appears together in a unified collection. Typically, 'omics datasets are submitted to nucleotide sequence repositories, whereas contextual environmental data are submitted to and stored in specialized data repositories, or only made available within published papers. This makes it difficult to fully reconnect the in-situ data, thereby limiting their reuse in other studies. The development of resources to facilitate the aggregation, publication, and reuse of biological datasets along with their physicochemical information is critical for studying marine microbes and the biogeochemical processes in the ocean that they drive. We present Planet Microbe, a cyberinfrastructure resource enabling data discovery and open data sharing for historical and ongoing oceanographic sequencing efforts. Several historical oceanographic 'omics datasets (Hawaii Ocean Time-series (HOT), Bermuda Atlantic Time-series (BATS), Global Ocean Sampling Expedition (GOS)) have been integrated into Planet Microbe along with new large-scale oceanic datasets such as the Tara Expeditions and Ocean Sampling Day (OSD). In Planet Microbe, these 'omics data have been reintegrated with their in-situ environmental contextual data, including biological and physicochemical measurements and information about sampling events and sampling stations. Finally, cruise tracks, protocols, and instrumentation are also linked to these datasets to give the user a comprehensive view of the metadata. Additionally, Planet Microbe integrates computational tools using National Science Foundation (NSF) funded cyberinfrastructure (CyVerse) and provides users with free access to large-scale computing power to analyze and explore these datasets.
Not only are reservoir managers and aquatic scientists concerned with the environmental effects of water quality; civil engineers must also consider water quality to comply with regulations when constructing new reservoirs or making structural and operational modifications to existing reservoirs. This study establishes a machine learning approach for predicting Carlson's Trophic State Index (CTSI), a frequently used metric of water quality in reservoirs. Data collected from 1995 to 2016 at stations in 20 reservoirs in Taiwan were preprocessed as the input for the modeling system. Four well-known artificial intelligence (AI) techniques, ANN (Artificial Neural Network), SVM (Support Vector Machine), CART (Classification and Regression Tree), and LR (Linear Regression), were applied in baseline and ensemble scenarios. Moreover, one variation of the support vector machine was integrated with a metaheuristic optimization algorithm to develop a hybrid AI model. The comprehensive comparison demonstrated that the ensemble ANN model, based on a tiering method, is more accurate than the other single, ensemble, and hybrid models. The novelty of this study lies in providing a new AI-based approach that reduces the complexity of measuring the three traditional parameters of the CTSI formula, as an alternative to the conventional approach to predicting CTSI. This work contributes to the improvement of water quality management by providing a versatile technique that offers diverse predictive methods to meet the specific requirements of practitioners.
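For context, the conventional calculation that the AI models aim to replace can be sketched as below, using the commonly cited Carlson (1977) sub-indices for Secchi depth, chlorophyll-a, and total phosphorus. The coefficients are as usually quoted in the literature and should be verified against the original publication and local monitoring practice.

```python
# Conventional CTSI from the three Carlson sub-indices (commonly quoted coefficients).
import math

def carlson_ctsi(secchi_depth_m, chlorophyll_a_ug_l, total_phosphorus_ug_l):
    tsi_sd = 60.0 - 14.41 * math.log(secchi_depth_m)
    tsi_chl = 9.81 * math.log(chlorophyll_a_ug_l) + 30.6
    tsi_tp = 14.42 * math.log(total_phosphorus_ug_l) + 4.15
    return (tsi_sd + tsi_chl + tsi_tp) / 3.0

# Example: a moderately eutrophic reservoir.
print(carlson_ctsi(secchi_depth_m=1.5, chlorophyll_a_ug_l=12.0, total_phosphorus_ug_l=35.0))
```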
Traditional ocean color remote sensing usually focuses on using optical inversion models to estimate the properties of in-water components from above-surface spectra; we call this the spectrum-concentration (SC) scheme. Unlike the SC scheme, this study proposes a new research scheme, the distribution-distribution (DD) scheme, which uses statistical inference models to estimate the probability distribution of these in-water components based on the probability distribution of the observed spectra. The DD scheme has the advantages that (1) it can rapidly give key overview information about the water of interest, instead of computing every image pixel as the SC scheme does, (2) it can assist the SC scheme in improving its models and parameters, and (3) it can provide more valuable information for better understanding and indicating the features and dynamics of aquatic environments. In this study, based on Landsat-8 images, we analyzed the spectral probability distributions (SPDs) of 688 global water bodies and found that many of them followed normal, lognormal, or exponential distributions, but with diverse patterns in distribution parameters such as the mean, standard deviation, skewness, and kurtosis. Furthermore, we used Monte Carlo and Hydrolight simulations to study the theoretical and statistical connections between the probability distributions of in-water components and the SPDs. The simulation results were broadly consistent with the observations of real waters. Then, using the simulated and field-measured data, we proposed a bootstrap-based DD scheme and developed some simple statistical inference models to estimate the distribution parameters of yellow substance in lakes. Since the DD scheme is still at an early stage, we also suggest some potentially useful topics for future work.
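The distribution-fitting and bootstrap ingredients of a DD-style analysis can be sketched as follows on synthetic reflectance values. This is not the authors' inference model; the lognormal choice and the sample data are assumptions for illustration only.

```python
# Fit a lognormal to per-pixel reflectance values and bootstrap the fitted parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for reflectance values extracted from one Landsat-8 band over a lake.
reflectance = rng.lognormal(mean=-3.0, sigma=0.4, size=5000)

def fit_lognormal(sample):
    shape, loc, scale = stats.lognorm.fit(sample, floc=0.0)
    return shape, scale            # sigma and exp(mu) of the underlying normal

n_boot = 200
boot = np.array([fit_lognormal(rng.choice(reflectance, reflectance.size, replace=True))
                 for _ in range(n_boot)])
sigma_ci = np.percentile(boot[:, 0], [2.5, 97.5])
print("fitted sigma:", fit_lognormal(reflectance)[0], "95% CI:", sigma_ci)
```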
When NASA established the Planetary Data System (PDS) in the late 1980s, its mandate to the PDS was not merely to preserve the bytes from NASA's planetary science missions, but to maintain the usability of the data for present and future generations. Two fundamental pillars support this ambitious goal: the external peer review required for acceptance of all archived data submissions, and the PDS Standards for data and metadata formatting and completeness. The PDS external peer review process is at least equivalent to, if not more rigorous than, the journal refereeing process (1). Data reviewers who are field experts, but neither affiliated with the data preparer nor involved in the PDS consulting process, are brought in to review documentation and completeness. They are specifically charged to attempt to use the data to perform some scientific investigation (reproducing published results, comparison with correlated observations for consistency, etc.). If the reviewers are not successful, the impediments are documented and the data submission is amended by the preparer until the reviewers are satisfied. This process demonstrates the immediate usability of the data. The PDS Standards, and in particular the recently implemented version based on the PDS4 Information Model, require exhaustive metadata documenting data structure, observing circumstances, provenance, analytical metadata, and so on, using the same templates across the entire archive. The associated schema-based enforcement of at least minimal requirements for metadata completeness and quality provides a foundation for discoverability, interoperability, and usability of data from disparate sources throughout the archive. Together, the PDS external peer review and the Information Model-based PDS4 standards ensure both quality and usability for data accepted into the PDS archive, for this and future generations of planetary scientists. Reference: (1) Raugh, A. and Bauer, J., PDS Data Sets as Peer-Reviewed References, poster presented at the 15th Annual Meeting of the Asia Oceania Geosciences Society, 03-08 June 2018, Honolulu, Hawai'i.
The Gabor transform can be used in a compression algorithm because it allows the user to isolate high-frequency information for filtering. The transform can be implemented using FFTs to aid in calculating the Gabor coefficients of a particular image. In the C++ programming language, the open-source library FFTW can perform FFTs quickly on the CPU, while NVIDIA's cuFFT library performs the same computations on the GPU. cuFFT does incur a bottleneck during the initial allocation of the input, output, and plan for the desired FFT, but for larger images this overhead becomes less and less significant. Image compression algorithms using the Gabor transform can therefore benefit from cuFFT's functions through reduced computation time.
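A conceptual sketch of Gabor-style compression, written with NumPy FFTs rather than FFTW or cuFFT, is shown below. The Gaussian windowing scheme, grid spacing, and thresholding rule are illustrative choices, not the specific algorithm described above.

```python
# Gabor-style compression sketch: FFTs of Gaussian-windowed patches give local
# Gabor coefficients, and small (mostly high-frequency) coefficients are zeroed out.
import numpy as np

def gabor_compress(image, window_sigma=8.0, step=16, keep_fraction=0.1):
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    coeffs = {}
    for cy in range(0, ny, step):
        for cx in range(0, nx, step):
            window = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * window_sigma ** 2))
            spectrum = np.fft.fft2(image * window)         # local Gabor coefficients
            threshold = np.quantile(np.abs(spectrum), 1.0 - keep_fraction)
            spectrum[np.abs(spectrum) < threshold] = 0.0   # discard small coefficients
            coeffs[(cy, cx)] = spectrum
    return coeffs

# Illustrative use on a synthetic image.
img = np.random.default_rng(0).random((64, 64))
compressed = gabor_compress(img)
kept = sum(int(np.count_nonzero(s)) for s in compressed.values())
print("nonzero coefficients kept:", kept)
```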