Understanding the interaction between tectonic plates from geodetic data is relevant to the assessment of seismic hazard. To shed light on that prevalently slow aseismic interaction, we developed a new static-slip inversion strategy, the ELADIN (ELastostatic ADjoint INversion) method, which uses the adjoint elastostatic equations to compute the gradient of the cost function. To handle plausible slip constraints, ELADIN is a 2-step inversion algorithm: it first finds the slip that best explains the data without any constraint, and then refines the solution by imposing the constraints through a Gradient Projection Method. To obtain a self-similar, physically consistent slip distribution that accounts for sparsity and uncertainty in the data, ELADIN reduces the model space by using a von Karman regularization function that controls the wavenumber content of the solution, and weights the observations according to their covariance using the data precision matrix. Since crustal deformation is the result of different concomitant interactions at the plate interface, ELADIN simultaneously determines the regions of the interface subject to both stressing (i.e., coupling) and relaxing slip regimes. For estimating the resolution, we introduce a mobile checkerboard that determines lower-bound fault resolution zones for an expected slip-patch size and a given station array. We systematically test ELADIN with synthetic inversions along the whole Mexican subduction zone and use it to invert the 2006 Guerrero Slow Slip Event (SSE), one of the most studied SSEs in Mexico. Since only 12 GPS stations recorded the event, careful regularization is required to achieve reliable solutions. We compared our preferred slip solution with two previously published models and found that our solution retains their most reliable features. In addition, although all three SSE models predict an upward slip penetration invading the seismogenic zone of the Guerrero seismic gap, our resolution analysis indicates that this penetration might not be a reliable feature of the 2006 SSE.
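A minimal sketch of the gradient-projection idea, assuming a linear forward operator and simple bound constraints; the operator G, data, weights, and bounds below are illustrative placeholders, not the actual ELADIN implementation:

import numpy as np

# Hypothetical forward operator (Green's functions), data and precision matrix;
# placeholders only, not the ELADIN operators.
rng = np.random.default_rng(0)
G = rng.normal(size=(24, 50))        # 24 observations, 50 slip patches
d = rng.normal(size=24)              # observed displacements
W = np.eye(24)                       # data precision matrix (inverse covariance)

def misfit_gradient(m):
    r = G @ m - d
    return G.T @ (W @ r)             # adjoint application for a linear problem

lower, upper = 0.0, 1.0              # plausible slip bounds (illustrative)
m = np.zeros(50)
step = 1e-3
for _ in range(500):
    m = m - step * misfit_gradient(m)   # unconstrained descent step
    m = np.clip(m, lower, upper)        # projection onto the constraint set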
Automated classification of remote sensing data is an integral tool for earth scientists, and deep learning has proven very successful at solving such problems. However, building deep learning models to process the data requires expert knowledge of machine learning. We introduce DELTA, a software toolkit that bridges this technical gap and makes deep learning easily accessible to earth scientists. Visual feature engineering is a critical part of the machine learning lifecycle, and hence is a key area that DELTA automates. Hand-engineered features can perform well, but they require a cross-functional team with expertise in both machine learning and the specific problem domain, which is costly in both researcher time and labor. The problem is more acute with multispectral satellite imagery, which requires considerable computational resources to process. To automate the feature learning process, a neural architecture search samples the space of asymmetric and symmetric autoencoders using evolutionary algorithms. Since denoising autoencoders have been shown to perform well for feature learning, the autoencoders are trained on various levels of noise, and the features generated by the best-performing autoencoders are evaluated according to their performance on image classification tasks. The resulting features are demonstrated to be effective for Landsat-8 flood mapping, as well as for the benchmark datasets CIFAR10 and SVHN.
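As a minimal, hedged sketch of the denoising-autoencoder feature learning described above (not DELTA's searched architectures; the layer sizes, noise level, and data shapes are assumptions):

import numpy as np
from tensorflow.keras import layers, models

# Illustrative image chips (e.g., 32x32 patches with 4 spectral bands).
x = np.random.rand(1000, 32, 32, 4).astype("float32")
x_noisy = x + 0.1 * np.random.randn(*x.shape).astype("float32")   # corrupted inputs

inputs = layers.Input(shape=(32, 32, 4))
h = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
h = layers.MaxPooling2D()(h)
code = layers.Conv2D(8, 3, activation="relu", padding="same")(h)    # learned features
h = layers.UpSampling2D()(code)
outputs = layers.Conv2D(4, 3, activation="sigmoid", padding="same")(h)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_noisy, x, epochs=5, batch_size=64)   # reconstruct clean from noisy

encoder = models.Model(inputs, code)   # reuse the encoder as a feature extractor
features = encoder.predict(x)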
The North Atlantic ocean is key to climate through its role in heat transport and storage. Climate models suggest that the circulation is weakening, but the physical drivers of this change are poorly constrained. Here, the root mechanisms are revealed with the explicitly transparent machine learning method Tracking global Heating with Ocean Regimes (THOR). Addressing the fundamental question of whether dynamically coherent regions exist, THOR identifies them and their link to distinct currents and mechanisms, such as the formation regions of deep water masses and the location of the Gulf Stream and North Atlantic Current. Going beyond a black-box approach, THOR is engineered to elucidate the source of its predictive skill, rooted in physical understanding. A labeled dataset is engineered using an explicitly interpretable equation transform and a k-means application to model data, allowing theoretical inference. A multilayer perceptron is then trained, and its skill is explained using a combination of layerwise relevance propagation and theory. With abrupt CO2 quadrupling, the circulation weakens due to a shift in deep water formation regions, a northward shift of the Gulf Stream, and an eastward shift in the North Atlantic Current. If CO2 is increased by 1% yearly, similar but weaker patterns emerge, influenced by natural variability. THOR is scalable and applicable to a range of models using only the ocean depth, dynamic sea level, and wind stress, and could accelerate the analysis and dissemination of climate model data. THOR constitutes a step towards the trustworthy machine learning called for within oceanography and beyond.
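A minimal sketch of the two-stage pattern described above (labels from k-means on equation terms, then a supervised network); the fields, cluster count, and network size are illustrative assumptions, not the actual THOR configuration:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Placeholder grid-point features: terms of a transformed dynamical balance
# (stand-ins only, not the actual THOR fields).
n_points = 5000
balance_terms = np.random.randn(n_points, 5)

# Step 1: interpretable regime labels from k-means on the equation terms.
regimes = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(balance_terms)

# Step 2: a multilayer perceptron learns to predict the regimes from widely
# available inputs (stand-ins for ocean depth, dynamic sea level, wind stress).
inputs = np.random.randn(n_points, 3)
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
mlp.fit(inputs, regimes)
print(mlp.score(inputs, regimes))   # relevance propagation would then probe this skill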
Heterogeneous snow accumulation in the mountains introduces uncertainty to water-supply forecasting in much of the world. Water managers, aware of this challenge, may account for forecast errors in their management decisions. We assess the impact of uncertainty in seasonal water-supply forecasts on reservoir management using the western slope of the Sierra Nevada of California as a case study. We find that higher forecast uncertainty decreases the volume of water released from reservoirs between April and July, suggesting that water managers hedge against the possibility of lower-than-expected runoff. We modeled April-July water releases as a function of the corresponding runoff forecasts, their reported uncertainty, and available storage capacity. An unbalanced (n=416) panel data model with fixed effects suggests that if uncertainty goes up by 10 units, water managers reduce releases by about 6 units, even holding the mean forecast constant. The forecast volume, its uncertainty, available storage capacity, and the interaction between forecasted volume and uncertainty were all statistically significant predictors (p < 0.005) of releases. Increased forecast uncertainty and increased available storage were significantly and inversely associated with April-July release volume, whereas forecast volume and the interaction between forecast uncertainty and forecast volume were significantly and positively associated with release volume. These results support the hypothesis that water managers behave as if they are risk-averse with respect to the possibility of less runoff than forecasted. Thus, reducing operational forecast uncertainty may result in more water being released, without the need for direct coordination with water managers.
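A minimal sketch of the fixed-effects panel regression with the forecast-uncertainty interaction, assuming a hypothetical reservoir-year table (the file and column names are placeholders, not the study's data):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel of reservoir-years with the variables named in the abstract.
df = pd.read_csv("releases_panel.csv")   # columns: reservoir, year, release,
                                         # forecast, uncertainty, storage

# Reservoir fixed effects via dummies, plus the forecast x uncertainty interaction.
model = smf.ols(
    "release ~ forecast + uncertainty + storage"
    " + forecast:uncertainty + C(reservoir)",
    data=df,
).fit()
print(model.summary())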
The societal importance of geothermal energy is increasing significantly because of its low carbon-dioxide footprint. However, geothermal exploration is also subject to high risks. For a better assessment of these risks, extensive parameter studies that improve the understanding of the subsurface are required, yielding computationally demanding analyses. Often this cost is compensated for by constructing models with a small vertical extent. This paper demonstrates that such models are entirely boundary-dominated and hence uninformative, and that a large vertical extent is indispensable for obtaining models that are informative with respect to the model parameters. For this quantitative investigation, global sensitivity studies are essential since they also consider parameter correlations. To compensate for the computationally demanding nature of the analyses, a physics-based machine learning approach is employed, namely the reduced basis method, instead of reducing the physical dimensionality of the model. The reduced basis method yields a significant cost reduction while preserving the physics and a high accuracy, thus providing a more efficient alternative to considering, for instance, a small vertical extent. Reducing the mathematical instead of the physical space leads to less restrictive models and hence maintains the model's prediction capabilities. The combination of methods is used for a detailed investigation of the influence of model boundary settings in typical regional-scale geothermal simulations and highlights potential problems.
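The reduced basis idea can be illustrated with a proper-orthogonal-decomposition sketch over solution snapshots; the snapshot data, sizes, and tolerance below are assumptions, not the actual implementation used in the study:

import numpy as np

# Snapshots: full-order fields computed for a few parameter samples
# (random placeholders; in practice they come from the geothermal simulator).
n_dof, n_snapshots = 20000, 40
snapshots = np.random.randn(n_dof, n_snapshots)

# Build a reduced basis from the leading left singular vectors.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.9999)) + 1    # basis size for a target accuracy
basis = U[:, :r]

# New fields are approximated in the r-dimensional reduced space, so each
# evaluation in a sensitivity study works with r unknowns instead of n_dof.
coeffs = basis.T @ snapshots[:, 0]
reconstruction = basis @ coeffs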
Deep learning (DL) methods have shown great promise for accurately predicting hydrologic processes but have not yet reached the complexity of traditional process-based hydrologic models (PBHMs) in terms of representing the entire hydrologic cycle. The ability of PBHMs to simulate the hydrologic cycle makes them useful for a wide range of modeling and simulation tasks for which DL methods have not yet been adapted. We argue that we can take advantage of each of these approaches by coupling DL methods into PBHMs as individual process parameterizations. We demonstrate that this is viable by developing DL process parameterizations for turbulent heat fluxes and coupling them into the Structure for Unifying Multiple Modeling Alternatives (SUMMA), a modular PBHM modeling framework. We developed two DL parameterizations and integrated them into SUMMA, resulting in a one-way coupled implementation (NN1W), which relies only on model inputs, and a two-way coupled implementation (NN2W), which also incorporates SUMMA-derived model states. Our results demonstrate that the DL parameterizations are able to outperform calibrated standalone SUMMA benchmark simulations. Further, we demonstrate that the two-way coupling can simulate the long-term latent heat flux better than the standalone benchmark. This shows that DL methods can benefit from PBHM information and that the synergy between these modeling approaches is superior to either approach individually.
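A schematic sketch of the coupling pattern: a trained network replaces the turbulent heat flux parameterization inside the model time loop; the network, inputs, and the simple surface-energy update are hypothetical placeholders, not SUMMA's actual interface:

import numpy as np
from tensorflow.keras import layers, models

# Placeholder network standing in for the trained flux parameterization; inputs are
# forcings (one-way, NN1W), optionally augmented with model states (two-way, NN2W).
flux_net = models.Sequential([
    layers.Input(shape=(5,)),        # e.g., radiation, air temp, wind, humidity + surface temp
    layers.Dense(32, activation="relu"),
    layers.Dense(2),                 # predicted sensible and latent heat fluxes
])

def step_surface_energy(forcing, state, dt=3600.0, heat_capacity=2e5):
    # Two-way coupling: the current model state enters the network input.
    x = np.concatenate([forcing, [state["surface_temp"]]])[None, :]
    sensible, latent = flux_net.predict(x, verbose=0)[0]
    # Illustrative energy-balance update using the predicted fluxes.
    state["surface_temp"] += dt * (forcing[0] - sensible - latent) / heat_capacity
    return state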
The recent development of the TOUGH3 code provides a faster and more reliable fluid flow simulator. At the same time, new versions of FLAC3D are released periodically, providing new features and faster execution. In this paper, we present the first implementation of the coupling between TOUGH3 and FLAC3Dv6/7 that maintains parallel computing capabilities for the coupled fluid flow and geomechanical codes. We compare the newly developed version with analytical solutions and with the previous approach, and provide a performance analysis for different meshes and varying numbers of processors. Finally, we present two case studies related to fault reactivation during CO2 sequestration and nuclear waste disposal. The use of parallel computing allows for meshes with a larger number of elements, and hence a more detailed understanding of thermo-hydro-mechanical processes occurring at depth.
The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open-source, cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to the average scientist. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This paradigm shift has the potential to lower the threshold for entry, expand the science community, and increase opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility. Yet we have all witnessed promising new tools that seemed harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science evolves?
This is a test-case study assessing the ability of deep learning methods to generalize to a future climate (end of the 21st century) when trained to classify thunderstorms in model output representative of the present-day climate. A convolutional neural network (CNN) was trained to classify strongly rotating thunderstorms in a current climate created using the Weather Research and Forecasting (WRF) model at high resolution, then evaluated against thunderstorms from a future climate, and found to perform skillfully and comparably in both climates. Despite training with labels derived from a threshold value of a severe thunderstorm diagnostic (updraft helicity), which was not used as an input attribute, the CNN learned physical characteristics of organized convection and environments that are not captured by the diagnostic heuristic. Physical features were not prescribed but rather learned from the data, such as the importance of dry air at mid-levels for intense thunderstorm development when low-level moisture is present (i.e., convective available potential energy). Explanation techniques also revealed that thunderstorms classified as strongly rotating are associated with learned rotation signatures. Results show that the creation of synthetic data with ground truth is a viable alternative to human-labeled data and that a CNN is able to generalize a target using learned features that would be difficult to encode due to spatial complexity. Most importantly, results from this study show that deep learning is capable of generalizing to future climate extremes and can exhibit out-of-sample robustness with hyperparameter tuning in certain applications.
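A minimal sketch of the labeling and training setup described above: labels come from thresholding updraft helicity, which is never provided as an input channel; the shapes, fields, and threshold are illustrative assumptions:

import numpy as np
from tensorflow.keras import layers, models

# Illustrative WRF-like patches; channels might be reflectivity, winds, moisture, etc.
patches = np.random.rand(2000, 32, 32, 4).astype("float32")
uh_max = np.random.rand(2000) * 150.0
labels = (uh_max >= 75.0).astype("int32")   # hypothetical "strongly rotating" threshold

cnn = models.Sequential([
    layers.Input(shape=(32, 32, 4)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
cnn.fit(patches, labels, epochs=3, validation_split=0.2)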
Diverse, complex data are a significant component of Earth Science’s “big data” challenge. Some earth science data, like remote sensing observations, are well understood, are uniformly structured, and have well-developed standards that are adopted broadly within the scientific community. Unfortunately, for other types of Earth Science data, like ecological, geochemical, and hydrological observations, few standards exist and their adoption is limited. The synthesis challenge is compounded in interdisciplinary projects in which many disciplines, each with its own culture, must synthesize data to solve cutting-edge research questions. Data synthesis for research analysis is a common, resource-intensive bottleneck in data management workflows. We have faced this challenge in several U.S. Department of Energy research projects in which data synthesis is essential to addressing the science. These projects include AmeriFlux, Next Generation Ecosystem Experiment (NGEE) - Tropics, the Watershed Function Science Focus Area, Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), and a DOE Early Career project using data-driven approaches to predict water quality. In these projects, we have taken a range of approaches to support (meta)data synthesis. At one end of the spectrum, data providers apply well-defined standards or reporting formats before sharing their data; at the other, data users apply standards after data acquisition. As these projects continue to evolve, we have gained insights from these experiences, including the advantages and disadvantages of each approach, how project history and resources led to the choice of approach, and what enabled data harmonization. In this talk, we discuss the pros and cons of the various approaches and present flexible applications of standards to support diverse needs when dealing with complex data.
Quantifying the response of human activities to different COVID-19 measures may serve as a way to evaluate the effectiveness of the measures and to optimize them. Recent studies reported that seismic noise reduction caused by reduced human activity during COVID-19 lockdowns has been observed by seismometers. However, it is difficult for the current seismic infrastructure in cities to characterize spatiotemporal seismic noise during the post-COVID-19 lockdown period because of its sparse distribution. Here we show key connections between progressive COVID-19 measures and spatiotemporal seismic noise changes recorded by a distributed acoustic sensing (DAS) array deployed in State College, PA. We first show spatiotemporal seismic noise reduction (up to 90%) corresponding to the reduced human activities in different city blocks during the stay-at-home period. We also show partial noise recovery corresponding to increased road traffic and machinery in Phase Yellow/Green. It is interesting to note that the non-recovery of seismic noise in the 0.01-10 Hz band suggests a low level of pedestrian movement in Phase Yellow/Green. Despite a linear correlation between mobility change and seismic noise change, we emphasize that DAS recordings using city-wide fiber optics could provide a way to quantify the impact of COVID-19 measures on human activities in individual city blocks.
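The per-channel noise change can be sketched by comparing band-passed RMS amplitudes before and during the stay-at-home period; the sampling rate, filter band, and synthetic data below are assumptions, not the actual DAS processing chain:

import numpy as np
from scipy.signal import butter, sosfilt

fs = 250.0                                  # sampling rate in Hz (illustrative)
sos = butter(4, [1.0, 10.0], btype="bandpass", fs=fs, output="sos")

def band_rms(das_block):
    # RMS amplitude per channel of a (channels x samples) block in the filter band.
    filtered = sosfilt(sos, das_block, axis=1)
    return np.sqrt(np.mean(filtered**2, axis=1))

# Placeholder pre-lockdown and stay-at-home recordings (channels x samples).
pre = np.random.randn(500, 60000)
lockdown = 0.3 * np.random.randn(500, 60000)

reduction_pct = 100.0 * (1.0 - band_rms(lockdown) / band_rms(pre))
print(reduction_pct.mean())                 # about 70% for this synthetic example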
The Earth’s upper atmosphere is a dynamic environment that is continuously affected by space weather from above and atmospheric processes from below. An effective way to observe this interface region is the monitoring of airglow. Since the 1950s, airglow emissions have been systematically measured by ground-based photometers in specific wavelength bands during the nighttime. The availability of calibrated data from over 30 years of photometric airglow measurements at Abastumani in Georgia (41.75 N, 42.82 E), at wavelengths of 557.7 nm and 630.0 nm, enables us to investigate whether a data-driven model based on advanced machine learning techniques can be successfully employed for modeling airglow intensities. A regression task was performed using time series of space weather indices and thermosphere-ionosphere parameters. We found that the developed data-driven model is in good agreement with the commonly used GLOW airglow model and also captures airglow variations caused by cycles of solar activity and seasonal changes. This enables us to visualize the green and red airglow variations over a period of three solar cycles with a one-hour time resolution.
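A minimal sketch of the regression task, assuming a generic gradient boosting regressor and hypothetical column names (the actual model and inputs used in the study may differ):

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical hourly table: space weather indices, time-of-year descriptors,
# and the measured 557.7 nm intensity; names are placeholders.
df = pd.read_csv("abastumani_airglow.csv")
X = df[["f107", "kp", "dst", "doy", "local_time"]]
y = df["intensity_557nm"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print(model.score(X_test, y_test))   # R^2 on held-out hours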
Recent developments in infrastructure and methods are major driving forces in the advances of solid Earth sciences. The deployment of large and dense sensor networks enables data centres to acquire data of increased volume and quality, and the analysis of such data provides scientists with a better understanding of natural phenomena in the subsurface. Nevertheless, new challenges arise in exploiting the growing information potential. Innovative methods based on Artificial Intelligence offer concrete opportunities to tackle those challenges. In this paper we present an investigation of Convolutional Neural Networks (CNNs) for seismo-acoustic event classification in the Netherlands. We designed, trained and evaluated two CNN models. Our results suggest that spectrograms are more suitable CNN inputs than continuous waveforms. We discuss the potential of our findings and the requirements for their operational adoption, focusing on explainability aspects, and offer an approach to pave the way for a broader uptake of Artificial Intelligence based methods.
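The spectrogram-versus-waveform comparison amounts to a different input representation for the same classifier; a minimal sketch of turning a waveform window into a spectrogram "image" with scipy, where the sampling rate and window length are assumptions:

import numpy as np
from scipy.signal import spectrogram

fs = 40.0                                    # sampling rate in Hz (illustrative)
waveform = np.random.randn(int(120 * fs))    # a two-minute event window

# Time-frequency representation used as the CNN input.
f, t, Sxx = spectrogram(waveform, fs=fs, nperseg=256, noverlap=128)
log_spec = np.log10(Sxx + 1e-12)             # log scaling stabilises training
cnn_input = log_spec[np.newaxis, :, :, np.newaxis]   # (batch, freq, time, channel)
print(cnn_input.shape)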
Recent breakthroughs in artificial intelligence (AI), and particularly in deep learning (DL), have created tremendous excitement and opportunities in the earth and environmental sciences communities. To leverage these new ‘data-driven’ technologies, however, one needs to understand the fundamental concepts that give rise to DL and how they differ from ‘process-based’, mechanistic modelling. This paper revisits those fundamentals and addresses 10 questions often posed by earth and environmental scientists with the aid of a real-world modelling experiment. The overarching objective is to contribute to a future of AI-assisted earth and environmental sciences where DL models can (1) embrace the typically ignored knowledge base available, (2) function credibly in ‘true’ out-of-sample prediction, and (3) handle non-stationarity in earth and environmental systems. Comparing and contrasting earth and environmental problems with prominent AI applications, such as playing chess and trading in stock markets, provides critical insights for better directing future research in this field.
Simulations of human behavior in water resources systems are challenged by uncertainty in model structure and parameters. The increasing availability of observations describing these systems provides the opportunity to infer a set of plausible model structures using data-driven approaches. This study develops a three-phase approach to the inference of model structures and parameterizations from data: problem definition, model generation, and model evaluation, illustrated on a case study of land use decisions in the Tulare Basin, California. We encode the generalized decision problem as an arbitrary mapping from a high-dimensional data space to the action of interest and use multi-objective genetic programming to search over a family of functions that perform this mapping for both regression and classification tasks. To facilitate the discovery of models that are both realistic and interpretable, the algorithm selects model structures based on multi-objective optimization of (1) their performance on a training set and (2) complexity, measured by the number of variables, constants, and operations composing the model. After training, optimal model structures are further evaluated according to their ability to generalize to held-out test data and clustered based on their performance, complexity, and generalization properties. Finally, we diagnose the causes of good and bad generalization by performing sensitivity analysis across model inputs and within model clusters. This study serves as a template to inform and automate the problem-dependent task of constructing robust data-driven model structures to describe human behavior in water resources systems.
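The performance-complexity trade-off at the heart of the model selection can be illustrated with a plain Pareto-front filter over candidate models; the candidates and scores below are placeholders, not the genetic programming machinery used in the study:

# Each candidate is summarised by (name, training_error, complexity); lower is
# better for both objectives. Values are illustrative placeholders.
candidates = [
    ("model_a", 0.12, 18),
    ("model_b", 0.15, 7),
    ("model_c", 0.30, 3),
    ("model_d", 0.14, 22),
]

def dominated(a, b):
    # True if candidate a is dominated by candidate b on both objectives.
    return b[1] <= a[1] and b[2] <= a[2] and (b[1] < a[1] or b[2] < a[2])

pareto_front = [a for a in candidates
                if not any(dominated(a, b) for b in candidates if b is not a)]
print(pareto_front)   # model_a, model_b, model_c survive; model_d is dominated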
Copepods are the dominant members of the zooplankton community and arguably the most abundant multicellular animals on Earth. It is imperative to obtain insights into copepod-associated bacteriobiomes (CAB) in order to identify specific bacterial taxa associated with a copepod and to understand how they vary between different copepods. Analysing the potential genes within the CAB may reveal their intrinsic role in biogeochemical cycles. For this, machine-learning models and PICRUSt2 analysis were deployed to analyse 16S rDNA gene sequences (approximately 16 million reads) of CAB belonging to five different copepod genera, viz. Acartia spp., Calanus spp., Centropages sp., Pleuromamma spp., and Temora spp. Overall, using gradient boosting classifiers, we predict 50 sub-OTUs (s-OTUs) to be important in the five copepod genera. Among these, 15 s-OTUs were predicted to be important in Calanus spp. and 20 s-OTUs to be important in Pleuromamma spp. Four bacterial s-OTUs, Acinetobacter johnsonii, Phaeobacter, Vibrio shilonii and Piscirickettsiaceae, were identified as important s-OTUs in Calanus spp., and the s-OTUs Marinobacter, Alteromonas, Desulfovibrio, Limnobacter, Sphingomonas, Methyloversatilis, Enhydrobacter and Coriobacteriaceae were predicted as important s-OTUs in Pleuromamma spp., for the first time. Our meta-analysis revealed that the CAB of Pleuromamma spp. had a high proportion of potential genes responsible for methanogenesis and nitrogen fixation, whereas the CAB of Temora spp. had a high proportion of potential genes involved in assimilatory sulphate reduction and cyanocobalamin synthesis. The CAB of Pleuromamma spp. and Temora spp. have potential genes responsible for iron transport.
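A minimal sketch of the classification setup: s-OTU relative abundances as features, copepod genus as the label, and feature importances from a gradient boosting classifier; the table and column names are hypothetical:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical abundance table: rows are samples, columns are s-OTU relative
# abundances, plus the copepod genus each sample came from.
df = pd.read_csv("cab_sotu_table.csv")
X = df.drop(columns=["genus"])
y = df["genus"]           # Acartia, Calanus, Centropages, Pleuromamma, Temora

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank s-OTUs by their contribution to separating the genera.
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(50))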
This is a network analysis approach to locating potential technosignatures in space. In the approach, nodes represent exoplanet host stars (host stars serve as a proxy for exoplanet locations when working with interstellar distances), while edges or connections represent hypothetical ET navigation/communication pathways between the nodes. The approach is flexible in that it can apply to either non-radio or radio technosignatures. A customizable network fitting algorithm is used to determine the network topology. The data source is the NASA Exoplanet Archive, and the analysis is performed with the Point Processing Toolkit (“pptk”), a Python package useful for visualizing 2D and 3D points. Prospective contributions to the field include narrowed-down locations of potential technosignatures in space for mission or project design (e.g., involving the James Webb Space Telescope, TESS…), and operationalization of the Drake Equation with regard to the term pertaining to the fraction of planets that develop intelligent life.
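The node-edge construction can be sketched by linking host stars that lie within a chosen distance of each other; the random coordinates, the 10-parsec hop limit, and the use of networkx (rather than the pptk visualisation mentioned above) are assumptions for illustration only:

import numpy as np
import networkx as nx

# Placeholder 3-D Cartesian positions (parsecs) of exoplanet host stars; in
# practice these would come from NASA Exoplanet Archive coordinates.
rng = np.random.default_rng(1)
positions = rng.uniform(-50, 50, size=(200, 3))

graph = nx.Graph()
graph.add_nodes_from(range(len(positions)))

max_link = 10.0   # hypothetical maximum navigation/communication hop, in parsecs
for i in range(len(positions)):
    for j in range(i + 1, len(positions)):
        if np.linalg.norm(positions[i] - positions[j]) <= max_link:
            graph.add_edge(i, j)

print(graph.number_of_edges(), nx.number_connected_components(graph))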
Global water use for food production needs to be reduced to remain within planetary boundaries, yet the financial feasibility of crucial measures to reduce water use is poorly quantified. Here, we introduce a novel method to compare the costs of water conservation measures with the added value that reallocation of water savings might generate if used for expansion of irrigation. Based on detailed water accounting with a high-resolution hydrology-crop model, we modify the traditional cost curve approach with an improved estimation of demand and of the increasing marginal cost of each combination of water conservation measures, and add a correction to control for impacts on downstream water availability. We apply the method to three major river basins in the Indo-Gangetic plain (Indus, Ganges and Brahmaputra), a major global food-producing region that is increasingly water stressed. Our analysis shows that at the basin level only about 10% (Brahmaputra) to just over 20% (Indus and Ganges) of potential water savings would be realised; the equilibrium price of water is too low to make the majority of water conservation measures cost effective. The associated expansion of irrigated area is moderate, about 7% in the Indus basin, 5% in the Ganges and negligible in the Brahmaputra, but farmers’ gross profit increases more substantially, by 11%. Increasing the volumetric cost of irrigation water influences supply and demand in a similar way and has little influence on water reallocation. Controlling for the impact on return flows is important and more than halves the amount of water available for reallocation.
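The modified cost curve logic can be sketched by correcting gross savings for return flows and retaining only measures whose marginal cost falls below the value of reallocated water; all numbers and names below are placeholders, not the study's estimates:

import pandas as pd

# Hypothetical measures with gross water savings (km3/yr) and marginal cost (USD/m3).
measures = pd.DataFrame({
    "measure": ["drip", "mulching", "laser_levelling", "canal_lining"],
    "gross_saving": [4.0, 2.5, 1.5, 3.0],
    "cost_per_m3": [0.03, 0.015, 0.02, 0.05],
})
return_flow_fraction = 0.55   # share of "savings" that previously fed downstream use
water_value = 0.025           # value of reallocated irrigation water (USD/m3)

# Correct savings for downstream impacts, then keep only cost-effective measures.
measures["net_saving"] = measures["gross_saving"] * (1 - return_flow_fraction)
cost_effective = measures[measures["cost_per_m3"] <= water_value]
print(cost_effective[["measure", "net_saving"]])
print(cost_effective["net_saving"].sum(), "km3/yr realised at this water value")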