Jerry Bieszczad - ESS Open Archive

Seamless Transition of Data Analyses and Analytics from a Local Workstation to Scalab...

Jerry Bieszczad

and 4 more

January 22, 2019

Newer satellite platforms, such as NISAR, are poised to produce huge amounts of data that require large computational resources. Currently, researchers typically download datasets for analysis on local computer resources. This paradigm is no longer practical given the volumes of data from new sensing platforms. While cloud computing services offer a potential solution for accessing and managing large computational resources, there remains a significant barrier to entry. Leveraging cloud services requires users to: navigate new terminology without appropriate documentation; optimize settings for services to reduce costs; and maintain software dependencies, upgrades, and allocated hardware resources. A more accessible approach for migrating earth scientists to the cloud is needed. To address this problem, we are developing the open source Python library PODPAC (Pipeline for Observational Data Processing Analysis and Collaboration), with the goal of helping to address NASA’s rapidly growing observational data volume and variety needs. PODPAC enables earth scientists to seamlessly transition from processing on a local workstation (their current paradigm) to distributed remote processing on the cloud. It does this by leveraging a text-based JSON format automatically generated for any plug-and-play algorithm developed using PODPAC (e.g., in a Jupyter Notebook). This text format describes data provenance, and is used in RESTful web requests to preconfigured PODPAC cloud deployments, allowing scalable, massively distributed processing. We demonstrate the seamless transition to the cloud by developing a simplified soil moisture downscaling algorithm in Python using PODPAC. The algorithm uses NASA Soil Moisture Active Passive (SMAP) sensor data retrieved from the National Snow and Ice Data Center using OPeNDAP, and fine-scale topographic data retrieved via Open Geospatial Consortium (OGC) Web Coverage Service (WCS) calls.
We then use a serverless AWS Lambda function to run the same algorithm using the automatically generated text format. Our generic preconfigured environment can handle a wide variety of processing pipelines, and scale up to 1024 parallel processes. This approach enables incremental adoption of cloud services by researchers, significantly lowering the barrier to entry.
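
The idea of a text-based pipeline definition that evaluates identically on a workstation and on a remote worker can be illustrated with a minimal, self-contained sketch. This is not PODPAC's actual JSON schema or API: the `OPERATIONS` registry, the node names, and the `evaluate` function are hypothetical, chosen only to show how a serialized description of an algorithm can be shipped as text and re-run elsewhere.

```python
import json

# Hypothetical registry of "plug-and-play" operations (illustrative names only,
# not PODPAC's real node types).
OPERATIONS = {
    "constant": lambda inputs, attrs: attrs["value"],
    "add": lambda inputs, attrs: sum(inputs),
    "scale": lambda inputs, attrs: inputs[0] * attrs["factor"],
}


def evaluate(definition, output):
    """Evaluate the node named `output` in a pipeline definition.

    `definition` maps node names to {"op", "inputs", "attrs"} entries, so the
    whole algorithm, including where each value comes from, is plain data.
    """
    cache = {}

    def run(name):
        if name not in cache:
            spec = definition[name]
            inputs = [run(dep) for dep in spec.get("inputs", [])]
            cache[name] = OPERATIONS[spec["op"]](inputs, spec.get("attrs", {}))
        return cache[name]

    return run(output)


# A toy pipeline standing in for an algorithm built interactively.
pipeline = {
    "coarse": {"op": "constant", "attrs": {"value": 0.25}},
    "correction": {"op": "constant", "attrs": {"value": 0.125}},
    "sum": {"op": "add", "inputs": ["coarse", "correction"]},
    "scaled": {"op": "scale", "inputs": ["sum"], "attrs": {"factor": 2.0}},
}

# "Local" evaluation works on the in-memory definition.
local_result = evaluate(pipeline, "scaled")

# "Remote" evaluation only ever sees the JSON text, as a web request body would.
text = json.dumps(pipeline, sort_keys=True)
remote_result = evaluate(json.loads(text), "scaled")
```

Because the definition is plain text, the same string could in principle be placed in the body of a web request and evaluated by any worker that knows the operation registry; the sketch simulates that round trip with `json.dumps` and `json.loads`.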

SoilMAP: An Open Source Python Library for Developing Algorithms and Specialized User...

Jerry Bieszczad

and 3 more

January 21, 2020

COSMOS soil moisture sensors provide meso-scale area-averaged soil moisture estimates, presenting a unique opportunity for validating remotely sensed soil moisture data from satellite sensing platforms such as SMAP. New, roving COSMOS sensors can provide greater spatial coverage than their stationary counterparts. However, COSMOS sensors require careful site-specific calibrations, which are not available for roving sensors. As such, it is critically important for researchers to monitor roving COSMOS collection campaigns in near-real-time, which in turn requires specialized user interfaces for rapid analysis. Moreover, harmonizing remotely sensed data (such as Landsat, SSURGO, MODIS, SMAP, and SRTM) with a roving COSMOS sensor is non-trivial and requires great care that cannot be accomplished on-the-fly in the field. To address these problems, we are developing the open source SoilMAP (Soil Moisture Analysis and Processing) software, which is a specialized analysis application for COSMOS and SMAP soil moisture data. We are developing this application using PODPAC (https://podpac.org/), a cloud-ready open source Python library for large-scale analysis and on-demand processing of raw earth science data. Our soil moisture analysis application aims to provide (1) customizable, rapid, near-real-time visualization and analysis of COSMOS and SMAP data; (2) unified data access and automated data wrangling to harmonize roving COSMOS measurements and SMAP L3 data; and (3) a streamlined workflow for developing roving COSMOS sensor calibrations with uncertainty estimates. We will demonstrate on-demand processing of raw soil moisture data retrieved from COSMOS sensors and SMAP L3 data using our SoilMAP software framework. We will also show our user workflows specialized for (1) staging data from various remotely sensed and in-situ sensors, (2) monitoring a COSMOS data collection campaign in near-real-time, and (3) analyzing the resultant data with comparison to SMAP soil moisture.
We will outline the steps required to build and customize this application. SoilMAP greatly reduces the burden of analyzing, comparing, and validating soil moisture data using measurements from roving COSMOS sensors.
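
One way to picture the harmonization and comparison step is the sketch below, which averages roving point measurements into grid cells and then computes bias and RMSE against gridded satellite values. It is an illustration under stated assumptions, not SoilMAP's method: the regular latitude/longitude grid and the `cell_deg` size are hypothetical simplifications (SMAP L3 products use the EASE-Grid 2.0 projection), and all numbers are made up.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, cell_deg):
    """Map a coordinate to the (row, col) of a regular lat/lon grid (assumed grid)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate_to_cells(points, cell_deg):
    """Average (lat, lon, value) point measurements within each grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in points:
        acc = sums[cell_index(lat, lon, cell_deg)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def compare(rover_cells, satellite_cells):
    """Return (bias, rmse, n) of satellite minus rover over cells present in both."""
    common = sorted(set(rover_cells) & set(satellite_cells))
    if not common:
        raise ValueError("no overlapping grid cells to compare")
    diffs = [satellite_cells[c] - rover_cells[c] for c in common]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse, len(common)


CELL_DEG = 0.5  # hypothetical cell size, for illustration only

# Made-up roving measurements: (latitude, longitude, volumetric soil moisture).
rover_points = [
    (40.1, -105.2, 0.20),
    (40.3, -105.4, 0.24),
    (40.6, -105.2, 0.30),
]

# Made-up satellite values keyed by the same cell indices.
satellite = {
    cell_index(40.2, -105.3, CELL_DEG): 0.25,
    cell_index(40.7, -105.3, CELL_DEG): 0.28,
}

rover = aggregate_to_cells(rover_points, CELL_DEG)
bias, rmse, n_cells = compare(rover, satellite)
```

A real workflow would also need to handle time matching, sensor footprint, and calibration uncertainty, none of which this sketch attempts.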

General Server for Rapid Publishing of OGC-Compliant Earth Science Data Products

Mattheus Ueckermann

and 1 more

December 20, 2021

To make timely decisions for weather- and climate-related disasters and vulnerabilities, decision makers need current information that can be readily shared and communicated to stakeholders. To date, geospatial data is distributed using monolithic storage architectures and formats best suited for traditional research applications. Thus, everyday decision makers face significant “barriers to entry” when trying to access, explore, and modify vast historical archives and real-time data feeds. To address this need, we are developing a server architecture for rapidly creating and publishing data products. Privileged users can rapidly create or change products by operating on another product or combining multiple disparate data sources. Users can then consume these new products using OGC-compliant WMS/WCS clients such as ArcGIS, QGIS, or Leaflet. This enables decision makers to effectively communicate with stakeholders using customized maps. Moreover, this capability enables products to be rapidly updated in cases where timely information is important. Our server architecture is containerized, making it easy to deploy on various architectures including serverless cloud resources. It is implemented in Python, leverages the plug-and-play data wrangling capabilities of the PODPAC library, and uses a custom library for serving OGC-compliant data. The result is an easy-to-use architecture for rapidly publishing custom geospatial products that exploit vast earth science data resources. We will demonstrate our server capabilities by showing how privileged users can build a set of products that are computed on-demand starting from a fresh server. Using JupyterLab notebooks, we will create products that modify single data sources as well as products that combine multiple disparate sources. We will then show how users can consume these products using OGC-compliant clients.
Next, we will detail our cloud-based, serverless deployment of this technology using Amazon Web Services. Finally, we will discuss the advantages of our approach along with any caveats. Enabling everyday decision makers to rapidly create and share geospatial data will revolutionize their productivity and effectiveness for assessing and remediating weather- and climate-related vulnerabilities and disasters.
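
For readers unfamiliar with how an OGC-compliant client consumes a published product, the sketch below builds a standard WMS 1.3.0 `GetMap` request URL using only the Python standard library. The query parameters are those defined by the WMS specification; the server address `https://example.org/ogc` and the layer name `soil_moisture` are hypothetical placeholders, not endpoints of the server described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    `bbox` is (min1, min2, max1, max2) in the axis order of `crs`; for
    EPSG:4326 under WMS 1.3.0 that order is latitude first, then longitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer, for illustration only.
url = wms_getmap_url(
    "https://example.org/ogc",
    "soil_moisture",
    bbox=(39.0, -106.0, 41.0, -104.0),
    width=256,
    height=256,
)

# Parse the query back to confirm what a server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

Clients such as QGIS or Leaflet issue requests of this shape on the user's behalf, which is why a product published through a standards-compliant endpoint needs no custom client code.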