Managing Large-scale Atmospheric and Oceanic Climate Data for Efficient
Analysis and On-the-fly Interactive Visualization
Abstract
Managing vast volumes of climate data, often terabytes to petabytes in
size, poses significant challenges for storage, accessibility,
efficient analysis, and on-the-fly interactive
visualization. Traditional data handling techniques are increasingly
inadequate for the massive atmospheric and oceanic data generated by
modern climate research. We tackled these challenges by reorganizing the
native data layout to optimize access and processing, using the
OpenVisus framework for real-time interactive exploration, and
extracting comprehensive metadata for all available fields to improve
data discoverability and usability. Our work
used extensive datasets, including downscaled projections of various
climate variables from NEX-GDDP-CMIP6 and high-resolution ocean
simulations from NASA DYAMOND. By transforming the data into
progressive, streaming-capable formats and incorporating ARCO
(Analysis-Ready, Cloud-Optimized) features before moving it to the
cloud, we ensured that the data is highly accessible and efficient to
analyze, with direct access to data subsets in the cloud. Direct
integration with the Python library Xarray gives efficient, easy access
to the data through an interface most climate scientists already know.
This approach, combined with the progressive
streaming format, not only enhances the findability, shareability, and
reusability of the data but also enables sophisticated analyses and
visualizations on commodity hardware such as mobile phones and personal
computers, without the need for large computational resources. By
collaborating with climate scientists and domain experts from NASA's Jet
Propulsion Laboratory and NASA Ames Research Center, we published more than 2
petabytes of climate data via our interactive dashboards for climate
scientists and the general public. Ultimately, our solution fosters
faster decision-making, greater collaboration, and innovation in the
global climate science community by removing barriers imposed by
hardware limitations and geography and by providing access to
sophisticated visualization tools via publicly available dashboards.
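
The idea of progressive, coarse-to-fine access to a data subset can be
illustrated with a minimal sketch. It is not the OpenVisus or Xarray
API, and the abstract does not detail the actual streaming format.
Simple strided subsampling stands in for it here, and the function
names, the level convention, and the synthetic field are all
hypothetical.

```python
def read_region(grid, rows, cols, level, max_level):
    """Return the sub-block of `grid` sampled at a given resolution level.

    Assumption: level == max_level means full resolution, and each lower
    level doubles the sampling stride, so coarse previews touch far
    fewer values than the full-resolution read.
    """
    if not 0 <= level <= max_level:
        raise ValueError("level must be in [0, max_level]")
    stride = 2 ** (max_level - level)
    r0, r1 = rows
    c0, c1 = cols
    return [row[c0:c1:stride] for row in grid[r0:r1:stride]]


def progressive_read(grid, rows, cols, max_level):
    """Yield successively finer views of the same region, coarse first."""
    for level in range(max_level + 1):
        yield level, read_region(grid, rows, cols, level, max_level)


# Synthetic 16x16 field (hypothetical stand-in for one climate variable).
field = [[r * 16 + c for c in range(16)] for r in range(16)]

# Request only a subset of the domain and refine it step by step.
for level, view in progressive_read(field, rows=(0, 8), cols=(4, 12), max_level=3):
    n_values = sum(len(row) for row in view)
    print(f"level {level}: {len(view)}x{len(view[0])} samples ({n_values} values)")
```

A client on modest hardware could display the coarsest view at once and
stop refining when the resolution matches the screen, which is the
behavior the progressive streaming format is meant to enable.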