The Pangeo Platform: a community-driven open-source big data environment
  • Joseph Hamman,
  • Scott Henderson,
  • Anthony Arendt,
  • Amanda Tan,
  • Dennis Fatland,
  • Andrew Pawloski,
  • Daniel Pilone,
  • Matthew Hanson,
  • Tom Augspurger,
  • Ryan Abernathey,
  • Richard Signell
Joseph Hamman
National Center for Atmospheric Research

Corresponding Author

Scott Henderson
Cornell University
Anthony Arendt
University of Alaska Fairbanks
Amanda Tan
University of Washington
Dennis Fatland
University of Washington
Andrew Pawloski
Element 84, Inc.
Daniel Pilone
Element 84
Matthew Hanson
Development Seed
Tom Augspurger
Anaconda Inc.
Ryan Abernathey
Lamont-Doherty Earth Observatory
Richard Signell
NOAA

Abstract

In this presentation, we will describe the [Pangeo Project](http://pangeo.io), a coordinated community effort, supported by NASA, NSF, AWS, Microsoft Azure, and Google Cloud, to develop interactive and reproducible open-source workflows for discovery, visualization, and quantitative analysis of large datasets used in Earth science research. The Pangeo computational platform is based on JupyterHub and is deployed wherever the data are stored. Python libraries such as Xarray, Rasterio, and Dask enable distributed parallel computation on HPC and Kubernetes clusters. We will discuss the design concepts central to the Pangeo platform and highlight specific applications using NASA satellite data archives on AWS. We will also discuss recent progress in integrating data discovery tools (e.g. STAC, CMR, Intake) with cloud-native storage formats for multidimensional data (Cloud-Optimized GeoTIFF, Zarr, etc.) and show how they can be used to construct elegant, robust, and reproducible scientific workflows. Finally, we will discuss performance, security, transferability across public cloud platforms, operating costs, and approaches to encouraging a cultural shift in scientific computing through educational events.
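
As a rough illustration of the chunked, lazy computation that Xarray and Dask make possible, the following minimal sketch builds a small synthetic array and reduces it over time. It is not taken from the presentation: the data are random numbers, and the variable name, grid sizes, and chunk size are all assumptions chosen for the example.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded Earth science dataset (assumption: random
# values on a made-up grid, not real satellite or model data).
times = np.arange(120)
lat = np.linspace(-89.5, 89.5, 18)
lon = np.linspace(0.5, 359.5, 36)
values = np.random.default_rng(0).normal(size=(times.size, lat.size, lon.size))

temperature = xr.DataArray(
    values,
    coords={"time": times, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",  # hypothetical variable name
)

# Split the array along time into Dask-backed chunks (requires dask).
# Subsequent operations build a lazy task graph instead of running eagerly.
chunked = temperature.chunk({"time": 12})

# Defining the reduction does no numerical work yet.
lazy_mean = chunked.mean(dim="time")

# compute() executes the task graph on whichever Dask scheduler is active.
time_mean = lazy_mean.compute()
```

Run locally, this uses Dask's default scheduler. On a cluster deployment, the same code would typically run against a `dask.distributed` client connected to HPC or Kubernetes workers, with the data opened from cloud storage rather than generated in memory.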