
H41L-1877 - Dendra: a real-time cloud-based time-series curation system
  • Collin Bode,
  • J. Scott Smith
Collin Bode
University of California, Berkeley

Corresponding Author:[email protected]

J. Scott Smith
University of California, Berkeley

Abstract

Wireless sensor networks for environmental monitoring are becoming a common tool for researchers across many of the field sciences. However, managing these systems is still an emerging challenge. Internet of Things (IoT) technologies have provided many tools for building big data solutions, but the industry's goals are at cross-purposes with those of scientists. The big data approach is to collect massive amounts of data and then throw out the anomalous points. For environmental monitoring, we need to archive and curate all the data as a permanent record of rapid environmental change. To achieve this, we need to combine IoT with a museum curation sensibility. Dendra is cyberinfrastructure for real-time sensor data storage, retrieval, management, and curation for the field sciences. It is a cloud-based, multi-organizational system designed to support massive permanent monitoring efforts (https://dendra.science). The name is derived from dendritic networks, such as river networks. Environmental monitoring works in a similar manner, pulling data from across the earth’s surface to a single location. To curate streaming data, we developed a dynamic data versioning system. A field scientist reports invalid data from the field via mobile phone; once a curator approves the annotation, it is instantly applied to all data accessed. Data is modified only on extract. This allows us to pull data as of any time in the past, with the edits and calibrations in effect at that time. Networked data logger integration works with LoggerNet and GOES satellite, with Iridium satellite support coming soon. Dendra is hosted on NSF’s XSEDE Jetstream cloud service. The system is designed as a set of microservices that interact through a set of persistent queues (NATS). Server-side JavaScript with Node.js is the primary development language. A data abstraction layer allows multiple time-series databases (InfluxDB, MySQL, etc.) to be accessed, even for a single instrument over time, and reassembled into a single datastream. Access is via a REST API and a website. Dendra is used and supported by the Eel River Critical Zone Observatory (23 stations) in Mendocino, California; the University of California Natural Reserve System (25 stations); and the Moore Foundation-funded California Heartbeat Initiative (4 stations, 10 mobile, 40 planned).
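
The apply-on-extract idea behind the dynamic data versioning described above can be illustrated with a minimal sketch. This is not Dendra's actual code or API: the datapoint and annotation shapes, the `extract` function, and the `exclude` and `offset` actions are all hypothetical names chosen for illustration. The sketch assumes that raw points are stored unmodified and that each approved annotation carries an approval time, so an extract "as of" an earlier time simply ignores annotations approved later.

```javascript
// Hypothetical data shapes; none of these names come from Dendra itself.
// Datapoint: { t: timestamp, v: value }. Raw points are never modified.
const rawPoints = [
  { t: 1000, v: 10.0 },
  { t: 2000, v: 999.0 }, // glitch later reported from the field
  { t: 3000, v: 12.0 },
];

// Annotation: applies to points with from <= t <= to, once approved.
const annotations = [
  // Calibration offset approved at time 5000.
  { approvedAt: 5000, from: 0, to: 4000, action: "offset", value: -0.5 },
  // Invalid-data flag approved later, at time 8000.
  { approvedAt: 8000, from: 2000, to: 2000, action: "exclude" },
];

// Return a new series with every annotation approved on or before
// `asOf` applied, in approval order. The input arrays are left untouched.
function extract(points, annots, asOf = Infinity) {
  const active = annots
    .filter((a) => a.approvedAt <= asOf)
    .sort((a, b) => a.approvedAt - b.approvedAt);

  const out = [];
  for (const p of points) {
    let v = p.v;
    let excluded = false;
    for (const a of active) {
      if (p.t < a.from || p.t > a.to) continue; // annotation does not cover p
      if (a.action === "exclude") excluded = true;
      else if (a.action === "offset") v += a.value;
    }
    if (!excluded) out.push({ t: p.t, v });
  }
  return out;
}

const current = extract(rawPoints, annotations); // all annotations applied
const asOf6000 = extract(rawPoints, annotations, 6000); // calibration only
const asOf1000 = extract(rawPoints, annotations, 1000); // no annotations yet

console.log(current, asOf6000, asOf1000);
```

Because edits live in the annotation list rather than in the stored values, the same raw record can be read with today's corrections or with the corrections that were in effect at an earlier time.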