
Unsupervised Clustering and Supervised Machine Learning for Lightning Classification: Application to Identifying EIPs for Ground-based TGF Detection
  • Yunjiao Pu,
  • Steven A. Cummer,
  • Fanchao Lyu,
  • Yu Zheng,
  • Michael S. Briggs,
  • Stephen Lesage,
  • Bagrat Mailyan,
  • Oliver J Roberts
Yunjiao Pu
Duke University
Steven A. Cummer
Duke University

Corresponding Author: [email protected]

Fanchao Lyu
Nanjing Joint Institute for Atmospheric Sciences
Yu Zheng
Nanjing Joint Institute for Atmospheric Sciences
Michael S. Briggs
University of Alabama in Huntsville
Stephen Lesage
University of Alabama in Huntsville
Bagrat Mailyan
University of Alabama in Huntsville
Oliver J Roberts
USRA/NASA-MSFC

Abstract

We developed a framework that combines unsupervised and supervised machine learning to classify lightning radio signals, and applied it to the possible detection of terrestrial gamma-ray flashes (TGFs). Recent studies have established a tight connection between energetic in-cloud pulses (EIPs, >150 kA) and a subset of TGFs, enabling continuous, large-scale, ground-based TGF detection. However, even with a high peak-current threshold, manually searching for EIPs against a background of many non-EIP events is time-consuming, and the search becomes even more difficult when a lower peak-current threshold is used. Machine learning classifiers are an effective tool for this task. Beginning with unsupervised learning, spectral clustering is performed on the low-dimensional features extracted by an autoencoder from raw radio waveforms, showing that +EIPs naturally constitute a distinct class of waveform and account for 67% of the total population. The clustering results are used to form a labeled dataset (~10,000 events) for training a supervised convolutional neural network (CNN) that targets +EIPs. Our CNN models identify on average 95.2% of true +EIPs with accuracy up to 98.7%, making them a powerful tool for +EIP classification. The pretrained CNN classifier is then applied to identify lower-peak-current EIPs (LEIPs, >50 kA) in a larger dataset (~30,000 events). Among 10 LEIPs coincident with Fermi TGF observations, 2 previously reported TGFs and 2 unreported but suspected TGFs are found, while the majority are not associated with detectable TGFs. In addition, the unsupervised clustering is found to reflect characteristics of the ionospheric reflection height and its effect on radio wave propagation.
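The abstract quotes two different performance figures for the CNN: the fraction of true +EIPs that are identified, and the overall accuracy. The sketch below shows how the two differ, assuming the first figure corresponds to recall (true positives over all true +EIPs); that reading is an assumption, not something the abstract states. The function name `recall_and_accuracy` and the toy labels are hypothetical illustrations and are not data from the study.

```python
def recall_and_accuracy(y_true, y_pred):
    """Return (recall, accuracy) for binary labels, where 1 marks a +EIP.

    recall   = true positives / all true +EIPs
    accuracy = correctly labeled events / all events
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one event is required")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    positives = true_pos + false_neg
    # Recall is undefined when the sample holds no true +EIPs.
    recall = true_pos / positives if positives else float("nan")
    accuracy = correct / len(pairs)
    return recall, accuracy


# Hypothetical toy labels (NOT from the paper): 4 true +EIPs, 6 non-EIPs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

recall, accuracy = recall_and_accuracy(y_true, y_pred)
print(f"recall={recall:.2f}, accuracy={accuracy:.2f}")
```

When the target class is a minority of the events, accuracy can stay high even if many targets are missed, which is one reason to report both numbers.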
29 Dec 2022: Submitted to ESS Open Archive
31 Dec 2022: Published in ESS Open Archive