Abstract
A machine-learning classifier for radiation waveforms of negative return
strokes (RSs) is built and tested with the Random Forest algorithm,
using a large dataset consisting of 14,898 negative RSs and 159,277
intracloud (IC) pulses with 3-D location information. Eleven simple
parameters, including three related to pulse characteristics
and eight related to the relative strength of pulses, are
defined to build the classifier. Two parameters are also defined to
evaluate classifier performance: the classification accuracy, which is
the percentage of true RSs among all waveforms classified as RSs, and
the identification efficiency, which is the percentage of true RSs that
are correctly classified. The tradeoff
between the accuracy and the efficiency is examined and simple methods
to tune the tradeoff are developed. The classifier achieves its best
overall performance with an accuracy of 98.84% and an efficiency of
98.81%. With the same technique, a classifier for positive RSs is
also built and tested using a dataset consisting of 8,700 positive RSs.
This classifier has an accuracy of 99.04% and an efficiency of 98.37%.
We also demonstrate that our classifiers can be readily used in various
lightning location systems. By examining misclassified waveforms, we
show evidence that some RSs and IC discharges produce special radiation
waveforms that are almost impossible to correctly classify without 3-D
location information, resulting in a fundamental difficulty in achieving
very high accuracy and efficiency in the classification of lightning
radiation waveforms.
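
As an illustration of the two evaluation parameters defined above, the following minimal sketch computes the classification accuracy and the identification efficiency from true and predicted labels. In standard machine-learning terms these correspond to precision, TP / (TP + FP), and recall, TP / (TP + FN), for the RS class. The function names, the toy data, and the thresholding of a per-pulse score are assumptions made for illustration only; the abstract does not specify how the tradeoff is tuned, and a score cutoff is just one common way to shift it.

```python
def accuracy_and_efficiency(true_is_rs, pred_is_rs):
    """Return (accuracy, efficiency) in percent.

    accuracy   = true RSs among all pulses classified as RSs
    efficiency = correctly classified RSs among all true RSs
    Both arguments are equal-length sequences of booleans.
    """
    tp = sum(t and p for t, p in zip(true_is_rs, pred_is_rs))
    classified_rs = sum(pred_is_rs)  # TP + FP
    true_rs = sum(true_is_rs)        # TP + FN
    accuracy = 100.0 * tp / classified_rs if classified_rs else float("nan")
    efficiency = 100.0 * tp / true_rs if true_rs else float("nan")
    return accuracy, efficiency


def sweep_threshold(true_is_rs, rs_scores, thresholds):
    """Evaluate both metrics at several cutoffs on a per-pulse RS score.

    rs_scores is a hypothetical score in [0, 1], for example the fraction
    of trees voting "RS"; this is an assumption, not the paper's method.
    """
    results = []
    for thr in thresholds:
        pred = [s >= thr for s in rs_scores]
        results.append((thr, *accuracy_and_efficiency(true_is_rs, pred)))
    return results


# Toy data (invented for illustration): 3 true RSs and 5 IC pulses.
true_is_rs = [True, True, True, False, False, False, False, False]
rs_scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10, 0.05, 0.02]

for thr, acc, eff in sweep_threshold(true_is_rs, rs_scores, [0.3, 0.5, 0.7]):
    print(f"threshold={thr:.1f}  accuracy={acc:.2f}%  efficiency={eff:.2f}%")
```

On this toy data, raising the cutoff from 0.3 to 0.7 increases the accuracy while lowering the efficiency, which is the kind of tradeoff the abstract refers to.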