Global-Local Attention Transformer Network for GIS Partial Discharge
Defect Detection
Abstract
The Global-Local Attention Transformer (GLAT) model integrates global and
local features for the classification and detection of defects in
gas-insulated switchgear (GIS) partial discharge imagery. GLAT combines a
Vision Transformer (ViT) backbone with a Global-Local Attention (GGLA)
module that uses localized pooling and convolutions to strengthen channel
interactions. Input features are split into groups, pooling and
convolutions are applied to each group, and attention weights are
generated with a sigmoid function; these weights refine the features by
emphasizing salient information and suppressing irrelevant detail. In
experiments, GLAT achieves a classification accuracy of 83.4%,
outperforming conventional Convolutional Neural Networks (CNNs).
Moreover, adding the GGLA module to various ViT configurations yields an
average accuracy gain of 0.9 percentage points. These results demonstrate
GLAT's ability to capture both global structure and local detail,
indicating its promise as a novel approach to GIS partial discharge
defect detection.
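The grouped pooling, channel-wise convolution, and sigmoid gating
described above can be sketched as follows. This is a minimal NumPy
illustration, not the authors' implementation: the function name
`ggla_attention`, the group count, and the fixed averaging kernel
(learned in practice) are all assumptions for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggla_attention(x, num_groups=2, kernel_size=3):
    """Hypothetical sketch of a grouped global-local attention step.

    x: feature map of shape (C, H, W). Channels are split into groups,
    each group is pooled spatially, a small 1-D convolution mixes
    neighbouring channel descriptors, and a sigmoid produces per-channel
    attention weights in (0, 1) that reweight the input features.
    """
    C, H, W = x.shape
    assert C % num_groups == 0
    # Global average pooling per channel -> (C,)
    pooled = x.mean(axis=(1, 2))
    # Segment the pooled descriptors into channel groups
    groups = pooled.reshape(num_groups, -1)
    # Local 1-D convolution within each group
    # (fixed averaging kernel here; a learned kernel in the real module)
    kernel = np.ones(kernel_size) / kernel_size
    pad = kernel_size // 2
    mixed = np.stack([
        np.convolve(np.pad(g, pad, mode="edge"), kernel, mode="valid")
        for g in groups
    ]).reshape(C)
    # Sigmoid gate -> attention weights that emphasize salient channels
    weights = sigmoid(mixed)
    return x * weights[:, None, None]

# Example: an 8-channel feature map reweighted by the attention sketch
feat = np.random.default_rng(0).normal(size=(8, 4, 4))
out = ggla_attention(feat, num_groups=2)
print(out.shape)  # (8, 4, 4)
```

Because the sigmoid weights lie strictly in (0, 1), every channel is
attenuated rather than amplified, which matches the abstract's account of
highlighting pivotal information while diminishing irrelevant detail.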