Reducing Effects of Swath Gaps on Unsupervised Machine Learning Models for NASA MODIS Instruments

Siddha Ganju; Sarah Chen; Esther Cao; Anirudh Koul; Satyarth Praveen; Meher Anand Kasam

doi:10.1002/essoar.10507318.1

loading page

Reducing Effects of Swath Gaps on Unsupervised Machine Learning Models for NASA MODIS Instruments

Siddha Ganju,
Sarah Chen,
Esther Cao,
Anirudh Koul,
Satyarth Praveen,
Meher Anand Kasam

Abstract

Due to the nature of their pathways, NASA Terra and NASA Aqua satellites capture imagery containing “swath gaps” which are areas of no data. Swath gaps can overlap the region of interest (ROI) completely, often rendering the entire imagery unusable by Machine Learning (ML) models. This problem is further exacerbated when the ROI rarely occurs (e.g. a hurricane) and, on occurrence, is partially overlapped with a swath gap. With annotated data as supervision, a model can learn to differentiate between the area of focus and the swath gap. However, annotation is expensive and currently the vast majority of existing data is unannotated. Hence, we propose an augmentation technique that considerably removes the existence of swath gaps in order to allow CNNs to focus on the ROI, and thus successfully use data with swath gaps for training. We experiment on the UC Merced Land Use Dataset, where we add swath gaps through empty polygons (up to 20% areas) and then apply augmentation techniques to fill the swath gaps. We compare the model trained with our augmentation techniques on the swath gap-filled data with the model trained on the original swath gap-less data and note highly augmented performance. Additionally, we perform a qualitative analysis using activation maps that visualizes the effectiveness of our trained network in not paying attention to the swath gaps. We also evaluate our results with a human baseline and show that, in certain cases, the filled swath gaps look so realistic that even a human evaluator did not distinguish between original satellite images and swath gap-filled images. Since this method is aimed at unlabeled data, it is widely generalizable and impactful for large scale unannotated datasets from various space data domains.