Deep Learning based Improved Automatic Building Extraction from
Open-Source High Resolution Unmanned Aerial Vehicle (UAV) Imagery
Abstract
Automatically extracting buildings from remotely sensed imagery has
always been a challenging task, given the spectral similarity between
buildings and non-building features as well as the structural
diversity within an image. Traditional machine learning (ML)
based methods rely heavily on a large number of samples and are best
suited for medium resolution images. Unmanned aerial vehicle (UAV)
imagery offers the distinct advantage of very high spatial resolution,
which is helpful in improving building extraction by characterizing
patterns and structures. However, as the level of detail increases, the
number of images in a UAV dataset also increases manyfold, which requires
robust processing algorithms. Deep learning algorithms, specifically
Fully Convolutional Networks (FCNs), have greatly improved the results of
building extraction from such high resolution remotely sensed imagery,
as compared to traditional methods. This study proposes a deep learning
based segmentation approach to extract buildings by transferring the
learning of a deep Residual Network (ResNet) to the segmentation based
FCN U-Net. This combined dense architecture of ResNet and U-Net
(Res-U-Net) is trained and tested for building extraction on the open
source Inria Aerial Image Labelling (IAIL) dataset. This dataset
contains 360 orthorectified images, each tile covering 1500 m × 1500 m at
30 cm spatial resolution with red, green and blue bands, and covers a
total area of 805 km² in selected US and Austrian cities. Quantitative
assessments show that the proposed methodology outperforms the current
deep learning based building extraction methods. Compared with a
single U-Net model for building extraction on the IAIL dataset, the
proposed Res-U-Net model improves the overall accuracy from 92.85% to
96.5%, the mean F1-score from 0.83 to 0.88 and the mean IoU metric from
0.71 to 0.80. Results show that such a combination of two deep learning
architectures greatly improves the building extraction accuracy as
compared to a single architecture.
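
The three metrics reported above (overall accuracy, F1-score and IoU) can be illustrated for a single binary building mask with the minimal sketch below. It is not the evaluation code used in this study: the function name `binary_segmentation_metrics`, the toy masks, and the convention of returning 1.0 when neither mask contains any building pixel are assumptions made for illustration. The "mean" F1-score and IoU reported above would presumably be obtained by averaging such per-image (or per-class) values.

```python
def binary_segmentation_metrics(pred, truth):
    """Return (overall accuracy, F1-score, IoU) for the building class.

    pred and truth are equally sized 2-D lists holding 0 (background)
    or 1 (building) for every pixel. Hypothetical helper for illustration.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1          # building correctly detected
            elif p == 1 and t == 0:
                fp += 1          # background labelled as building
            elif p == 0 and t == 1:
                fn += 1          # building missed
            else:
                tn += 1          # background correctly rejected

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Assumed convention: two empty masks agree perfectly, so score 1.0.
    positives = tp + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if positives else 1.0
    iou = tp / positives if positives else 1.0
    return accuracy, f1, iou


# Toy 4 x 4 masks: a 2 x 2 building with one missed and one spurious pixel.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

accuracy, f1, iou = binary_segmentation_metrics(pred, truth)
print(f"accuracy={accuracy:.3f}  F1={f1:.3f}  IoU={iou:.3f}")
```

In this toy case 14 of the 16 pixels are labelled correctly, yet only 3 of the 5 pixels in the union of predicted and true building pixels overlap. Because background pixels dominate most aerial tiles, overall accuracy tends to look high even when the building outlines are imperfect, which is why it is useful to report F1-score and IoU alongside it.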