The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B3-2022, 41–48, 2022
https://doi.org/10.5194/isprs-archives-XLIII-B3-2022-41-2022
30 May 2022

IMPROVING CNN-BASED BUILDING SEMANTIC SEGMENTATION USING OBJECT BOUNDARIES

E. Bousias Alexakis and C. Armenakis
  • Geomatics Engineering, GeoICT Lab, Department of Earth and Space Science and Engineering, Lassonde School of Engineering, York University, Toronto, Canada

Keywords: Building Extraction, CNN, Building Boundaries, Semantic Segmentation, Decoupled Body and Edge Segmentation

Abstract. Semantic segmentation is an active area of research with a wide range of applications, including autonomous driving, digital mapping, urban monitoring, land use analysis and disaster management. For the past few years, approaches based on Convolutional Neural Networks (CNNs), especially end-to-end approaches built on architectures such as Fully Convolutional Networks (FCN) and UNet, have made great progress and are considered the current state of the art. Nevertheless, there is still room for improvement: CNN-based supervised-learning models require a very large amount of labelled data to generalize effectively to new data, and the segmentation results often lack detail, mostly in areas near the boundaries between objects. In this work we leverage the semantic information provided by the objects’ boundaries to improve the quality and detail of an encoder-decoder model’s semantic segmentation output. We use a UNet-based model with a ResNet encoder as our backbone architecture, into which we incorporate a decoupling module that separates the boundaries from the main body of the objects and thus learns explicit representations for both the body and the edges of each object. We evaluate our proposed approach on the Inria Aerial Image Labelling dataset and compare the results to a more traditional UNet-based architecture. We show that the proposed approach marginally outperforms the baseline on the mean precision, F1-score and IoU metrics by 1.1 to 1.6%. Finally, we examine certain cases of misclassification in the ground truth data and discuss how the trained models perform in such cases.
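The core idea of decoupled body and edge segmentation — splitting a feature (or prediction) map into a smooth "body" component and a residual "edge" component so each can be supervised separately — can be illustrated with a minimal NumPy sketch. Note this is a simplified stand-in: the function name `decouple_body_edge` and the mean-filter used to obtain the body are illustrative assumptions, whereas the actual module in the paper learns the decomposition (e.g. via a learned flow-based warping) inside the network.

```python
import numpy as np

def decouple_body_edge(feat, k=3):
    """Toy body/edge decoupling of a 2-D map.

    The 'body' is a low-frequency version of the input (a simple k x k
    mean filter here stands in for the learned operation in the paper),
    and the 'edge' is the residual, so that feat = body + edge exactly.
    """
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    h, w = feat.shape
    body = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            body[i, j] = padded[i:i + k, j:j + k].mean()
    edge = feat - body
    return body, edge

# Toy binary "building" mask: the edge component is zero inside the
# object and in the background, and peaks along the object boundary.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
body, edge = decouple_body_edge(mask)
print(np.abs(edge[2, 2]) > np.abs(edge[0, 0]))   # boundary response > background
print(np.allclose(body + edge, mask))            # exact decomposition
```

The exact additive decomposition (`body + edge == input`) is what lets a network supervise the two components with separate losses while still recovering the full segmentation by summing them.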