The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2021, 449–456, 2021
https://doi.org/10.5194/isprs-archives-XLIII-B2-2021-449-2021

28 Jun 2021

LABEL-EFFICIENT DEEP LEARNING-BASED SEMANTIC SEGMENTATION OF BUILDING POINT CLOUDS AT LOD3 LEVEL

Y. Cao and M. Scaioni
Department of Architecture, Built Environment and Construction Engineering, Politecnico di Milano, via Ponzio 31, 20133 Milano, Italy

Keywords: 3D Point Cloud, Autoencoder, Label-efficient, LoD3 Building, Unsupervised Deep Learning

Abstract. Recent research has relied on fully supervised Deep Learning (DL) techniques and large amounts of pointwise labels to train segmentation networks for buildings’ point clouds. However, finely labelled building point clouds are scarce, and manually annotating pointwise labels is time-consuming and expensive. Consequently, the application of fully supervised DL to the semantic segmentation of buildings’ point clouds at LoD3 level is severely limited. To address this issue, we propose a novel label-efficient DL network that obtains per-point semantic labels for LoD3 buildings’ point clouds with limited supervision. The approach consists of two steps. The first step is an Autoencoder (AE) composed of a Dynamic Graph Convolutional Neural Network-based encoder and a folding-based decoder; by reconstructing the input point clouds without any labels, it learns discriminative global and local features. The second step is semantic segmentation: given a small amount of task-specific supervision, a segmentation network semantically segments the features encoded by the pre-trained AE. We evaluate our approach experimentally on the ArCH dataset. Compared with fully supervised DL methods, our model achieves state-of-the-art results on unseen scenes while using only 10% of the labelled training data required by the fully supervised methods.
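For readers unfamiliar with the encoder family named in the abstract: a Dynamic Graph CNN (DGCNN) is built from EdgeConv layers. Each layer connects every point to its k nearest neighbours, computes an edge feature from the pair (x_i, x_j - x_i) with a shared MLP, and max-pools over the neighbours; when layers are stacked, the k-NN graph is rebuilt in the current feature space, which is what makes the graph "dynamic". The pure-Python sketch below illustrates that operator under stated assumptions: a single linear layer plus ReLU stands in for the MLP, the weights are random rather than trained, and k = 8, the layer widths and the helper names (`knn`, `edge_conv`, `global_feature`, `random_layer`) are hypothetical choices, not the authors' configuration.

```python
import math
import random

def knn(points, k):
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in ranked[:k]])
    return graph

def edge_conv(features, k, weights, bias):
    """One EdgeConv-style layer: x_i' = max over j of ReLU(W [x_i, x_j - x_i] + b).

    The k-NN graph is rebuilt from the *current* features on every call,
    so stacking layers gives a dynamic graph (hypothetical helper).
    """
    graph = knn(features, k)
    out = []
    for i, x_i in enumerate(features):
        pooled = None
        for j in graph[i]:
            edge = list(x_i) + [a - c for a, c in zip(features[j], x_i)]
            h = [max(0.0, sum(w * e for w, e in zip(row, edge)) + bc)
                 for row, bc in zip(weights, bias)]
            # Channel-wise max aggregation over the neighbourhood.
            pooled = h if pooled is None else [max(u, v) for u, v in zip(pooled, h)]
        out.append(pooled)
    return out

def global_feature(features):
    """Channel-wise max over all points: an order-invariant global descriptor."""
    return [max(col) for col in zip(*features)]

def random_layer(n_in, n_out, rng):
    """Random, untrained weights (illustration only)."""
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
w1, b1 = random_layer(2 * 3, 16, rng)    # edge feature of a 3-D point has 6 values
w2, b2 = random_layer(2 * 16, 32, rng)
f1 = edge_conv(cloud, 8, w1, b1)         # graph built from xyz coordinates
f2 = edge_conv(f1, 8, w2, b2)            # graph rebuilt in 16-D feature space
code = global_feature(f2)                # 32-D global codeword
```

In a full DGCNN, the pooled global vector is typically repeated and concatenated back onto the per-point features for segmentation; that step is left out of this sketch.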
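The folding-based decoder follows the FoldingNet idea: a fixed 2-D grid is "folded" into a 3-D surface by concatenating the encoder's codeword to every grid point and passing each pair through a shared MLP. Training then minimises a reconstruction distance between the folded points and the input cloud, so no labels are needed. The sketch below is a minimal illustration assuming a single folding step (FoldingNet applies two in sequence), random untrained weights, a hypothetical 8-dimensional codeword, a grid over [-1, 1], and a symmetric Chamfer distance with squared Euclidean distances; the abstract does not state which reconstruction loss the authors used, and the original FoldingNet uses a slightly different Chamfer variant.

```python
import random

def make_grid(m, lo=-1.0, hi=1.0):
    """m x m regular 2-D grid: the fixed 'sheet' that the decoder folds (m >= 2)."""
    step = (hi - lo) / (m - 1)
    return [(lo + i * step, lo + j * step) for i in range(m) for j in range(m)]

def mlp(x, layers):
    """Shared point-wise MLP; ReLU after every layer except the last."""
    for n, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if n < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def fold(codeword, grid, layers):
    """One folding step: map each [codeword, grid point] pair to a 3-D point."""
    return [tuple(mlp(list(codeword) + list(g), layers)) for g in grid]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of both mean squared nearest-neighbour distances."""
    def one_way(src, dst):
        return sum(min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
                   for s in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def random_layer(n_in, n_out, rng):
    """Random, untrained weights (illustration only)."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
codeword = [rng.uniform(-1, 1) for _ in range(8)]   # stand-in for the encoder output
grid = make_grid(5)                                  # 25 grid points
layers = [random_layer(8 + 2, 16, rng), random_layer(16, 3, rng)]
reconstruction = fold(codeword, grid, layers)
target = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(25)]
loss = chamfer_distance(reconstruction, target)      # quantity an optimiser would minimise
```

The Chamfer distance is invariant to the order of the points, which is why losses of this kind suit unordered point clouds.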
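One way to picture the label-efficient second step is to train only a light per-point classifier on features from the pre-trained encoder, using a small labelled subset. The sketch below makes several loud assumptions: the 2-D "features" are synthetic stand-ins for AE-encoded features, the three class centres are made up, the 10% subset is drawn uniformly at random (the abstract does not say how the labelled subset was selected), and a softmax linear head trained by stochastic gradient descent replaces the authors' segmentation network, whose architecture the abstract does not describe. The helper names `train_linear_head` and `predict` are hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_linear_head(feats, labels, n_classes, lr=0.1, epochs=200):
    """Per-point softmax classifier trained with stochastic gradient descent."""
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = softmax([sum(w * v for w, v in zip(W[c], x)) + b[c]
                         for c in range(n_classes)])
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)   # d(cross-entropy)/d(logit_c)
                W[c] = [w - lr * g * v for w, v in zip(W[c], x)]
                b[c] -= lr * g
    return W, b

def predict(W, b, x):
    """Index of the highest-scoring class."""
    scores = [sum(w * v for w, v in zip(row, x)) + bc for row, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(42)
centres = [(-3.0, 0.0), (3.0, 0.0), (0.0, 4.0)]      # hypothetical classes
points = []
for label, (cx, cy) in enumerate(centres):
    for _ in range(100):
        points.append(((cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)), label))

labelled = rng.sample(range(len(points)), k=len(points) // 10)   # 10% label budget
labelled_set = set(labelled)
W, b = train_linear_head([points[i][0] for i in labelled],
                         [points[i][1] for i in labelled], n_classes=3)
held_out = [p for i, p in enumerate(points) if i not in labelled_set]
accuracy = sum(predict(W, b, x) == y for x, y in held_out) / len(held_out)
```

Assuming the encoder is kept fixed, only the head's parameters ever see labels, so replacing the synthetic features with real encoder outputs and the linear head with a deeper network keeps the same structure.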