International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume XLII-2/W13
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W13, 155–161, 2019
https://doi.org/10.5194/isprs-archives-XLII-2-W13-155-2019
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.

04 Jun 2019

BUILDING SEGMENTATION FROM AIRBORNE VHR IMAGES USING MASK R-CNN

K. Zhou1, Y. Chen2, I. Smal1, and R. Lindenbergh1
  • 1Dept. of Geoscience and Remote Sensing, Delft University of Technology, the Netherlands
  • 2Dept. of Computational Science and Engineering, Delft University of Technology, the Netherlands

Keywords: 3D building model, VHR image, building segmentation, different scale of building, edge, Mask R-CNN, FPN, RPN, FCN

Abstract. Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, give an opportunity to create an up-to-date 3D model. Building segmentation is often the first and foremost step. Convolutional neural networks (CNNs) draw much attention for interpreting VHR images, as they can learn very effective features for very complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurately delineated edges. Mask R-CNN starts from a feature pyramid network (FPN) to create semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate object proposals at various scales, each with the corresponding optimal scale of features. The features with high and low levels of information are further used for better classification of small objects and for mask prediction along edges. The method is tested on the ISPRS benchmark dataset by comparing its results with those of fully convolutional networks (FCNs), which merge high- and low-level features by a skip layer to create a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN produces much better results in edge regions than FCN. The results also show that the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects of different scales. This paper provides an insight into how a good anchor scale should be chosen for different datasets.
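The abstract highlights the anchor scales of the RPN as the critical tuning factor for detecting buildings of different sizes. As an illustration only (the specific scale and aspect-ratio values below are assumptions, not the paper's chosen configuration), a minimal sketch of how an RPN-style anchor set is generated from a list of scales and aspect ratios might look like this:

```python
import numpy as np

def generate_anchors(scales=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate (width, height) anchor shapes for every scale/ratio pair.

    Each anchor keeps the area implied by `scale` (area = scale**2) while
    varying its aspect ratio (height/width = ratio), in the style of the
    RPN used by Faster/Mask R-CNN. The values here are illustrative
    defaults, not the configuration used in the paper.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(1.0 / r)  # wider anchors for small ratios
            h = s * np.sqrt(r)        # taller anchors for large ratios
            anchors.append((w, h))
    return np.array(anchors)

anchors = generate_anchors()
print(anchors.shape)  # 5 scales x 3 ratios -> (15, 2)
```

Choosing the smallest scale too large would leave small buildings without a well-matched anchor, which is one way to read the paper's finding that the anchor-scale range drives small-object detection performance.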