Volume XLII-1/W1
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-1/W1, 513-517, 2017
https://doi.org/10.5194/isprs-archives-XLII-1-W1-513-2017
© Author(s) 2017. This work is distributed under
the Creative Commons Attribution 3.0 License.

31 May 2017

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION

W. Zhang1,2, H. Huang3, M. Schmitz3, X. Sun1, H. Wang1, and H. Mayer3
  • 1Key Laboratory of Spatial Information Processing and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing, 100190, China
  • 2University of Chinese Academy of Sciences, Beijing, 100190, China
  • 3Institute for Applied Computer Science, Bundeswehr University Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany

Keywords: Semantic Segmentation, Convolutional Networks, Multi-modal Dataset, Fusion Nets

Abstract. In recent years, advances in Fully Convolutional Networks (FCN) have led to great improvements in semantic segmentation across various applications, including fused remote sensing data. What is lacking, however, is an in-depth study of the inner workings of FCN models that would lead to an understanding of the contribution of individual layers to specific classes and of their sensitivity to different types of input data. In this paper, we address this problem and propose a fusion model incorporating infrared imagery and Digital Surface Models (DSM) for semantic segmentation. The goal is to utilize heterogeneous data more accurately and effectively in a single model rather than assembling multiple models. First, the contribution and sensitivity of layers with respect to the given classes are quantified by means of their recall in the FCN. The contribution of the different modalities to the pixel-wise prediction is then analyzed based on visualization. Finally, an optimized scheme for fusing layers with color and elevation information into a single FCN model is derived from this analysis. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset. Comprehensive evaluations demonstrate the potential of the proposed approach.
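
The abstract names per-class recall as the measure used to quantify layer contribution. As a minimal sketch of that metric only, the following Python snippet computes recall per class, TP / (TP + FN), from flattened ground-truth and predicted label maps. The function name `per_class_recall`, the integer class encoding, and the toy labels are assumptions made for illustration; how the authors obtain per-layer predictions is not described in the abstract and is not reproduced here.

```python
from typing import List, Sequence


def per_class_recall(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[float]:
    """Return recall TP / (TP + FN) for each class index 0..num_classes-1.

    Hypothetical helper for illustration. Labels are assumed to be
    integers in [0, num_classes). A class that never occurs in y_true
    has undefined recall and is reported as NaN.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    true_positives = [0] * num_classes
    support = [0] * num_classes  # pixels per class in ground truth (TP + FN)

    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            true_positives[t] += 1

    return [
        true_positives[c] / support[c] if support[c] else float("nan")
        for c in range(num_classes)
    ]


# Toy example: eight pixels, three classes (made-up labels, not dataset values).
ground_truth = [0, 0, 1, 1, 2, 2, 2, 2]
prediction = [0, 1, 1, 1, 2, 2, 0, 2]
recalls = per_class_recall(ground_truth, prediction, num_classes=3)
print(recalls)
```

Under this reading, comparing such per-class recall values between predictions derived from different layers or input modalities would indicate which of them recovers a given class best.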