The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Download
Publications Copernicus
Download
Citation
Articles | Volume XLIII-B2-2021
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2021, 91–99, 2021
https://doi.org/10.5194/isprs-archives-XLIII-B2-2021-91-2021
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2021, 91–99, 2021
https://doi.org/10.5194/isprs-archives-XLIII-B2-2021-91-2021

  28 Jun 2021

28 Jun 2021

LEARNING MULTI-MODAL FEATURES FOR DENSE MATCHING-BASED CONFIDENCE ESTIMATION

K. Heinrich and M. Mehltretter K. Heinrich and M. Mehltretter
  • Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, Germany

Keywords: Cost Volume, CNN, Local-Global Approach, Fusion Network, Uncertainty

Abstract. In recent years, the ability to assess the uncertainty of depth estimates in the context of dense stereo matching has received increased attention due to its potential to detect erroneous estimates. Especially, the introduction of deep learning approaches greatly improved general performance, with feature extraction from multiple modalities proving to be highly advantageous due to the unique and different characteristics of each modality. However, most work in the literature focuses on using only mono- or bi- or rarely tri-modal input, not considering the potential effectiveness of modalities, going beyond tri-modality. To further advance the idea of combining different types of features for confidence estimation, in this work, a CNN-based approach is proposed, exploiting uncertainty cues from up to four modalities. For this purpose, a state-of-the-art local-global approach is used as baseline and extended accordingly. Additionally, a novel disparity-based modality named warped difference is presented to support uncertainty estimation at common failure cases of dense stereo matching. The general validity and improved performance of the proposed approach is demonstrated and compared against the bi-modal baseline in an evaluation on three datasets using two common dense stereo matching techniques.