LEARNING MULTI-MODAL FEATURES FOR DENSE MATCHING-BASED CONFIDENCE ESTIMATION
Keywords: Cost Volume, CNN, Local-Global Approach, Fusion Network, Uncertainty
Abstract. In recent years, the ability to assess the uncertainty of depth estimates in the context of dense stereo matching has received increased attention due to its potential to detect erroneous estimates. Especially, the introduction of deep learning approaches greatly improved general performance, with feature extraction from multiple modalities proving to be highly advantageous due to the unique and different characteristics of each modality. However, most work in the literature focuses on using only mono- or bi- or rarely tri-modal input, not considering the potential effectiveness of modalities, going beyond tri-modality. To further advance the idea of combining different types of features for confidence estimation, in this work, a CNN-based approach is proposed, exploiting uncertainty cues from up to four modalities. For this purpose, a state-of-the-art local-global approach is used as baseline and extended accordingly. Additionally, a novel disparity-based modality named warped difference is presented to support uncertainty estimation at common failure cases of dense stereo matching. The general validity and improved performance of the proposed approach is demonstrated and compared against the bi-modal baseline in an evaluation on three datasets using two common dense stereo matching techniques.