STEREO IMAGE DENSE MATCHING BY INTEGRATING SIFT AND SGM ALGORITHM

Semi-global matching (SGM) performs dynamic programming by treating all path directions equally. It does not consider the impact of different path directions on cost aggregation, and as the disparity search range expands, the accuracy and efficiency of the algorithm decrease drastically. This paper presents a dense matching algorithm that integrates SIFT and SGM. It takes the pairs successfully matched by SIFT as control points to direct the paths in dynamic programming and to truncate error propagation. In addition, matching accuracy is improved by using the gradient direction of the detected feature points to modify the weights of the paths in different directions. Experimental results on the Middlebury stereo data sets and the CE-3 lunar data sets demonstrate that the proposed algorithm can effectively cut off error propagation, reduce the disparity search range and improve matching accuracy.

* Corresponding author. E-mail address: moonriver_song@163.com (Y. Song)


INTRODUCTION
Generating dense, accurate disparity maps is an essential step and a crucial technology for many applications, such as digital surface model generation and three-dimensional reconstruction. By extracting local correspondences between two or more reference images, the depth information of the images can be obtained. Because real-world scenes are complicated, problems such as shadows and occlusions make stereo matching an active and difficult topic in digital photogrammetry and computer vision.
According to the matching strategy (Scharstein et al., 2002), stereo matching algorithms can be divided into local, semi-global and global algorithms. Local matching algorithms construct the matching cost function with a pixel and its small surrounding area as constraints; they offer high efficiency and real-time performance but do not consider overall consistency. Global matching algorithms, including graph cuts (Boykov et al., 1999) and belief propagation (Sun et al., 2002), essentially minimize a global energy function; they can effectively handle image occlusion and preserve discontinuities and have great advantages in stability and reliability, but at a high computational cost.
The semi-global matching algorithm (Hirschmuller, 2005), which approximates the two-dimensional global optimum with optimal one-dimensional energies in multiple directions, is among the top-performing algorithms in dense matching. It combines the high precision of global matching algorithms with the low time complexity of local matching algorithms, and it is the method used in this paper. However, mismatches are likely to occur where the parallax varies strongly, and as the search range becomes larger, computational efficiency decreases and running time increases.
To address these problems, pyramid strategies (Hermann and Klette, 2013) were used in SGM to reduce the search range by providing initial disparity maps, but this method did not improve accuracy. Chen et al. (2017) proposed combining region growing with SGM to correct the aggregation path while accelerating the parallax search, but the results depended on the range and accuracy of the region growing. Most improved SGM algorithms focus on improving the initial cost calculation or on adaptively selecting the matching parameters from image information. Li et al. (2017) presented an SGM algorithm based on ADCensus, which made SGM more robust by taking advantage of the AD similarity measure. Zhu et al. (2017) proposed a matching method that considers texture features to better preserve edges. However, none of these methods considered the effect of the aggregation path direction on matching accuracy.
In this paper, a modified SGM algorithm integrated with SIFT (Lowe, 2004) is proposed to enhance the quality of the estimated depth map while decreasing computation. The main work is as follows. To reduce the parallax search range, improve matching precision and correct mismatches in dynamic programming, initial matching results from SIFT are used together with object-oriented segmentation. In addition, matching correctness is improved by relating the main direction of the detected feature points to the paths in dynamic programming and modifying the weights of the paths in different directions accordingly.
The remainder of this paper is structured as follows. In Section 2, the background on SIFT and semi-global matching is reviewed and the methodology is described in detail. Section 3 shows the experimental results. Conclusions and future work are given in Section 4.

STEREO MATCHING
Stereo matching finds the correspondence between images of the same scene taken from different viewing angles. A matching search is performed for the selected matching primitive along the horizontal epipolar line of the rectified stereo pair, and the corresponding parallax is calculated to restore the depth information of the object in three-dimensional space.

Scale-Invariant Feature Transform
SIFT (scale-invariant feature transform) is a point feature detection and description algorithm based on scale space. It is invariant to rotation, scaling and brightness changes and is highly robust in stereo matching problems. In this paper, SIFT is used to generate a large number of accurate and robust feature points, and the implicit disparity and gradient information of the detected feature points and successfully matched pairs is used as a constraint to guide the SGM calculation.
The main steps of the SIFT feature matching algorithm are as follows. First, the scale space is constructed from a DoG pyramid and spatial extremum points are detected. Second, the stable key points are accurately located and the information around them is used to generate 128-dimensional feature descriptors. Then the Euclidean distance is used as the similarity measure to match the feature descriptors and, following a local distance check, the RANSAC algorithm is used to further eliminate false matches. Finally, the epipolar constraint is used to improve the accuracy of the matches.
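The descriptor matching step can be illustrated with a minimal sketch. It assumes that the "local distance check" is a nearest to second-nearest distance ratio test; the function names, the `ratio` threshold of 0.8 and the toy 4-dimensional descriptors are hypothetical and are not taken from the paper. RANSAC and the epipolar check are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(left, right, ratio=0.8):
    """Return (i, j) index pairs whose nearest neighbour passes the ratio check.

    `ratio` is an assumed threshold; the paper does not state one.
    """
    matches = []
    for i, desc in enumerate(left):
        # Rank all candidates so that the two closest come first.
        ranked = sorted((euclidean(desc, cand), j) for j, cand in enumerate(right))
        if len(ranked) < 2:
            continue
        (best, j), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Toy 4-dimensional descriptors standing in for 128-dimensional SIFT vectors.
left_desc = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
right_desc = [[0.0, 0.9, 0.1, 0.0], [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
matches = match_descriptors(left_desc, right_desc)
print(matches)
```

The third left descriptor is equally close to two candidates, so the ratio check rejects it as ambiguous; ambiguous matches of this kind are the ones that would otherwise become unreliable control points.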

Semi-Global Matching
Taking multipath dynamic programming as its strategy for global energy minimization, SGM retains the high efficiency of dynamic programming and can generate dense disparity maps pixel by pixel. It can be divided into three steps: matching cost calculation, matching cost aggregation and disparity determination.

Matching cost calculation:
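In this paper, the SGM algorithm uses the census transform, a local non-parametric transform, as the matching cost. The encoding follows the order of the pixel gray levels in the window, so gray-level spatial information and local texture information are preserved. Taking the centre pixel p of the window as the reference, the census transform compares the gray value of each other pixel in the window with that of p in sequence, assigns 1 to pixels with a smaller gray value and 0 otherwise, and concatenates the results into a binary code. In its standard form it is defined as

$$T(p) = \bigotimes_{q \in W(p)} \xi\big(I(p), I(q)\big), \qquad \xi\big(I(p), I(q)\big) = \begin{cases} 1, & I(q) < I(p) \\ 0, & \text{otherwise} \end{cases}$$

where $W(p)$ is the window centred on p, $I(\cdot)$ is the gray value and $\bigotimes$ denotes bit-wise concatenation. The matching cost is the Hamming distance between the codes of corresponding pixels in the left and right images:

$$C(p, d) = \mathrm{Hamming}\big(T_l(p), T_r(p - d)\big), \qquad d \in [d_{min}, d_{max}]$$

where $d_{min}$ is the minimum and $d_{max}$ is the maximum of the parallax search range.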
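A minimal sketch of the census transform and of a matching cost built on it is given below. It assumes that the cost is the Hamming distance between the census codes of a left pixel and of the right pixel shifted by the disparity; the 3 x 3 window, the border rule and all function names are illustrative choices, not the paper's settings.

```python
def census(img, y, x, radius=1):
    """Census code (as an int) of the window centred on (y, x).

    A bit is 1 where the neighbour is darker than the centre pixel.
    Neighbours outside the image give 0 bits (an assumed border rule).
    """
    h, w = len(img), len(img[0])
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            yy, xx = y + dy, x + dx
            inside = 0 <= yy < h and 0 <= xx < w
            bit = 1 if inside and img[yy][xx] < img[y][x] else 0
            bits = (bits << 1) | bit
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def census_cost(left, right, y, x, d, radius=1):
    """Cost of matching left (y, x) with right (y, x - d).

    The caller must keep x - d inside the image.
    """
    return hamming(census(left, y, x, radius), census(right, y, x - d, radius))


# Toy pair: the right image is the left image shifted by one pixel.
left_img = [[10, 20, 30, 40, 50],
            [15, 80, 35, 90, 55],
            [12, 22, 32, 42, 52]]
right_img = [[20, 30, 40, 50, 50],
             [80, 35, 90, 55, 55],
             [22, 32, 42, 52, 52]]
cost_d1 = census_cost(left_img, right_img, 1, 2, 1)  # true disparity
cost_d0 = census_cost(left_img, right_img, 1, 2, 0)  # wrong disparity
print(cost_d1, cost_d0)
```

At the true disparity the two windows are identical, so the cost is zero, while the wrong disparity yields a positive cost.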

Disparity calculation:
Based on the matching cost computed with the census transform for the stereo pair, the SGM algorithm obtains the optimal parallax by minimizing the energy function. As shown in Fig. 1, the global optimality constraint is approximated by the optimal matching paths in eight directions. In the standard SGM form (Hirschmuller, 2005), the path cost is

$$L_r(p, d) = C(p, d) + \min\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2 \Big) - \min_k L_r(p - r, k) \tag{3}$$

In Equation 3, the first term $C(p, d)$ is the initial matching cost; the second term is the minimum matching cost of the previous point p - r, including a penalty; the third term is subtracted only to keep $L_r$ from growing too large. The penalty $P_1$ applies to a disparity difference of 1 and the penalty $P_2$ to a disparity difference greater than 1. Keeping the penalty for large disparity differences bounded smooths the parallax overall while still allowing parallax discontinuities to be matched exactly rather than over-smoothed.
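The three terms described above can be sketched for a single aggregation path. The recurrence used here is the standard SGM one (Hirschmuller, 2005) and is assumed to correspond to Equation 3; the function name, the toy costs and the penalty values `p1 = 1` and `p2 = 4` are illustrative.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path.

    costs[i][d] is the matching cost of the i-th pixel on the path at
    disparity index d. Returns the path costs L with the same shape.
    """
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel has no predecessor
    for i in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(
                prev[d],                                            # same disparity
                prev[d - 1] + p1 if d > 0 else float("inf"),        # change of 1
                prev[d + 1] + p1 if d < n_disp - 1 else float("inf"),
                prev_min + p2,                                      # larger change
            )
            # Subtracting prev_min keeps the path cost from growing without bound.
            row.append(costs[i][d] + best - prev_min)
        agg.append(row)
    return agg


toy_costs = [[0, 5, 5],
             [5, 0, 5],
             [4, 5, 5]]
path_costs = aggregate_path(toy_costs, p1=1, p2=4)
print(path_costs)
```

In the last row the raw cost favours disparity index 0, but the aggregated costs of indices 0 and 1 are equal, which shows how the smoothness penalties carry evidence from earlier pixels along the path.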

Disparity Determination:
After the matching cost $S(p, d)$ of all pixels is calculated by Equation 4, the disparity of each pixel p can be obtained by Equation 5, where $d_p$ is the value that minimizes the total matching cost.

$$d_p = \arg\min_d S(p, d) \tag{5}$$
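As an illustration of Equations 4 and 5, the sketch below sums the path costs of one pixel over all directions and takes the disparity with the smallest total. It assumes plain winner-takes-all selection with no sub-pixel refinement; the function name and the numbers are hypothetical.

```python
def select_disparity(per_path_costs, d_min=0):
    """Sum the path costs of one pixel and return (disparity, totals).

    per_path_costs[r][d] is the aggregated cost for path r at disparity
    index d; index 0 corresponds to disparity d_min.
    """
    n_disp = len(per_path_costs[0])
    totals = [sum(path[d] for path in per_path_costs) for d in range(n_disp)]
    best_index = min(range(n_disp), key=totals.__getitem__)
    return d_min + best_index, totals


# Three paths instead of eight, to keep the example short.
disparity, totals = select_disparity([[5, 1, 9], [4, 2, 7], [6, 3, 3]], d_min=10)
print(disparity, totals)
```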

SGM Integrated by SIFT
The integration of SIFT and SGM uses the detected feature points and matched points to constrain the SGM process, reduce the disparity search range and limit the spread of wrong disparity information. The cost aggregation along each path in dynamic programming is modified by extracting the main direction of each feature point.

Parallax Correction and Search Range Reduction Based on Control Points:
Through dynamic programming, the correct disparity can be calculated in most areas. However, once a pixel is mismatched, the wrong disparity propagates to the following pixels on the path. Lin and Zhang (2006) proposed parallax control points, which are absolutely reliable points in the disparity map. This paper uses control points to solve the problem above. SIFT can detect various local features that are insensitive to illumination and radiometric changes. Through feature extraction, feature matching and outlier exclusion, a large number of corresponding points can be obtained. According to Wu et al. (2015), the points successfully matched by SIFT have a good corrective effect on the path aggregation of the SGM algorithm. For pixels located at control points, only the correct disparity accumulates cost.
Based on the assumption that disparity changes continuously in the interior of an object (Marr and Poggio, 1979), this paper uses segmentation and control points to reduce the search range for pixels around the control points. First, object-oriented segmentation is performed on the base image and the result is overlaid with the control points.
The cost is then calculated pixel by pixel. For a pixel p, if its Manhattan distance w from the nearest control point in the same partition is less than the maximum disparity search range, the parallax search range can be reduced from [dmin, dmax] to [d-w, d+w], based on the assumption that disparity changes continuously within a partition. Here d is the disparity of the nearest control point, dmin is the minimum parallax search range and dmax is the maximum parallax search range. In addition, at a pixel where a control point is located, only the disparity value of that control point is allowed during dynamic programming, so the propagation of false matching costs is truncated. Reducing the search range of dynamic programming at each pixel reduces the memory occupied and improves the accuracy of the algorithm.
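The search range rule can be sketched as follows. The sketch makes three assumptions that the text does not spell out: the "maximum disparity search range" is read as dmax - dmin, the reduced range is clipped to [dmin, dmax], and the caller has already selected the control points that lie in the same segment as the pixel. The function name and the data are hypothetical.

```python
def search_range(pixel, control_points, d_min, d_max):
    """Return the inclusive disparity search range for `pixel`.

    control_points is a list of ((row, col), disparity) entries that lie
    in the same segment as `pixel` (segment lookup is left to the caller).
    """
    if not control_points:
        return d_min, d_max

    def manhattan(cp):
        (row, col), _ = cp
        return abs(row - pixel[0]) + abs(col - pixel[1])

    nearest = min(control_points, key=manhattan)
    w, d = manhattan(nearest), nearest[1]
    if w == 0:
        return d, d                      # control point: only its own disparity
    if w >= d_max - d_min:
        return d_min, d_max              # too far away to narrow the range
    return max(d_min, d - w), min(d_max, d + w)


cps = [((10, 10), 20), ((40, 40), 50)]
print(search_range((10, 13), cps, 0, 64))    # near the first control point
print(search_range((10, 10), cps, 0, 64))    # on a control point
print(search_range((100, 100), cps, 0, 64))  # far from every control point
```

Returning a single disparity at a control point is what truncates error propagation: the path cost at that pixel cannot inherit a wrong disparity from its predecessor.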

Aggregation Path Weight Correction Based on Feature Orientation:
Most improved SGM algorithms raise accuracy by improving the calculation of the matching cost or by tuning the parameters of the penalty terms; the path direction in cost aggregation is rarely considered. To study the impact of the aggregation path direction on the matching results, the SGM aggregation is divided into 8 paths at 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°, taken clockwise.
Experiments were performed on Cones from the Middlebury data set. The matching results for each direction are shown in Fig. 3. The experiments show that the matching quality of a single path varies with its direction. Instead of giving the same weight to the paths in all directions, this paper uses the feature direction to adjust the weights of the aggregation paths. Based on feature detection, the gradient direction and magnitude around the feature points are calculated and accumulated in a histogram.
After the gradient direction histogram is constructed, the gradient information in different directions of each feature point is used to improve the path aggregation: a relatively smaller weight is assigned to a path whose direction has a relatively larger gradient. An overview of the processing steps is given in Fig. 2.
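The text does not give the formula that maps the gradient histogram to path weights, so the following is only a sketch under an assumed rule: each path weight falls linearly with the share of gradient magnitude in its direction bin, and the weights are rescaled to average 1. The bin layout, the function names and the sample gradients are hypothetical, and the clockwise image convention is not modelled.

```python
import math

PATH_ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]


def orientation_histogram(gradients):
    """Magnitude-weighted 8-bin histogram of gradient directions.

    gradients is an iterable of (gx, gy) samples around one feature point.
    """
    hist = [0.0] * len(PATH_ANGLES)
    for gx, gy in gradients:
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        bin_index = int((angle + 22.5) // 45) % 8  # nearest path direction
        hist[bin_index] += math.hypot(gx, gy)
    return hist


def path_weights(hist):
    """Assumed rule: a larger gradient share gives a smaller path weight."""
    total = sum(hist)
    if total == 0:
        return [1.0] * len(hist)
    raw = [1.0 - h / total for h in hist]
    scale = len(raw) / sum(raw)  # rescale so that the weights average 1
    return [r * scale for r in raw]


hist = orientation_histogram([(3, 0), (0, 1), (1, 0)])
weights = path_weights(hist)
print(hist)
print([round(w, 3) for w in weights])
```

With these samples most gradient energy lies in the 0° bin, so that path receives the smallest weight and paths with no gradient energy receive the largest.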

DATA AND RESULTS
The established Middlebury stereo data sets (Aloe, Cones) and a stereo pair from the CE-3 lunar data sets are used to verify the performance of the modified SGM algorithm. The percentage of bad matching pixels is used as the evaluation criterion for dense matching, and the matching quality of SIFT is measured by the root mean squared error. The census transform is used as the matching cost and 8-path aggregation is adopted in the experiments.

Matching Quality Evaluation
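To verify the effectiveness of the algorithm, matching accuracy is assessed with the following two evaluation indices.
(a) Percentage of Bad Matching Pixels (PBM), in the form used by Scharstein et al. (2002):

$$PBM = \frac{1}{N} \sum_{(x, y)} \big( |d_C(x, y) - d_T(x, y)| > \delta_d \big)$$

where N is the total number of pixels, $d_C(x, y)$ is the computed disparity, $d_T(x, y)$ is the true disparity and $\delta_d$ is the disparity error tolerance.
(b) Root Mean Squared Error (RMSE) of the matched feature points under the estimated geometric transformation, where $m_j$ (j = 1, 2, 3, 4) and $t_x$, $t_y$ are the geometric transformation parameters obtained in the experiment; $x_{1i}$ and $y_{1i}$ are the coordinates of the feature points in the left image; $x_{2i}$ and $y_{2i}$ are the coordinates of the corresponding feature points in the right image; i = 1, 2, ..., N, and N is the number of control points.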
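Both indices can be sketched in a few lines. The PBM function uses the usual Middlebury definition (share of pixels whose disparity error exceeds a tolerance); the default tolerance of 1.0 is an assumption. The RMSE function assumes an affine model with the parameter layout x2 = m1*x1 + m2*y1 + tx and y2 = m3*x1 + m4*y1 + ty, which is one plausible reading of the parameters named above and is not confirmed by the text.

```python
import math


def pbm(computed, truth, threshold=1.0):
    """Fraction of pixels whose absolute disparity error exceeds `threshold`."""
    bad = [abs(c - t) > threshold
           for row_c, row_t in zip(computed, truth)
           for c, t in zip(row_c, row_t)]
    return sum(bad) / len(bad)


def rmse(left_pts, right_pts, m, t):
    """RMSE of matched points under an affine model (assumed layout).

    Predicted right point: (m[0]*x1 + m[1]*y1 + t[0], m[2]*x1 + m[3]*y1 + t[1]).
    """
    squared = 0.0
    for (x1, y1), (x2, y2) in zip(left_pts, right_pts):
        ex = m[0] * x1 + m[1] * y1 + t[0] - x2
        ey = m[2] * x1 + m[3] * y1 + t[1] - y2
        squared += ex * ex + ey * ey
    return math.sqrt(squared / len(left_pts))


computed = [[10, 11, 12], [10, 14, 12]]
truth = [[10, 10, 12], [10, 10, 15]]
bad_share = pbm(computed, truth)  # two of six pixels are off by more than 1

error = rmse([(0, 0), (10, 0)], [(-5, 0), (5, 1)], m=(1, 0, 0, 1), t=(-5, 0))
print(bad_share, error)
```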

Middlebury Images
The results for the Middlebury stereo data sets (Aloe, Cones) are displayed in Fig. 6. According to the evaluation of control point extraction and dense matching, although there are a few control points with false parallax, the accuracy of the final matching results is effectively improved, which indicates that adding control points and using the matched points to reduce the search range improves the accuracy of SGM. Using the feature direction to adjust the weights of the aggregation paths also improves the matching quality, and combining the two improvements yields higher accuracy.
Table 1. SIFT matching results

Real World Images
As a crucial link in China's lunar exploration program, the Chang'e 3 (CE-3) probe, China's first unmanned lunar lander, soft-landed on the Moon on 14 December 2013. The stereo pair of the CE-3 lunar data sets was provided by a panoramic camera carried on the rover (also

RESULT AND FUTURE WORK
This paper presents a dense matching algorithm that integrates SIFT and SGM and uses the implicit disparity and gradient information of the points successfully matched by SIFT as constraints to guide the cost calculation. Experimental results show that the algorithm improves matching accuracy by cutting off error propagation, reducing the disparity search range and assigning different weights to the aggregation paths, and that it is a reasonable and effective dense matching algorithm for stereo images. However, the efficiency and precision of the algorithm depend on prior information, such as the matching quality of the feature points. Therefore, our future work is to enhance the SIFT algorithm to obtain more control points of higher precision and to make full use of gradient information to correct the weights of the aggregation paths.
Figure 1. 8 paths
The matching cost of each point is the sum of the accumulated matching costs in the eight directions, each of which is calculated by dynamic programming. The cost function $L_r(p, d)$ of the pixel p at disparity d along the direction r is calculated as in Equation 3.

Figure 3. Matching results for different path directions
Figure 2. Processing steps for disparity estimation using the integrated SIFT and SGM algorithm
Fig. 6(a) shows the left images of Aloe and Cones. The disparity ranges of Aloe and Cones are both 64 pixels. The results of feature extraction and feature matching on the Middlebury images are shown in Fig. 4(a) to (c). A large number of evenly distributed feature points can be detected to provide gradient information and help correct the weights of the aggregation paths. The matching accuracy and the number of correctly matched pairs are shown in Table 1. Fig. 6(b) shows the true parallax images of Aloe and Cones, and the parallax images obtained by the general SGM algorithm and the improved functions are shown in Fig. 6(c) to 6(f).
Figure 4. Feature extraction and matching results of Cones and Aloe

Figure 6. Disparity maps obtained with improved functions for Cones and Aloe.

Table 2. Accuracy of improved functions for Cones and Aloe