BOUNDARY DEPTH INFORMATION USING HOPFIELD NEURAL NETWORK

Depth information is widely used for the representation, reconstruction and modeling of 3D scenes. Generally, two kinds of methods can obtain depth information. One uses distance cues from a depth camera, but the results depend heavily on the device, and the accuracy degrades greatly as the distance to the object increases. The other uses binocular cues from stereo matching. Stereo matching methods are increasingly mature and convenient for collecting the depth information of different scenes. In the objective function, the data term ensures that the difference between matched pixels is small, and the smoothness term smooths neighbors with different disparities. Nonetheless, the smoothness term blurs the boundary depth information of objects, which becomes the bottleneck of stereo matching. This paper proposes a novel energy function for the boundary that preserves discontinuities and uses a Hopfield neural network to solve the optimization. We first extract the regions of interest, which are the boundary pixels in the original images. Then, we develop the boundary energy function to calculate the matching cost. Finally, we solve the optimization globally with the Hopfield neural network. The Middlebury stereo benchmark is used to test the proposed method, and the results show that our boundary depth information is more accurate than that of other state-of-the-art methods and can be used to optimize the results of other stereo matching methods.


INTRODUCTION
Determining the correspondence between two pictures of the same scene taken from different viewpoints is the primary task of stereo matching. Various constraints have been used to find the optimal match, such as color consistency, gradient distribution, and geometric shape. Nevertheless, these constraints provide little information about discontinuities, which are the bottleneck of stereo matching.
Compared with current matching methods, the proposed method works in our novel 3D space with less complexity and uses a discontinuity constraint to find the optimal match. The main contributions of our work are as follows: 1) we obtain a new 3D space from radial basis information (RBI); 2) we form a new energy function to calculate the matching cost; 3) we convert the proposed objective function into a solvable Hopfield neural network (HNN) energy function.

RELATED WORK
Stereo matching methods can be divided into region-based and feature-based methods (Scharstein and Szeliski, 2002). Region-based algorithms use a sliding window to aggregate the cost. Various optimizations then follow to obtain the minimum matching cost, such as Graph Cut (Boykov et al., 2001) and Belief Propagation (Sun et al., 2003). Because errors propagate at discontinuities, the accuracy of these methods is not satisfactory.
Feature-based algorithms focus on features and perform better at discontinuities. Shape-based matching (Ogale and Aloimonos, 2005) analyzes the effect of shape on establishing dense point correspondence to improve the matching, but the result in areas with repeated shapes is imprecise. CrossTrees (Cheng et al., 2015) uses two priors, edge and super-pixel, to avoid false matching at discontinuities. However, the method fails on large planar surfaces with few features. Instead of using adjacent pixels for matching, LLR (Zhu et al., 2012) chooses neighborhood windows and assumes a linear relationship between pixel values and disparities. However, it is sensitive to the window size and to large areas devoid of features. To eliminate improper connections on the boundary between two objects, imprNLCA (Chen et al., 2013) uses the boundary cue of the reference image, which is more reliable than the color cue in areas with similar colors. Nevertheless, imprNLCA shows little improvement at discontinuities compared with the original NLCA (Yang, 2012). Borders (Mattoccia et al., 2007) obtains precise border localization based on a variable support method (Tombari et al., 2007) for retrieving depth discontinuities. However, its accuracy decreases for images with a large range of disparities. LCVB-DEM (Martins et al., 2015) introduces a trained binocular neuronal population to learn how to decode disparities. With the help of monocular cells, it encodes both line and edge information, which is critical for preserving discontinuities. However, because the mechanism of cell responses is complex, its accuracy is far from desirable.
The algorithm of Nasrabadi and Choo (1992), based on the variance equations for each direction, is less robust and has limited application. To enhance robustness, later methods (Lee et al., 1994; Huang and Wang, 2000; Achour and Mahiddine, 2002; Laskowski et al., 2015) formulated their own energy functions, which contain constraints of similarity, smoothness and uniqueness, to work well on noisy images. Although these Hopfield neural network (HNN) based matching methods choose different constraints and obtain desirable results for regions of interest, they neglect the matching of discontinuities. Neurons in an HNN are interconnected, and therefore a change in the state of one neuron affects all other input neurons. The HNN converges to a stable state by updating the neurons through the activation function. This implies that it can obtain the global optimization result automatically. Hence, we use the HNN model to solve our optimization problem.
Neurons in our method are based on the disparity space rather than on the pixels in the left and right images, as in the aforementioned methods. To form the objective function, we use discontinuity information obtained from our novel 3D space to calculate the matching cost, and we use uniqueness and position constraints to reduce the search for a solution. To show the performance of our algorithm, we test it on the Middlebury benchmark, report its accuracy, and show that it improves state-of-the-art methods by raising their accuracy at discontinuities.

THE METHOD
This section describes a new objective function for stereo matching, and shows the derivation of the proposed optimization method.

Energy Function
Stereo matching assigns the best disparity to each pixel. Figure 1 shows the disparity space. Different methods correspond to different cost paths from the first row to the last row. The result should be smooth in continuous regions and preserve discontinuities in the disparity map. The key problems are the energy function and the optimization method.
The radial basis function expresses the relationship between a pixel and its neighbors (Buhmann, 2004). Thus, we calculate the surrounding information S of a pixel P by Eq. (1). In Eq. (1), I_P is the intensity value of a pixel P, and σ is the variance of the neighbors of P. The neighbor pixels are selected by a window of size N. We describe a pixel P in a 3D space XYZ as (x, y, S), as shown in Figure 3(a), rather than by intensity alone. S is the surrounding information of the pixel at (x, y) in the image. The discontinuities are clearly visible and the continuities are smooth, as shown in Figure 3(b). We call this space the RBI (Radial Basis Information) space; our objective function for the matching follows.
Analogous to the gradient of intensity in 2D obtained by Sobel (Farid and Simoncelli, 1997), the gradient of the surrounding information, G, is given by Eq. (2). In Eq. (2), J is the Jacobian matrix formed from the gradients of S along the axes, namely S_x, S_y, and S_z. The eigenvectors of J show the direction of a pixel in RBI space, and the eigenvalues show the scale. We use Principal Component Analysis (PCA) (Jolliffe, 2002) to calculate the principal direction of each pixel, as shown in Figure 4.
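The construction of the RBI space can be illustrated with a minimal sketch. It assumes a Gaussian radial basis kernel over an N x N window, with the window variance standing in for σ, which may differ from the exact form of Eq. (1). It also simplifies the PCA step to the two image-plane gradients S_x and S_y. The function names `surrounding_information` and `principal_direction` are hypothetical.

```python
import numpy as np

def surrounding_information(image, n=3):
    """Assumed form: S(P) = sum over window of exp(-(I_P - I_Q)^2 / (2 * var))."""
    img = np.asarray(image, dtype=float)
    r = n // 2  # window radius; an odd n gives a centred window
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    s = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + n, x:x + n]
            var = window.var() + 1e-6  # small constant avoids division by zero
            s[y, x] = np.exp(-(window - img[y, x]) ** 2 / (2.0 * var)).sum()
    return s

def principal_direction(s, y, x, n=3):
    """Dominant gradient direction of S around (y, x), via eigen-decomposition."""
    gy, gx = np.gradient(s)  # derivatives along rows (y) and columns (x)
    r = n // 2
    ys = slice(max(y - r, 0), y + r + 1)
    xs = slice(max(x - r, 0), x + r + 1)
    g = np.stack([gx[ys, xs].ravel(), gy[ys, xs].ravel()], axis=1)
    j = g.T @ g  # 2 x 2 scatter matrix of the local gradients
    eigvals, eigvecs = np.linalg.eigh(j)  # eigenvalues in ascending order
    return eigvecs[:, -1]  # eigenvector of the largest eigenvalue, as (x, y)

# A vertical step edge: S is maximal in flat areas and drops at the edge.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
s_map = surrounding_information(step, n=3)
direction = principal_direction(s_map, 4, 3, n=3)
```

In flat regions every neighbor matches the center pixel, so S equals the number of pixels in the window; near the step it drops, which is what makes discontinuities stand out in the (x, y, S) space.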
Denote the eigenvectors and eigenvalues of P_L as L_x, L_y, L_z and α_L, β_L, γ_L, respectively. Similarly, R_x, R_y, R_z and α_R, β_R, γ_R correspond to P_R. The matching cost for P_L and P_R is given by Eq. (3). Matched pixels should have similar surrounding information, which Eq. (3) guarantees. In addition, pixels from different kinds of areas are unlikely to match, for example a pixel at a discontinuity and a pixel in a continuous region. Thus, there should be a regional restriction term. We replace Cost with Cost', which adds the difference between the surrounding information S_L of P_L and S_R of P_R, as in Eq. (4). The two weighting coefficients w_1 and w_2 are chosen as constants in our algorithm. The optimization is in the disparity space, and the whole matching cost E is given by Eq. (5). In Eq. (5), Cost'_ik is the matching cost between P_i in the left image and P_(i+k) in the right image. Cost'_ik is the cost of assigning k as the disparity of P_i in the disparity space. P_ik is 1 or 0, indicating whether or not the disparity of P_i is k. The optimal solution is the assignment of disparities with the minimal E.
Suppose there are two pixels P(x_L, y_L) and P(x_L', y_L') in the left image, and their matching pixels in the right image are P(x_R, y_R) and P(x_R', y_R'), respectively. If x_L' is less than x_L, x_R' should not be much larger than x_R. We use a sigmoid function to limit the change in relative position after matching and obtain the pairwise cost Cost'_ikjl. If x is negative, η is small, and Cost'_ikjl becomes large. μ controls how threshold-like the output is. Each time, we choose two pixels P_i and P_j in the left image together for the matching, and add the position constraint to the objective function.
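As a minimal sketch of how the whole cost E is evaluated for a given assignment, the snippet below sums Cost'_ik over the active entries of the binary matrix P_ik. The weighted form in `regional_cost` (w_1 times the cost plus w_2 times the absolute difference of S_L and S_R) is an assumption about how the regional term is combined, and all function names are hypothetical.

```python
import numpy as np

def regional_cost(cost, s_left, s_right, w1=1.0, w2=1.0):
    """Assumed combination: Cost' = w1 * Cost + w2 * |S_L - S_R|."""
    return w1 * cost + w2 * np.abs(s_left - s_right)

def one_hot(disparities, n_disp):
    """Binary matrix P with P[i, k] = 1 when pixel i is assigned disparity k."""
    p = np.zeros((len(disparities), n_disp))
    p[np.arange(len(disparities)), disparities] = 1.0
    return p

def total_energy(cost_prime, p):
    """E = sum over i and k of Cost'[i, k] * P[i, k]."""
    return float(np.sum(cost_prime * p))

# Two pixels, three candidate disparities each (illustrative numbers only).
cost_prime = np.array([[1.0, 5.0, 9.0],
                       [4.0, 2.0, 8.0]])
best = one_hot(cost_prime.argmin(axis=1), 3)   # cheapest disparity per pixel
e_best = total_energy(cost_prime, best)
e_other = total_energy(cost_prime, one_hot([2, 2], 3))
```

Without further constraints the minimum of E is simply the cheapest disparity per pixel; the position and uniqueness terms are what couple the pixels together.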
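The role of the sigmoid in the position constraint can be sketched as follows. Everything here is an assumption made for illustration: η is taken to be a logistic function of x / μ, x is taken to be the product of the horizontal offsets in the two images (positive when the left-right order is preserved), and the pairwise cost is taken to be the sum of the two single costs divided by η. The names `eta`, `order_signal` and `pairwise_cost` are hypothetical.

```python
import math

def eta(x, mu=1.0):
    """Logistic sigmoid of x / mu; a small mu makes it close to a hard threshold."""
    t = x / mu
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # stable branch for negative arguments
    return e / (1.0 + e)

def order_signal(x_l, x_l2, x_r, x_r2):
    """Assumed x: positive when both images keep the same left-right order."""
    return (x_l - x_l2) * (x_r - x_r2)

def pairwise_cost(cost_ik, cost_jl, x, mu=1.0, eps=1e-12):
    """Assumed form: the joint cost grows as eta shrinks (order violated)."""
    return (cost_ik + cost_jl) / max(eta(x, mu), eps)

# Same single costs, but the second pair swaps its order in the right image.
kept = pairwise_cost(1.0, 1.0, order_signal(10, 5, 8, 4))
swapped = pairwise_cost(1.0, 1.0, order_signal(10, 5, 4, 8))
```

Under these assumptions a pair that keeps its ordering pays roughly the sum of its two costs, while a pair that swaps order is penalized heavily, and μ sets how sharp that transition is.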

The optimization model
This is an NP-hard problem, which may not be solvable in polynomial time. We therefore need to reduce the search for a solution.
The proposed objective function lacks a constraint on the number of matches. Each pixel must be allocated a unique disparity. Thus, our final energy E' is calculated as in Eq. (8). When i is equal to j, δ_ij is 1; otherwise δ_ij is 0. Substituting Eq. (9) for U in Eq. (8) yields Eq. (10), where W is defined in Eq. (11). The graph of Eq. (11) is shown in Figure 5. K describes the cost of an exact match, which increases with the noise in the images. If P_L is at a discontinuity, we need a large λ to prevent P_L from mismatching. However, when P_L is in a large continuous area, a small λ is required to search for the optimal match. We calculate λ by Eq. (12), where S_L^mean is the mean of the surrounding information S_L. The graph of λ in different areas of Teddy is shown in Figure 6. A large λ, which relates to discontinuities, appears brighter than a small one.
Figure 6. λ in different areas of Teddy.
Now E'' is calculated by Eq. (13), which is the objective function for optimization in the Hopfield neural network (Hopfield, 1982). Neurons are based on the disparity space. Different connections of neurons indicate different assignments of disparities for each pixel. Neurons cannot be self-connected, and hence W_ikik is 0. The state of a neuron P_ik is 0 or 1, as calculated by the network.
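One common way to express the uniqueness requirement in a Hopfield-style energy is a quadratic penalty that vanishes only when every pixel has exactly one active disparity. The sketch below shows that generic penalty; it is an assumption for illustration and is not claimed to be the exact term used in Eq. (8) or Eq. (9). The name `uniqueness_penalty` is hypothetical.

```python
import numpy as np

def uniqueness_penalty(p):
    """Sum over pixels i of (1 - sum_k P[i, k])^2; zero iff each row is one-hot."""
    p = np.asarray(p, dtype=float)
    return float(np.sum((1.0 - p.sum(axis=1)) ** 2))

# Rows are pixels, columns are candidate disparities.
valid = np.array([[1, 0, 0],
                  [0, 0, 1]])      # one disparity per pixel
doubled = np.array([[1, 1, 0],
                    [0, 0, 1]])    # first pixel has two active disparities
missing = np.array([[0, 0, 0],
                    [0, 0, 1]])    # first pixel has no disparity

penalties = [uniqueness_penalty(m) for m in (valid, doubled, missing)]
```

Adding such a term with a sufficiently large weight steers the network away from states in which a pixel is matched twice or not at all.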
P_ik set to 1 implies that this neuron is active and the disparity of the pixel P_i is k. When the state of a neuron changes from P_mn to P_mn', the energy E'' changes from E_old to E_new as shown in Eq. (14), and the difference ∆E is calculated in Eq. (15).

    
In Eq. (15), net_mn is the output of the HNN for the neuron P_mn. If ∆P_mn is 0, ∆E is 0, implying that E'' is unchanged; if ∆P_mn is positive (implying P_mn' is 1) and net_mn is negative, ∆E is negative; if ∆P_mn is negative (implying P_mn' is 0) and net_mn is positive, ∆E is negative. Thus, E'' never increases while the neuron states are updated. The updating rule is Eq. (16) (Nasrabadi and Choo, 1992). After a number of iterations, we obtain the minimum E', which corresponds to the required optimal disparity map.
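The descent argument can be checked numerically with a small generic binary Hopfield network. The sketch assumes a quadratic energy E = 0.5 * P^T W P + b^T P with symmetric W and zero diagonal, so that flipping one neuron changes the energy by net times ∆P, matching the sign convention described in the text. The weights here are random placeholders, not the W of Eq. (11), and each flat index m stands for one (pixel, disparity) pair. The names `energy` and `run_hopfield` are hypothetical.

```python
import numpy as np

def energy(w, b, p):
    """Quadratic Hopfield energy E = 0.5 * p^T W p + b^T p."""
    return float(0.5 * p @ w @ p + b @ p)

def run_hopfield(w, b, p0, max_sweeps=1000):
    """Asynchronous updates: a neuron turns on if its net input is negative,
    turns off if it is positive, and keeps its state if it is exactly zero."""
    p = np.asarray(p0, dtype=float).copy()
    energies = [energy(w, b, p)]
    for _ in range(max_sweeps):
        changed = False
        for m in range(len(p)):
            net = w[m] @ p + b[m]
            new_state = 1.0 if net < 0 else (0.0 if net > 0 else p[m])
            if new_state != p[m]:
                p[m] = new_state
                changed = True
                energies.append(energy(w, b, p))
        if not changed:  # no neuron wants to flip: a stable state
            break
    return p, energies

# Random symmetric weights with zero diagonal (no self-connections).
rng = np.random.default_rng(0)
a = rng.normal(size=(12, 12))
w_demo = (a + a.T) / 2.0
np.fill_diagonal(w_demo, 0.0)
b_demo = rng.normal(size=12)
p_start = rng.integers(0, 2, size=12)
p_final, trace = run_hopfield(w_demo, b_demo, p_start)
```

Each recorded energy in `trace` is no larger than the previous one, so the network settles into a stable state; for a general W this is a local minimum of the energy reached from the chosen starting state.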

EXPERIMENTS AND EVALUATION
The workflow of our method is shown in Figure 7. First, we extract the surrounding information S of the input images and form our 3D RBI space to describe each pixel. Second, we calculate the gradient of S and use PCA to obtain the principal direction of each pixel. Third, we calculate the matching cost and add the constraints of region, position and uniqueness to obtain the objective function. Finally, we solve the optimization problem with the HNN. The output of the HNN is the global minimum E, which corresponds to the disparity map.

Figure 1. Disparity space. There are n pixels, namely P_1, P_2, ..., P_n, and their disparities range from 0 to d_max. The cost path relates to the assignment of disparities for each pixel.
Figure 2(a) is the left image, and its surrounding information for different N is shown in Figures 2(b) to 2(d). A small N results in sharp discontinuities and a large N in smooth discontinuities.
Figure 2. Surrounding information of Cones. (a) Original left image. (b) N=3. (c) N=5. (d) N=8.
Figure 3. Display of the RBI space. (a) Cones in the 3D RBI space. (b) The contour of Cones in the RBI space.

Figure 4. Calculation of the cost between left image pixel P_L and right image pixel P_R. Each pixel has different gradients in the direction of Z, and we use PCA to obtain the principal direction of the pixel.

Figure 5. Graph of W with different parameters.

Figure 9. The improvement of state-of-the-art methods on all areas.

Table 1. Comparison of results with error threshold 1.0