PIXEL-BY-PIXEL ESTIMATION OF SCENE MOTION IN VIDEO

The paper considers the effectiveness of pixel-by-pixel recurrent algorithms for motion estimation in video. The algorithms use stochastic gradient descent to find the inter-frame shifts of all pixels of a frame; these vectors form a shift vectors' field. As estimated parameters of the vectors, the paper studies their projections and their polar parameters. Two methods for estimating the shift vectors' field are considered. The first method uses a stochastic gradient descent algorithm to sequentially process all nodes of the image row by row. It processes each row bidirectionally, i.e. from left to right and from right to left; subsequent joint processing of the results compensates the inertia of the recursive estimation. The second method uses the correlation between rows to increase processing efficiency. It processes rows one after the other, changing direction after each row, and uses the obtained values to form the resulting estimate. The paper studies two criteria for its formation: the minimum of the gradient estimation and the maximum of the correlation coefficient. The paper gives examples of experimental results of pixel-by-pixel estimation for a video with a moving object and of estimating a moving object's trajectory using the shift vectors' field.


INTRODUCTION
One of the challenges in video processing is moving object detection and tracking. Some tasks require only detection of the motion, while others require extraction of the moving object or of the motion area boundary. The biggest challenge is to estimate the parameters of the object motion in a video sequence. The quality of a solution to the problem largely depends on the accuracy of moving object area detection, since all the information needed to determine the motion parameters and trajectory of the object is extracted from the image.
There are various approaches to identifying the area of a moving object: based on the inter-frame difference (Elhabian, 2008; Karasulu, 2013), background subtraction (Elhabian, 2008; Wang, 2010), the use of statistics (Karasulu, 2013; Kuczov, 2006), block estimation (Grishin, 2008), and optical flow analysis (Zoloty'kh, 2012). The processing can be presented as estimation of inter-frame geometric deformations of two images, one of which can be considered as the reference image Z^(t) = {z_{i,j}^(t)}, where t is the number of a frame and z_{i,j}^(t) is the brightness of the image node with coordinates (i, j). Let H = {h_{i,j}} be an inter-frame shift vectors' field for all the nodes (i, j) of the reference image corresponding to the deformed image Z^(t+1). Each shift vector can be represented either by its projections (h_x, h_y) or by its polar parameters, magnitude and angle (Smirnov, 2015). The answer to the question of which set is preferable for solving the problem of moving object area detection is not obvious and requires research.
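For illustration, the two parameterizations of a shift vector are related by the usual Cartesian-to-polar conversion. A minimal sketch (function names are ours, not from the paper):

```python
import math

def to_polar(hx, hy):
    """Convert shift-vector projections (h_x, h_y) to polar (magnitude, angle)."""
    return math.hypot(hx, hy), math.atan2(hy, hx)

def to_projections(mag, ang):
    """Convert polar parameters back to projections (h_x, h_y)."""
    return mag * math.cos(ang), mag * math.sin(ang)
```

Although the two sets carry the same information, the estimation behaves differently in each, which is exactly what the paper investigates.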

ESTIMATION ALGORITHMS
A technique for estimating the shift vectors' field H is proposed. A stochastic gradient descent algorithm (Tashlinskii, 2007) sequentially estimates the parameters α_{i,j} of the shift vectors for all the points (i, j) of the image Z_r:

α̂_n = α̂_{n-1} − Λ β̂(α̂_{n-1}),    (1)

where Λ is the matrix of learning rates, which determines the rate of change of the estimated parameters, and β̂ is the gradient estimation of an objective function.
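As a minimal sketch of recursion (1), with a toy quadratic objective standing in for the inter-frame difference (the function and variable names are ours):

```python
import numpy as np

def sgd_estimate(grad, alpha0, lam, n_steps=100):
    """Run the recursion alpha_n = alpha_{n-1} - Lambda @ grad(alpha_{n-1})."""
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(n_steps):
        alpha = alpha - lam @ grad(alpha)
    return alpha

# Toy objective ||alpha - target||^2 with a known minimum at `target`;
# in the paper the objective is the inter-frame brightness difference.
target = np.array([1.5, -0.5])
grad = lambda a: 2.0 * (a - target)
lam = 0.1 * np.eye(2)          # matrix of learning rates (Lambda)
est = sgd_estimate(grad, np.zeros(2), lam)
```

With a suitable Λ the estimate converges to the minimizer of the objective, here the known target shift.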
The algorithm uses reverse processing (Tashlinskii, 2013). It processes each row i bidirectionally: first from left to right,

α̂^l_{i,j} = α̂^l_{i,j-1} − Λ β̂(α̂^l_{i,j-1}), j = 1, …, J,    (2)

getting the estimates α̂^l_{i,j}, and then from right to left,

α̂^r_{i,j} = α̂^r_{i,j+1} − Λ β̂(α̂^r_{i,j+1}), j = J, …, 1,    (3)

getting the estimates α̂^r_{i,j}, where the range of admissible shifts h is determined by the maximum speed of moving objects.
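A minimal sketch of the bidirectional row processing, using a simple recursive (exponential) estimator as a stand-in for the gradient recursion; the lag of each pass on its own side of a step illustrates the inertia that the joint processing is meant to compensate:

```python
import numpy as np

def forward_pass(row, gain=0.3):
    """Recursive estimation over one row, left to right."""
    est, out = 0.0, []
    for v in row:
        est += gain * (v - est)   # recursive update lags behind abrupt changes
        out.append(est)
    return np.array(out)

row = np.array([0.0] * 10 + [1.0] * 10)   # a step, e.g. a motion boundary
left = forward_pass(row)                  # estimates alpha^l: lag after the step
right = forward_pass(row[::-1])[::-1]     # estimates alpha^r: lag before the step
```

Each pass is accurate on the side it entered from and inertial on the other, so combining the two passes recovers the boundary from both directions.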
The mean square inter-frame difference is used as an objective function, because the brightness of adjacent frames changes slightly. Then, using the parameters (h_x, h_y), the gradient estimation can be written as follows:

β̂ = 2 (ẑ^(t)_{i,j}(h_x, h_y) − z^(t+1)_{i,j}) (∂ẑ^(t)_{i,j}/∂h_x, ∂ẑ^(t)_{i,j}/∂h_y)^T,    (4)

where the partial derivatives are calculated via the finite differences method.
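A sketch of the finite-difference gradient estimation of the squared inter-frame difference at one node (i, j) for the parameters (h_x, h_y). Bilinear interpolation is our assumption for sampling at fractional shifts; the paper only states that the derivatives are obtained via finite differences:

```python
import numpy as np

def sample(img, x, y):
    """Bilinear brightness at fractional position (x, y), edges clamped."""
    x0 = min(max(int(np.floor(x)), 0), img.shape[1] - 2)
    y0 = min(max(int(np.floor(y)), 0), img.shape[0] - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def grad_mse(ref, cur, i, j, hx, hy, eps=0.5):
    """Finite-difference gradient of (z_cur(j+hx, i+hy) - z_ref(i, j))^2."""
    d = sample(cur, j + hx, i + hy) - ref[i, j]
    dfx = (sample(cur, j + hx + eps, i + hy) - sample(cur, j + hx - eps, i + hy)) / (2 * eps)
    dfy = (sample(cur, j + hx, i + hy + eps) - sample(cur, j + hx, i + hy - eps)) / (2 * eps)
    return np.array([2 * d * dfx, 2 * d * dfy])

# A horizontal brightness ramp shifted right by one pixel: the gradient
# vanishes at the true shift hx = 1 and pushes the estimate toward it at hx = 0.
ref = np.tile(np.arange(10.0), (10, 1))
cur = ref - 1.0
g_true = grad_mse(ref, cur, 5, 5, 1.0, 0.0)
g_zero = grad_mse(ref, cur, 5, 5, 0.0, 0.0)
```

The sign of the gradient drives the recursion (1) toward the shift that minimizes the inter-frame difference.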
The gradient estimation for the parameters in polar form (λ, θ)^T, where λ is the magnitude and θ is the angle of the shift vector, follows from the chain rule with h_x = λ cos θ and h_y = λ sin θ:

β̂_λ = β̂_{h_x} cos θ + β̂_{h_y} sin θ,  β̂_θ = λ (−β̂_{h_x} sin θ + β̂_{h_y} cos θ).    (5)

For each node (i, j) the optimal value of the estimate is selected from the set {α̂^l_{i,j}, α̂^r_{i,j}} using one of the two criteria (Tashlinskii, 2015): the gradient estimation minimum,

α̂_{i,j} = arg min |β̂(α)|,    (6)

and the correlation coefficient maximum,

α̂_{i,j} = arg max r(α),    (7)

where r(α) is the correlation coefficient between the reference and deformed image fragments, calculated over a window of a given size.
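A hedged sketch of the two selection criteria: take the candidate estimate with the smallest gradient-estimation magnitude (criterion (6)), or the one maximizing the windowed correlation coefficient with the reference fragment (criterion (7)). Function names and window handling are ours:

```python
import numpy as np

def pick_min_gradient(candidates, grad_norm):
    """Criterion (6): candidate with the smallest gradient-estimation magnitude."""
    return min(candidates, key=grad_norm)

def corr_coeff(a, b):
    """Sample correlation coefficient of two brightness windows."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def pick_max_correlation(candidates, ref_win, window_for):
    """Criterion (7): window_for(c) resamples the current frame at shift c."""
    return max(candidates, key=lambda c: corr_coeff(ref_win, window_for(c)))
```

Criterion (7) evaluates each candidate against actual image content in a window, which is why it is more accurate, and more expensive, than criterion (6).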
The joint processing of α̂^l_{i,j} and α̂^r_{i,j} compensates the inertia of the recursive estimation. A comparative efficiency analysis has shown that the accuracy is higher (as well as the computational cost) for the correlation coefficient maximum. In the approach discussed above, images are processed, in fact, as one-dimensional signals. Taking into account the correlation between rows, we can improve the performance of the algorithm. To do this, rows are processed one after the other with a change in direction after each row, with subsequent joint processing of the estimates. Considering the above, we can distinguish four algorithms for estimating the field H: algorithm A, reverse processing using the parameters (h_x, h_y); algorithm B, reverse processing using the polar parameters; algorithm C, row-to-row processing using the parameters (h_x, h_y); algorithm D, row-to-row processing using the polar parameters.
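The row-to-row scan order can be sketched as a serpentine traversal: the direction alternates after each row, so every node is processed immediately after an adjacent neighbour whose estimate it can inherit, exploiting the inter-row correlation (the helper name is ours):

```python
def serpentine_order(n_rows, n_cols):
    """Node visiting order with direction reversed after each row."""
    order = []
    for i in range(n_rows):
        cols = range(n_cols) if i % 2 == 0 else range(n_cols - 1, -1, -1)
        order.extend((i, j) for j in cols)
    return order
```

Every consecutive pair of visited nodes is adjacent in the image, which keeps the recursive estimator's initial value close to the true shift.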

ANALYSIS OF EFFICIENCY
To analyze the efficiency of the algorithms we used the images shown in Fig. 1. Table 1 summarizes the numerical characteristics of the estimation errors for the row and for the entire image. Fig. 4 shows that the use of the correlation between rows significantly improves the results of parameter estimation compared to reverse processing of a single row: for criterion (6), the mean value of the error for the motion area decreases by 5 times and the variance by 2.1 times; for criterion (7), by 10 times and 2.5 times respectively. Table 1 shows the actual values.

Table 1. Estimation error of shift vectors' field for a row

Algorithm   Motion area (mean / variance)   Area without motion (mean / variance)
A           0.28 / 4.8                      0.28 / 1.9
B           0.21 / 1.53                     0.15 / 0.74
C           0.06 / 0.61                     0.07 / 0.27
D           0.04 / 0.54                     0.01 / 0.02

The comparison of the algorithms with the well-known block algorithm MVFAST (Motion Vector Field Adaptive Search Technique) shows that MVFAST has worse accuracy of moving object detection under equal conditions. Moreover, MVFAST does not provide sub-pixel accuracy. The object trajectory can be estimated using the field H. Fig. 9 shows two frames from a video of a landing on an aircraft carrier, and Fig. 10 shows the result of processing 34 frames of the video: the trajectory of the aircraft in relative coordinates XYZ. The camera position at the initial moment of shooting is taken as the origin. A complicating factor in this example was the uneven camera movement toward the aircraft. Therefore, to estimate the position of the moving object relative to the scene (not to the camera), it was necessary not only to detect and identify the moving object area but also to stabilize the image.
The parameter sets (h_x, h_y) and the polar parameters are functionally equivalent. However, due to the inertia of the recurrent estimation of the shift vectors' field H, the estimates obtained for the frame pairs in Fig. 1(a), 1(b) and in Fig. 1(b), 1(c) differ.

Fig. 2(a) shows the dependences of α̂^l_{i,j} and α̂^r_{i,j} on i; Fig. 2(b) shows the result of their joint processing; Fig. 2(c) shows the corresponding dependences for the polar parameters.

Figure 2. Estimates of the parameters for a row using criterion (6)

Fig. 3 shows the results of joint processing for the same row using criterion (7). The results in Fig. 3(a) correspond to the set of parameters (h_x, h_y), and Fig. 3(b) to the polar parameters. Fig. 4 shows the results of joint processing of adjacent rows (algorithms C and D). Fig. 4(a) corresponds to the set of parameters (h_x, h_y) and Fig. 4(b) to the polar parameters; criterion (6) is used. Fig. 4(c) and Fig. 4(d) show the results for the sets (h_x, h_y) and the polar parameters respectively; criterion (7) is used. For Fig. 4, with criterion (6) the mean value of the error for the motion area decreases by 1.2 times and the error variance by 1.1 times; for criterion (7), by 3 times and 20 times respectively.

Fig. 5 shows a visualization of the estimates of the shift vectors' field H for the reverse processing algorithms: the magnitudes of the estimated vectors as a function of the node coordinates of the reference image. Fig. 5(a) and Fig. 5(b) show the results for the set of parameters (h_x, h_y) and criteria (6) and (7) respectively; Fig. 5(c) and Fig. 5(d) show the results for the polar parameters and criteria (6) and (7) respectively.

Figure 7. Results of moving object area identification

The above results correspond to the analysis of the frames in Fig. 1(a) and Fig. 1(b), which are characterized only by a parallel shift of the moving object. Fig. 8(a) and Fig. 8(b) show the visualizations of the field H for algorithms C and D.

Figure 9. An example of frames of a video sequence

The estimates given in Table 1 also confirm this.

Table 2. Estimation error of shift vectors' field for the entire image

Table 2 shows the mean value and variance of the estimation error for the areas with and without motion for the entire image. It contains the results both for MVFAST and for the proposed algorithms.