A CROSS-SITE VISUAL LOCALIZATION METHOD FOR YUTU ROVER

Localization of the rover is critical to support science and engineering operations in planetary rover missions, such as rover traverse planning and hazard avoidance. It is desirable for a planetary rover to have a visual localization capability with a high degree of automation and quick turnaround time. In this research, we developed a visual localization method for a lunar rover that derives accurate localization results from cross-site stereo images. Tie points are searched in corresponding areas predicted by the initial localization results and determined by the ASIFT matching algorithm. Accurate localization results are derived from bundle adjustment based on an image network constructed from the tie points. To investigate the performance of the proposed method, a theoretical accuracy analysis is carried out by means of error propagation principles. Field experiments were conducted to verify the effectiveness of the proposed method in practical applications. The experimental results show that the proposed method provides more accurate localization results (errors of 1%~4%) than dead-reckoning. After further validation and enhancement, the developed rover localization method has been successfully used in Chang'e-3 mission operations.


INTRODUCTION
Chang'e-3, which includes China's first lunar lander and rover, successfully soft-landed at 19.51°W, 44.12°N in Mare Imbrium on December 14, 2013. The rover Yutu (Jade Rabbit), which has a designed life span of three months, was released to the lunar surface and started surface exploration on December 15. Because of the uncertainties of the lunar surface, Yutu travels no more than 10 meters at low speed between adjacent waypoints, which are called sites, although it is capable of travelling at up to 200 meters per hour. Yutu carries several stereo cameras, including Navcam and Hazcam, to satisfy the engineering demands of exploration. Rover localization and topographic mapping are essential to support surface exploration. An optimal driving path is generated based on 3D perception of the surrounding environment, which is realized through imaging and topographic mapping with Navcam. Meanwhile, accurate localization of Yutu plays an important role in path planning, offering safe guidance for obstacle avoidance and for approaching targets and waypoints.
During the mission, the Guidance, Navigation and Control (GNC) system runs onboard dead-reckoning software, which processes the IMU and wheel odometer data to localize the rover continuously on the lunar surface. However, the localization results derived from dead-reckoning contain relatively large errors, because errors accumulate rapidly as the travelling distance increases, especially on sandy and rocky terrain. The error growth is driven by IMU drift and by wheel slip caused by terrain undulation, and the localization error can reach 15% of the traverse length, producing large uncertainty in the execution of exploration tasks.
To satisfy the demand for highly accurate localization, vision-based methods were introduced to obtain refined results in previous planetary explorations. In the Mars Pathfinder mission, the localization error of Sojourner was corrected by a human operator overlaying a model of the rover on stereo range data computed from downlinked imagery of the rover taken by the lander (Olson et al., 1998). Later, in the 2003 Mars Exploration Rover (MER) mission, a visual localization method based on sequential images, called visual odometry (VO), was used on slippery and uncertain terrain to reduce the position errors accumulated by dead-reckoning and achieve better localization results, i.e., errors of 3%~5% (Maimone et al., 2007). Because of limitations in computational speed, image-sequence-based visual odometry was applied only over relatively short distances where the rovers traveled on steep slopes or where a wheel was being dragged. A similar method was also used in the Mars Science Laboratory (MSL) mission (Grotzinger et al., 2012). Cross-site localization of the MER rovers was also performed with the bundle adjustment (BA) technique using panoramic stereo images acquired at adjacent sites (Li et al., 2006, 2007a; Di et al., 2008a). In this BA-based cross-site localization, the tie points linking the cross-site images were selected manually in the first few years of mission operations; later, an automated tie point selection method based on rock detection and matching was developed and applied in mission operations where large rocks were available on the Martian surface (Li et al., 2007b; Di et al., 2008b).
The Yutu rover is designed to capture stereo images at the waypoints without sequential imaging. As a result, the distortions between stereo images of adjacent sites are larger than those in sequential images. A new VO method, capable of deriving accurate localization results from cross-site stereo images with a high degree of automation and quick turnaround time, is therefore desirable for Yutu exploration applications. Unlike the rock-based automatic tie point selection used in the MER mission (Li et al., 2007a), the tie point selection in our cross-site rover localization uses image features (interest points) directly and does not need large objects such as rocks. Thus, on lunar surfaces where there is image texture but usually no large rocks, the new method is more applicable and offers a high degree of automation.

METHODOLOGY
In this paper, we present a new visual localization method using tie point selection based on ASIFT matching results. Corresponding points are searched in restricted regions derived from the initial localization results of dead-reckoning. The localization is then refined by bundle adjustment (BA) to obtain accurate results.
Figure 1 shows the workflow of the proposed rover localization method. If the cross-site images are captured in opposite directions, they are first projected onto the ground plane to decrease distortions. Then, the initial localization results obtained by dead-reckoning are used to calculate the corresponding image areas of the adjacent site. These areas serve as initial matching regions that limit the search range and improve the reliability of the subsequent matching. After image processing to enhance texture and grayscale consistency, corresponding points are detected by the ASIFT matching algorithm within the initial matching regions. Next, outlier detection is performed to obtain valid matching results. Finally, accurate localization results are generated from the exterior orientation parameters (EOPs) derived by the BA solution. When travelling in environments with restrictive texture and illumination conditions, Yutu is ordered to take an extra backward-looking stereo pair to assist the visual localization. Because of the distortion between cross-site images, initial matching regions are extracted to guide image matching and tie point selection.
When the cross-site images are taken in reverse directions (forward and backward), the image textures recorded at adjacent sites differ much more and are more strongly distorted. Before initial matching region extraction, the images acquired at the two sites are projected onto a flat ground plane to obtain their imaging areas on the ground. The overlapping area of the two imaging areas is then determined. Considering the uncertainties of the initial EOPs, the overlapping area is extended to a larger rectangle. This rectangle is used as the corresponding matching area in the subsequent image matching.
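The overlap-and-extend step can be illustrated with a minimal sketch. It assumes that each ground footprint has already been reduced to an axis-aligned rectangle in a common ground frame; the function name `extended_overlap`, the rectangle layout, and the 0.5 m margin are illustrative assumptions, not values from the paper.

```python
from typing import Optional, Tuple

# (xmin, ymin, xmax, ymax) of a footprint on the ground plane, in metres.
Rect = Tuple[float, float, float, float]


def extended_overlap(a: Rect, b: Rect, margin: float) -> Optional[Rect]:
    """Intersect two ground footprints and pad the result by `margin` metres.

    The padding stands in for the uncertainty of the initial EOPs.
    Returns None when the footprints do not overlap.
    """
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin - margin, ymin - margin, xmax + margin, ymax + margin)


# Hypothetical footprints of the forward (previous site) and backward
# (current site) images after projection onto the ground plane.
prev_fp = (1.0, -4.0, 9.0, 4.0)
curr_fp = (3.0, -3.0, 12.0, 3.0)
print(extended_overlap(prev_fp, curr_fp, margin=0.5))  # (2.5, -3.5, 9.5, 3.5)
```

In practice the footprints are general quadrilaterals, so a polygon intersection followed by a bounding rectangle would replace the simple min/max logic used here.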
For cross-site images taken in the same direction, the initial localization results derived from dead-reckoning are used to mark the corresponding matching areas in the images, in order to restrict the search regions in the subsequent feature matching procedure. First, the ground coverage of the current site is calculated using the initial EOPs of its images. Second, the image regions at the previous site are calculated by back-projecting the ground coverage of the current site. Considering the uncertainties of the initial localization results, the interest regions in the images are generated by extending the back-projected regions. These interest regions are used as the initial matching regions that limit the search range in the following feature matching.

Feature matching based on ASIFT
Feature matching between strongly distorted images remains challenging. At present, the scale-invariant feature transform (SIFT) is commonly used for robust local interest point detection regardless of rotation, scaling, or translation (Lowe, 2004). An improved method based on SIFT, called Affine-SIFT (ASIFT), was proposed to cope with the apparent deformation of the object image caused by changes of camera position (Morel et al., 2009). Compared with SIFT, ASIFT simulates all image views obtainable by varying the two orientation parameters of the camera axis, namely the latitude and longitude angles. The transition tilt, which measures the amount of distortion from one view to another, is introduced as a new criterion for evaluating the affine invariance of classic algorithms. The method is mathematically proved to be fully affine invariant and thus provides an efficient matching solution for the distortion between two images caused by viewpoint change. In this paper, ASIFT is applied to feature matching in cross-site images.
Texture enhancement based on the Wallis filter (WF) is performed before matching to improve robustness (Zhang et al., 1999). The WF is formulated as

$$g_0(x,y) = g(x,y)\,r_1 + r_0 \qquad (1)$$

where $r_1 = \dfrac{c\,s_f}{c\,s_g + s_f/c}$ and $r_0 = b\,m_f + (1 - b - r_1)\,m_g$. Here $r_1$ and $r_0$ are the multiplicative and additive coefficients, respectively, and $b$ and $c$ are the brightness and contrast constants of the filter. $m_g$ and $m_f$ represent the mean grayscale value of the local image region before and after transformation; $s_g$ and $s_f$ represent the corresponding gray variance before and after transformation. In general, $m_f$ and $s_f$ for the images of the previous site are set to the values of the images of the current site, so as to keep the grayscale consistent. Based on the scale change predicted from the initial localization results, the images from the previous site are also zoomed appropriately to compensate for the scaling distortion caused by rover movement.
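As a rough illustration of Equation (1), the sketch below computes the two coefficients and applies them with whole-image statistics. Several points are assumptions rather than the paper's implementation: the function names are hypothetical, the spread terms are taken as standard deviations, a single global window is used instead of local windows, and the values b = 1.0 and c = 0.8 are arbitrary.

```python
import numpy as np


def wallis_coefficients(m_g, s_g, m_f, s_f, b=1.0, c=0.8):
    """Return (r1, r0) of Equation (1) from source and target statistics."""
    r1 = (c * s_f) / (c * s_g + s_f / c)   # multiplicative coefficient
    r0 = b * m_f + (1.0 - b - r1) * m_g    # additive coefficient
    return r1, r0


def wallis_global(src, ref, b=1.0, c=0.8):
    """Apply Equation (1) to `src`, taking the target mean/spread from `ref`.

    Assumption: one global window; a real Wallis filter works on local
    windows and interpolates the coefficients between them.
    """
    src = np.asarray(src, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    r1, r0 = wallis_coefficients(src.mean(), src.std(), ref.mean(), ref.std(), b, c)
    return src * r1 + r0


rng = np.random.default_rng(0)
prev_img = rng.uniform(60, 120, size=(64, 64))   # synthetic previous-site patch
curr_img = rng.uniform(90, 180, size=(64, 64))   # synthetic current-site patch
out = wallis_global(prev_img, curr_img)
print(round(out.mean(), 2), round(curr_img.mean(), 2))
```

With b = 1 the additive term cancels the scaled source mean, so the output mean equals the target mean, which is the grayscale consistency the text asks for.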
After image processing of the initial matching regions, ASIFT is used to detect and match local interest points between the left images acquired at the two sites. Given the initial localization and the EOPs of the stereo cameras, the number of tilts simulated in the affine simulation is reduced to improve computational efficiency, which cuts the matching time by about 50%. Additionally, stereo matching of the ASIFT feature points is performed within the stereo images of the same site based on ZNCC (zero-mean normalized cross-correlation). To reduce computation time, the search area for matching is limited by an epipolar constraint and a distance range. For any given feature point in the left image, the point with the largest correlation coefficient in the search area of the right image is selected as the corresponding point. To delete possible incorrect matches, median filtering is employed to eliminate any points whose disparity value is abnormal compared with the neighboring points (Di et al., 2008). To refine the matching locations to sub-pixel level, least-squares matching is applied to the matched points (Gruen, 1985).
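The ZNCC search along the epipolar line can be sketched as follows. This is only an illustration under stated assumptions: the stereo pair is taken to be rectified so that epipolar lines coincide with image rows, the disparity limits stand in for the distance range, and the names `zncc` and `match_along_row`, the window size, and the synthetic images are all hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0.0 else float((a * b).sum() / denom)


def match_along_row(left, right, row, col, half=3, d_min=0, d_max=20):
    """Find the column in `right` (same row) that best matches left[row, col].

    The caller must keep the template window inside both images.
    Returns (best_column, best_score).
    """
    tmpl = left[row - half:row + half + 1, col - half:col + half + 1]
    best_col, best_score = None, -1.0
    for d in range(d_min, d_max + 1):      # disparity range = distance range
        c = col - d
        if c - half < 0:
            break
        win = right[row - half:row + half + 1, c - half:c + half + 1]
        score = zncc(tmpl, win)
        if score > best_score:
            best_col, best_score = c, score
    return best_col, best_score


rng = np.random.default_rng(1)
left = rng.uniform(0, 255, size=(40, 60))
right = np.roll(left, -5, axis=1)          # synthetic right image, 5 px disparity
print(match_along_row(left, right, row=20, col=30))
```

On this synthetic pair the search should recover column 25 with a correlation close to 1. Median filtering of disparities and least-squares refinement would follow as separate steps.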
It is critical to detect and eliminate possible outliers in the feature-matching procedure. Because the baseline of the stereo camera is fixed, the distance between any two given ground points calculated from different stereo pairs should be the same, regardless of the position and attitude changes between the two sites. To detect outliers, the three-dimensional coordinates of the matched points are calculated by space intersection at each of the adjacent sites. The absolute distance residual between any two points is calculated as

$$e = \left| L_{ab}^{C_1} - L_{ab}^{C_2} \right| \qquad (2)$$

where $L_{ab}^{C_1}$ and $L_{ab}^{C_2}$ are the Euclidean distances between points a and b in frames $C_1$ and $C_2$, respectively. In theory, $e$ should be zero. Allowing for slight measurement errors, a matching or tracking error is indicated if $e$ is larger than a threshold $e_L$. Each matched point is assigned an initial weight value of 0. If a point pair fails the consistency check (i.e., $e > e_L$), the weight values of both points do not change; otherwise, a weight of 1 is added to both points. After the consistency check has been completed for every pair of matched points, the accumulated weight values are used to detect outliers. If the weight value of a point is less than a weight threshold, the point is considered an outlier and is eliminated from the list of matched points.
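The voting scheme described above can be written compactly with pairwise distance matrices. The sketch below is illustrative: the function name `distance_consistency`, the 0.05 m threshold, and the default weight threshold (half of the other points) are assumptions, since the paper does not state its threshold values.

```python
import numpy as np


def distance_consistency(p1, p2, e_max=0.05, min_votes=None):
    """Flag matched 3D points whose pairwise distances agree across two sites.

    p1, p2: (N, 3) coordinates of the same tie points, intersected at
    site 1 and site 2 in their own frames.
    Returns (keep_mask, votes).
    """
    p1 = np.asarray(p1, dtype=np.float64)
    p2 = np.asarray(p2, dtype=np.float64)
    d1 = np.linalg.norm(p1[:, None, :] - p1[None, :, :], axis=2)
    d2 = np.linalg.norm(p2[:, None, :] - p2[None, :, :], axis=2)
    passed = np.abs(d1 - d2) <= e_max      # e <= e_L for each point pair
    np.fill_diagonal(passed, False)        # a point does not vote for itself
    votes = passed.sum(axis=1)             # accumulated weight per point
    if min_votes is None:
        min_votes = (len(p1) - 1) / 2.0    # assumed weight threshold
    return votes >= min_votes, votes


pts1 = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                 [1, 1, 0], [2, 0, 0], [0, 2, 0]], dtype=float)
ang = np.deg2rad(30.0)
R = np.array([[np.cos(ang), -np.sin(ang), 0.0],
              [np.sin(ang), np.cos(ang), 0.0],
              [0.0, 0.0, 1.0]])
pts2 = pts1 @ R.T + np.array([10.0, 0.0, 0.0])   # same points seen from site 2
pts2[3] += [0.5, 0.5, 0.0]                       # simulate one wrong match
keep, votes = distance_consistency(pts1, pts2)
print(keep, votes)
```

In this synthetic case the five consistent points each collect four votes, while the corrupted point collects none and is rejected.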

Bundle adjustment based motion estimation
Geometric constraints formed by corresponding points between cross-site images are the basis for the motion estimation solution. In contrast to traditional VO algorithms, bundle adjustment is used here to obtain the motion estimate. An image network is formed by linking all images of the two sites, and BA of this image network reduces the accumulation of error in motion estimation and thus provides high-precision localization results (Sünderhauf et al., 2005; Konolige et al., 2007; Mouragnon et al., 2009).
Although initial localization results exist, a better initial approximation of the EOPs is required before the bundle adjustment. Photogrammetric intersection and resection methods are used to obtain this initial approximation. Given the EOPs of the image pair at the previous site, $EP_1$, and the image positions of the matched points in the left and right images of the previous site, $F_1$, the 3D coordinates $P$ of these tracked points are calculated by space intersection. At the current site, the 3D coordinates $P$ of the matched points and the corresponding image points in the left image are used to derive the EOPs of the left image by resection.
In the image network formed by all the stereo images of the two sites, taking the image coordinates of the matched points (tie points) as observations, the linearized error equation is derived from the photogrammetric collinearity equations using Taylor's theorem. The error equation is

$$V = At + BX - L \qquad (4)$$

where $V$ is the residual vector of the image coordinates of the tie points; $A$ and $B$ are the coefficient matrices; $t$ is the correction vector of the unknown EOPs; $X$ is the correction vector of the unknown 3D coordinates of the tie points; and $L$ is the difference between the measured image coordinates and the values calculated from the initial EOPs. In addition, the fixed relative EOPs, both within the stereo camera and between the stereo pairs taken at one site by mast rotation, are obtained from stereo calibration and added to the BA solution as constraints

$$Ct + W = 0 \qquad (5)$$

where $C$ and $W$ are coefficient matrices of the linearized constraint equation derived through Taylor's theorem. In this solution, the EOPs of the previous site are treated as true values. The unknown parameters in the BA are solved iteratively using the least-squares principle.
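One iteration of the constrained adjustment in Equations (4) and (5) can be sketched as a least-squares problem with linear equality constraints, solved through a bordered normal-equation system (Lagrange multipliers). This is a generic formulation under assumptions, not the paper's solver: unit weights are used for all observations, the matrices are random placeholders rather than collinearity derivatives, and the function name `constrained_ls_step` is hypothetical.

```python
import numpy as np


def constrained_ls_step(A, B, L, C, W):
    """Minimize |A t + B X - L|^2 subject to C t + W = 0 (unit weights).

    Returns the corrections (t, X) for one linearized iteration.
    """
    J = np.hstack([A, B])                          # joint design matrix for [t; X]
    n_t, n_u, n_c = A.shape[1], J.shape[1], C.shape[0]
    C_full = np.hstack([C, np.zeros((n_c, B.shape[1]))])  # constraints act on t only
    # Bordered normal equations: [N C^T; C 0] [u; lambda] = [J^T L; -W]
    K = np.block([[J.T @ J, C_full.T],
                  [C_full, np.zeros((n_c, n_c))]])
    rhs = np.concatenate([J.T @ L, -W])
    sol = np.linalg.solve(K, rhs)
    return sol[:n_t], sol[n_t:n_u]


rng = np.random.default_rng(2)
A = rng.normal(size=(12, 3))       # placeholder derivatives w.r.t. EOP corrections
B = rng.normal(size=(12, 2))       # placeholder derivatives w.r.t. tie point corrections
L = rng.normal(size=12)            # placeholder misclosures
C = np.array([[1.0, -1.0, 0.0]])   # one placeholder relative-orientation constraint
W = np.array([0.2])
t, X = constrained_ls_step(A, B, L, C, W)
print(C @ t + W)                   # constraint residual, close to zero
```

In a full bundle adjustment this step would be repeated, re-linearizing the collinearity and constraint equations around the updated EOPs and tie point coordinates until the corrections become negligible.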

LOCALIZATION ACCURACY ANALYSIS
Since image matching errors at sub-pixel level are inevitable, the visual localization method produces certain localization errors, which are affected by many factors, such as the configuration of the stereo cameras and the positions and distribution of the tie points. In this section, the localization accuracy is analyzed through theoretical derivation. The theoretical localization accuracy is given for the geometric configuration of the Navcam used in Chang'e-3.

Range measurement error
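In the "normal case" stereo image configuration, the coordinates can be calculated using the parallax equation

$$X = \frac{B}{p}\,x, \qquad Y = \frac{B}{p}\,f, \qquad Z = \frac{B}{p}\,y \qquad (6)$$

where $x$, $y$ are pixel coordinates, $X$, $Y$, $Z$ are object coordinates (with $Y$ along the viewing direction), $f$ is the focal length, $B$ is the baseline and $p$ is the stereo parallax of a specific point. The range measurement error can then be derived through error propagation as

$$\sigma_Y = \frac{Y^2}{Bf}\,\sigma_p \qquad (7)$$

where $\sigma_p$ is the parallax measurement error. The covariance matrix of the position error can be derived as

$$\Sigma = \begin{bmatrix} \cos^2\theta\,\sigma_r^2 + r^2\sin^2\theta\,\sigma_\theta^2 & \sin\theta\cos\theta\,\sigma_r^2 - r^2\sin\theta\cos\theta\,\sigma_\theta^2 \\ \sin\theta\cos\theta\,\sigma_r^2 - r^2\sin\theta\cos\theta\,\sigma_\theta^2 & \sin^2\theta\,\sigma_r^2 + r^2\cos^2\theta\,\sigma_\theta^2 \end{bmatrix} \qquad (8)$$

where $\theta$ is the azimuth angle to the specified point, $\sigma_\theta$ is the azimuth measurement error, $r$ is the measurement distance, and $\sigma_r$ is the range error of Equation (7) at that distance.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4, 2014. ISPRS Technical Commission IV Symposium, 14-16 May 2014, Suzhou, China. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-4-279-2014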

Error modelling of cross-site localization
To analyze the errors of the cross-site localization refined by BA, the number and distribution of tie points are simulated. In practical applications, the BA model is based on the collinearity equations. To simplify the derivation, a similarity transformation (scale, rotation, and translation) is used instead to adjust the position of the current site. The similarity transformation is represented as

$$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} x & -y & 1 & 0 \\ y & x & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \qquad (9)$$

where $(X, Y)^T$ are the ground coordinates of the tie points obtained at the previous site, $(x, y)^T$ are the ground coordinates of the tie points obtained at the current site, and $(a, b, c, d)^T$ are the unknown transformation parameters to be solved. Under the least-squares principle, $(a, b, c, d)^T$ is obtained by minimizing the residuals $V$, which are given by

$$V = AX - L \qquad (10)$$

where $A$ is the coefficient matrix described in Equation (9), $X$ here denotes the parameter vector $(a, b, c, d)^T$, and $L$ is the difference between the observations and the values computed with the transformation parameters. According to the error propagation principle for least-squares adjustment, the covariance matrix of the transformation parameters is calculated as

$$\Sigma_X = \left( A^T M^{-1} A \right)^{-1} \qquad (11)$$

where $M$ is the covariance matrix of the position errors of the tie points.
When solving for the EOPs of the current site, the transformation parameters are used to calculate the adjusted position as

$$\begin{bmatrix} X_N \\ Y_N \end{bmatrix} = \begin{bmatrix} x_p & -y_p & 1 & 0 \\ y_p & x_p & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \qquad (12)$$

where $(X_N, Y_N)^T$ are the new localization results of the current site after adjustment and $(x_p, y_p)^T$ are the initial results. The covariance matrix that represents the rover localization accuracy of the current site can then be derived as

$$\Sigma_N = C\,\Sigma_X\,C^T \qquad (13)$$

where $C$ is the coefficient matrix in Equation (12) and $\Sigma_X$ is the covariance matrix of the transformation parameters.

Error analysis of cross-site localization
In practice, considering the stereo camera configuration of the Yutu rover, $f$ is 1189 pixels, $B$ is 0.27 m, and $\sigma_p$ of stereo matching is set to 1/3 pixel. Generally, the accuracy of ASIFT matching is up to 3 pixels (Wang et al., 2011), so $\sigma_\theta$ is set to $\tan^{-1}(3\ \text{pixels}/f)$ in the following discussion.
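To give a feel for these numbers, the short sketch below evaluates the normal-case range error, i.e. the squared distance divided by the product of baseline and focal length, scaled by the parallax error, together with the azimuth error for the stated Navcam parameters. The constant and function names are illustrative, and the sample distances are arbitrary.

```python
import math

F_PIX = 1189.0        # focal length in pixels (from the text)
BASE_M = 0.27         # stereo baseline in metres (from the text)
SIGMA_P = 1.0 / 3.0   # parallax measurement error in pixels (from the text)

# Azimuth error corresponding to a 3-pixel ASIFT matching error.
SIGMA_THETA = math.atan(3.0 / F_PIX)


def range_sigma(r):
    """Range error in metres at distance r (metres), normal-case stereo."""
    return r * r / (BASE_M * F_PIX) * SIGMA_P


for r in (5.0, 10.0, 15.0):
    # Range error and the cross-range error r * sigma_theta at distance r.
    print(f"r = {r:4.1f} m  sigma_r = {range_sigma(r):.3f} m  "
          f"r*sigma_theta = {r * SIGMA_THETA:.3f} m")
```

At 10 m the range error is about 0.10 m, and it grows with the square of the distance, which is why the distance of the tie points from each site dominates the error budget discussed next.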
We design a standard configuration to analyze the localization error. When the stereo images are captured in one direction, the number of tie points is set to 4, and their distribution is shown in Figure 2. Considering the view range of the stereo cameras, the landmarks are placed at (13, 1.5), (13, -1.5), (16, 1.5) and (16, -1.5). The centers of the previous site and the current site are set at (0, 0) and (10, 0), respectively. Based on the above derivation, the theoretical localization accuracy is 1.05% in forward-forward image capture mode.
When the stereo images are captured in reverse directions, the accuracy analysis is performed with the modified landmark distribution shown in Figure 3, in which the landmarks are placed at (3.5, 1.5), (3.5, -1.5), (6.5, 1.5) and (6.5, -1.5). The centers of the sites are the same as in the forward-forward configuration. Through the error formulation, the theoretical localization accuracy is 0.38% in forward-backward image capture mode. Compared with the forward-forward capture mode, the localization accuracy improves significantly, because the distances between the tie points and the center of the previous site are shorter. It should be noted that in the forward-backward capture mode, forward-looking images still have to be acquired at the current site in order to link to the next site. As a result, the forward-backward capture mode needs more time and resources, and matching between forward and backward images is usually more difficult. Thus, this mode has been used only occasionally in mission operations.
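The two configurations can be compared numerically with a small error-propagation sketch. It is a simplified reading of the analysis and not the authors' code. The assumptions are: each tie point covariance is the sum of the polar-error covariances seen from both sites, the parameter covariance is the weighted least-squares form with the tie point covariance as observation covariance, and the reported figure is the root of the trace of the position covariance divided by the 10 m site distance. The resulting percentages are therefore not expected to reproduce the paper's values exactly.

```python
import numpy as np

F_PIX, BASE_M, SIGMA_P = 1189.0, 0.27, 1.0 / 3.0   # Navcam values from the text
SIGMA_THETA = np.arctan(3.0 / F_PIX)


def point_cov(pt, site):
    """2x2 covariance of a ground point measured from `site` (polar errors)."""
    dx, dy = pt[0] - site[0], pt[1] - site[1]
    r, th = np.hypot(dx, dy), np.arctan2(dy, dx)
    var_r = (r ** 2 / (BASE_M * F_PIX) * SIGMA_P) ** 2
    var_t = (r * SIGMA_THETA) ** 2
    c, s = np.cos(th), np.sin(th)
    return np.array([[c * c * var_r + s * s * var_t, s * c * (var_r - var_t)],
                     [s * c * (var_r - var_t), s * s * var_r + c * c * var_t]])


def design_rows(x, y):
    """Two rows of the similarity-transform coefficient matrix for one point."""
    return np.array([[x, -y, 1.0, 0.0], [y, x, 0.0, 1.0]])


def relative_error(tie_pts, prev_site=(0.0, 0.0), curr_site=(10.0, 0.0)):
    """Propagate tie point errors to the current-site position (as a ratio)."""
    n = len(tie_pts)
    A = np.vstack([design_rows(x, y) for x, y in tie_pts])
    M = np.zeros((2 * n, 2 * n))
    for i, pt in enumerate(tie_pts):
        # Assumption: errors from both sites add for each tie point.
        M[2 * i:2 * i + 2, 2 * i:2 * i + 2] = point_cov(pt, prev_site) + point_cov(pt, curr_site)
    cov_params = np.linalg.inv(A.T @ np.linalg.inv(M) @ A)
    C = design_rows(*curr_site)
    cov_pos = C @ cov_params @ C.T
    dist = np.hypot(curr_site[0] - prev_site[0], curr_site[1] - prev_site[1])
    return float(np.sqrt(np.trace(cov_pos)) / dist)


FF_POINTS = [(13.0, 1.5), (13.0, -1.5), (16.0, 1.5), (16.0, -1.5)]
FB_POINTS = [(3.5, 1.5), (3.5, -1.5), (6.5, 1.5), (6.5, -1.5)]
print("forward-forward :", relative_error(FF_POINTS))
print("forward-backward:", relative_error(FB_POINTS))
```

Under these assumptions the forward-backward layout yields the smaller relative error, in line with the qualitative conclusion of the analysis.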

APPLICATIONS OF VISUAL LOCALIZATION IN CHANG'E-3
In Chang'e-3, the teleoperation system is responsible for topographic mapping, visual localization, hazard avoidance, path planning and other operations related to environmental perception and movement decisions. Navcam, deployed as an engineering camera, plays an important role in the visual navigation of Yutu by helping to correct the accumulated error of dead-reckoning, which was found to be up to 15% in ground tests. With visual localization, the error can be reduced to below 5%, which fully satisfies the demands of exploration. However, the robustness of the method is affected by external texture and illumination conditions. Poor texture and dramatic changes of illumination and field of view are critical difficulties for visual localization of the rover. In particular, the distortions between images acquired at adjacent sites are quite large. The proposed method realizes autonomous visual localization by means of corresponding area detection and feature matching based on ASIFT. In very difficult surface conditions, manual intervention was also used. Figure 4 shows an example of the matching results in the areas predicted from the initial localization results, the boundaries of which are marked with dotted lines.

Natural Science Foundation of China (41171355, 61104190) is acknowledged.

Figure 1. The workflow of the proposed rover localization method

Initial matching region extraction
Rover images captured from different viewpoints have distortions, which bring challenges to feature matching and tie point selection. Yutu is commanded to capture a 120° partial Navcam panorama along the forward direction at each waypoint, with a segment length of about 10 meters. The partial panorama

At the current site, the EOPs of the left image, $EP_L^2$, are derived by resection, given $EP_1$ as the initial values in an iterative least-squares solution. The EOPs of the right image are then obtained from those of the left image and the relative EOPs of the stereo camera, where $R_{left}$, $R_{right}$, and $R_r$ are the rotation matrices formed by the angle elements of the EOPs of the left image, the right image and the relative EOPs of the stereo camera, respectively; and $T_{left}$, $T_{right}$, and $T_r$ are the translation elements of the EOPs of the left image, the right image and the relative EOPs of the stereo camera, respectively.

Figure 2. Distribution of landmarks in forward-forward capture configuration

Figure 3. Distribution of landmarks in forward-backward capture configuration

Figure 4. Feature matching in cross-site images

Before the Chang'e-3 mission started, the cross-site rover localization method was tested with a model rover, which has the same size and functions as Yutu, in a simulated field with artificial craters and rocks. The field, covered by thick tephra, provides a scene similar to the lunar surface for testing. The model rover travelled in the simulated field to execute tasks as in actual exploration. Stereo images were captured to validate the robustness and performance of the developed software. During system testing, Navcam captured stereo images at 16 sites, with 7 or 8 stereo pairs (including an added backward pair) at each site. Meanwhile, the position and attitude of the rover at every site, measured with an indoor GPS system, were taken as true values. Compared with the true values, the error of visual localization in the simulated field is 1%~4%, with an average of about 3.3%, which satisfies the design accuracy of 5%. Improvements in robustness were made before uploading the software to the ground teleoperation system. The final version of

Figure 5. Overall traverse of Yutu rover as of January 16th, 2014

SUMMARY AND DISCUSSION
In this paper, we have developed a new cross-site visual localization method for the Yutu rover. In this method, initial localization results are used to estimate the corresponding areas in cross-site images, and BA-based motion estimation is performed to localize the rover with high precision. In addition, a theoretical analysis of the cross-site localization is carried out by deriving the error formulation from the principles of error propagation. Experimental results in the simulated field demonstrate that the developed method can reduce the accumulated rover localization errors caused by IMU drift and wheel slippage. The developed method and software have been successfully used in teleoperation of the Yutu rover.