AN INDOOR SLAM METHOD BASED ON KINECT AND MULTI-FEATURE EXTENDED INFORMATION FILTER

Based on the frame of ORB-SLAM in this paper the transformation parameters between adjacent Kinect image frames are computed using ORB keypoints, from which priori information matrix and information vector are calculated. The motion update of multi-feature extended information filter is then realized. According to the point cloud data formed by depth image, ICP algorithm was used to extract the point features of the point cloud data in the scene and built an observation model while calculating a-posteriori information matrix and information vector, and weakening the influences caused by the error accumulation in the positioning process. Furthermore, this paper applied ORB-SLAM frame to realize autonomous positioning in real time in interior unknown environment. In the end, Lidar was used to get data in the scene in order to estimate positioning accuracy put forward in this paper.


INTRODUCTION
Currently, with the continuous development of visual sensors, the technology of visual navigation and positioning has been widely used in motion platform.While algorithm and processor performance are updated continuously, how to realize robot selflocalization and mapping through visual sensor in unknown environment has been a current research hotspot.
It is noteworthy that a number of RGB-D launched in recent years is widely used in the field of indoor positioning and mapping.Izadi et al.(2011) described how to get the depth image and color image of Kinect sensors in detail, and realized scene reproduction in real time with the provision of precise 3-D model.Whelan et al.(2012) extended the Kinect Fusion algorithm, after its application, the region of space mapped by Kinect Fushion algorithm could change dynamically.Furthermore, high-density point cloud data in the scene was extracted and was put into the environment.This was displayed by triangular mesh, realizing real-time processing of high-density model building for the objects in the scene.The slam method of RGB-D was firstly proposed by Newcombe et al. (2011), and ICP algorithm model was applied to work out the real time registration and mapping display of Kinect sensor.Endres et al. (2014) et al. proposed RGBD-slam system model, achieving frame registration through feature matching among frames and ICP algorithm.Raul et al. (2016) put forward ORB-SLAM system to solve the problems of monocular, stereo and the slam of RGB-D.Santos et al. (2016) proposed an adaptive registration model from coarse-to-fine with making use of RGB-D data.Xiang Gao et al. (2015) proposed feature planar features in the scene to reduce deviation accumulation.
At the present stage, slam system is often used for dealing with the problems of indoor positioning.Loop closure method included in Slam system can weaken the error calculation's  Corresponding author influence on point cloud data.In a large-scale scene or a patency scene however, the function of loop closure is next to nothing.Because the ORB-SLAM system can quickly obtain the color image of the ORB feature points, this paper proposes multifeature extend information filter model with using ORB-SLAM frame to weaken the influences on positioning accuracy caused by error calculation and realize real time scene location.

Data Acquisition
The RGB-D sensor applied in this paper is Kinect v2.0 developed by Microsoft.This camera can acquire depth image in the range of 1-5m in the scene, combining with mutual calibration color camera, it can generate point cloud data in the scene.The experimental data in this paper has a resolution of 960*540.A frame of color and depth image respectively are as shown below:

2.2Data tracking
This paper completes the transaction of pair-wise points from 2-D image to 3-D point cloud data, and makes the confirmation of consecutive frame pose.Though traditional algorithm of SIFT or SURF can make good use of visual feature to realize object identification, image registration, visual image etc., it also serves as a great burden for computers.
ORB algorithm provides a new means of combining inspection method of FAST feature point with BRIEF feature descriptor while getting improvement and optimization on the primary basis as well as increasing efficiency.Tracking model applies ORB algorithm to extract the feature points of present frame of color image, registers the extracted feature point with the former frame feature points, and acquire the relative transformational correlation of present frame of data  and the previous frame of data  − 1 i.e.Δ  = [Δ, Δ, Δ, Δ, Δ, Δ] .Then, at present time , the state vector of each frame of data is expressed as: Where  represents the pose of sequence images received up to now.  () represents the pose of the data of  ℎ frame at the time .Information matrix: Information vector:   =   −1 *   =   *   To calculate the coordinate of the present data in the global coordinate system based on   and Δ  , the resultant data is Here,  +1 − () represents the a-priori information vector in the global coordinate system at the time , determined based on the present frame.(. ) is a system-augmented function.() describes a variety of uncertainties in the registration and modeling process, and it is assumed to be a Gaussian distribution expressed as white noise vector (0, ).
The present frame pose is added into the information vector and the update of each frame of data status vector is realized.

Measurement Updates
After realizing data tracking by completing the extraction of 2-D color image features, this paper utilizes the point cloud data generated by depth data to establish multi-feature measurement model while extracting features of point and plane as well as realizing the update of information vector and information matrix.

Point-feature model
This paper extracts closet point of the data of the adjacent two frames by ICP algorithm, and takes these closet points as point features to establish a measurement model.It is supposed that the data of the adjacent two frames is expressed as ( 1 ,  1 ,  1 ) in the coordinate system of  ℎ frame, and ( 2 ,  2 ,  2 ) in the coordinate system of  ℎ frame shown as follows: Here, ℎ(. ) is the systematic measurement function, and () denotes a variety of uncertainties in the scanning measurement and the transformation of coordinates which is supposed to comply with the Gaussian distribution expressed as white noise vector (0, ).

Planar feature model
This paper extracts the same planar feature of the adjacent two frames of the data to establish the measurement model of planar feature.The planar information between two frames of data only controls the rotation parameters among the transformation parameters.So, it is assumed to be the same planar unit normal vector as ( 1 ,  1 ,  1 ) and ( 2 ,  2 ,  2 ) .It is expressed as ( 1 ,  1 ,  1 ) in the  ℎ frame coordinate system, as ( 2 ,  2 ,  2 ) in the  ℎ frame coordinate system.Based on the present pose of the survey station point of  ℎ and  ℎ respectively to estimate   () and   (), the coordination is changed into global coordinate, which is expressed as ( 1 ,  1 ,  1 ) and ( 2 ,  2 ,  2 ).R is the rotation.The planar feature model is derived as

Multi-feature Measurement model
This paper acquires the features of points and planes after the establishment of the measurement model of points and planes.
Therefore, the measurement model derived from multiple features is as follows: 1 represents feature information of points;  2 represents feature information of planes.Their accuracies are different, and the two information types are assigned different weights to enhance the final result.
Here,  1 is the weight of pair-wise points;  2 is the weight of planar feature;  1 is the covariance matrix of pair-wised points;  2 is the covariance matrix of multi-feature.

Loop Closure
In the process of indoor positioning, error accumulation is unavoidable with the continuous increase of data acquired.In order to weaken the influences of error accumulation in indoor positioning systems, the method of loop closure is provided to improve the accuracy positioning and mapping.
Through feature point registration, loop closure can determine whether or not to collect the data of the present scene.When the number of detected features meets a certain threshold, the data will be recognized as already acquired.Comparing the pose of present scene with the pose of repetitive scene, the value difference between the poses is the result of error accumulation of all the key frames between the two frames of the repetitive scene.Meanwhile, the information vector is modified to make the poses of two repetitive frames of data reach unanimity and complete the loop closure.

Positioning result display
With the help of Pangolin base to realize visualization user interface, the result of real-time positioning according to the pose of the present frame in global coordinate system is displayed as follows: As shown in figure 3, the process of tracing displayed points is expressed by a rectangular pyramid instead of only a point to stand for the location of the present frame.The result of positioning is expressed by the rectangular pyramid.The normal vector of the bottom plane of the rectangular pyramid is used to express the rotation magnitude of the present frame and the global coordinate system tri-axial rotation magnitude.If the green solid line of the data between every two frames is jointed, it means there are some certain identical feature points of data in the adjacent frames, hence, the loop closure detection is finished.
While displaying 3-D scene, the paper only displays feature points instead of all the generated point cloud data in order to reduce the complexity of point cloud data in the map and also to reduce the requirements of computer hardware.

THE ANALYSIS OF EXPERIMENT RESULT
The experimental scene chosen by this paper is the building experimental scene.Lidar is used to obtain high precision laser point cloud data in the experimental scene, and merged with Kinect sequence images by manual-registration to obtain accurate position points.At the same time, the method of ORB-SLAM is used to process the obtained sequence images, and its result is compared with the positioning result of the method proposed in this paper.Comparing the result of ORB-SLAM and the positioning result of multi-feature extend information filter with the accurate positioning of fused data as shown in figure a; the blue track is the positioning result of fused data, the green track is the positioning result of MEIF (multi-feature extend information filter), and the red track is the positioning result of ORB-SLAM.From the comparisons above, it can be seen that the positioning method proposed in this paper yielded better results compared to the results of ORB-SLAM.In order to find the positional accuracy of the two patterns precisely, a comparison of the two positioning information and the location result of fused data was made, the acquired corresponding pose of the homonymous frame was used to make a comparison, and calculate the spatial distance of the tracking points.The positioning deviation of ORB-SLAM is between 0 m to 0.8 m, and the location deviation of multi-feature extend information filter is in the range of 0 m to 0.5 m.The RMSE of ORB-SLAM and MEIF are 0.48m and 0.23m.

CONCLUSION
In allusion to the problem that deviation accumulation cannot be weakened effectively in large-scale scene and patency scene of present slam, this paper extracts ORB feature points in the scene and builds an extended information filtering model which can weaken the influences of deviation accumulation effectively to realize the location of indoor scene.

a
Global positioning map b Partial enlarged detail Figure 2.Real-time positioning result display Figure6.Tracing points of fused data thinning3.2.Positioning resultMaking use of the sequence images obtained by Kinect, this paper locates sequence images with the help of ORB-SLAM system and the multi-feature extend information filter proposed in this paper.The result is shown in the following figures: