AUTOMATIC RECOGNITION OF INDOOR NAVIGATION ELEMENTS FROM KINECT POINT CLOUDS

This paper automatically recognizes the navigation elements defined by the IndoorGML data standard: doors, stairways and walls. The data used are indoor 3D point clouds collected with a Kinect v2 (launched in 2011) by means of ORB-SLAM. Compared with lidar, the Kinect is cheaper and more convenient, but its point clouds suffer from noise, registration error and large data volume. Hence, we adopt a shape descriptor, the histogram of distances between two randomly chosen points proposed by Osada, merge it with other descriptors, and use it in conjunction with a random forest classifier to recognize the navigation elements (door, stairway and wall) from Kinect point clouds. The navigation elements and their 3D location information are acquired from each single data frame through point cloud segmentation, boundary extraction, feature calculation and classification. Finally, the acquired navigation elements and their information are used to generate the state data of the indoor navigation module automatically. The experimental results demonstrate a high recognition accuracy of the proposed method.


INTRODUCTION
In the past ten years, personal navigation systems, including vehicle-mounted navigation systems and the location-based services of mobile phones, have become an indispensable tool in daily life. The fast development of outdoor navigation has made life more convenient. According to statistics, people spend on average 87% of their time indoors, and in most aspects of socioeconomic life the requirements for safety and emergency response management in complex indoor and underground spaces, including shopping malls, hospitals, airports and mines, are becoming increasingly urgent. Indoor navigation includes two basic components: location and navigation.
Indoor navigation data is the foundation on which the practice of indoor navigation is based. Indoor data consists of many elements, and they are complex. Preparing all of it manually wastes time and energy, and a single mistake causes a heavy workload for later modification: because the elements are correlated, a slight change in one part may affect the end result as a whole. It is therefore much better to extract indoor elements automatically.
There are many means of collecting indoor 3D point cloud data. 3D information can be acquired through 2D images (Hartley R I, 1992). A mobile 3D laser scanner acquires 3D point clouds and color images (Enrique Valero, Antonio Adan, Carlos Cerrada, 2012). The M3 Trolley (produced by a German company) acquires indoor panorama images and 3D point clouds. Kinect (Izadi S, Newcombe R A, Kim D, et al. 2011), the most recent of these devices, can acquire 3D point clouds with color information, color images and depth images simultaneously. The 3D information acquired from 2D images is of low precision and requires complex calculation. A 3D laser scanner can acquire high-precision and high-density point clouds, but it is expensive and the point cloud has to be matched with the images. By contrast, Kinect is portable as well as cheap, and it can acquire color point cloud data in real time.
In recent years, the understanding of indoor 3D scenes and the recognition of objects have been an active research field. Many scholars have proposed features for recognizing and extracting objects. Some universities have provided labelled indoor scene datasets (COIL, Columbia University; Make3d, Stanford University, NYU) to support further research. Saurabh Gupta et al. proposed algorithms for object boundary detection and hierarchical segmentation in RGB-D images. Walter Wohlkinger and Markus Vincze used shape descriptors for pattern recognition of small objects such as bananas and cups. This paper uses merged features and a random forest classifier to study the pattern recognition of indoor elements (doors, stairways and walls), and then improves the reliability of the recognition results by combining them with prior knowledge. For example, doors and walls are perpendicular to the ground, and the elevations of a staircase are also perpendicular to the ground, parallel to each other and of the same size.
Our method takes the color point cloud acquired by Kinect as input data. Because the quality of this point cloud is relatively poor, with low precision and high noise, it is de-noised before use. Plane segmentation is then conducted on the processed point cloud, and each segment is taken as an object to be recognized. After that, a random forest is used to classify all the objects and acquire the semantic and 3D information (location and size) of each object. Finally, the state information of the indoor navigation model is acquired. Because the precision of the point cloud data and the registration accuracy influence the result, this paper conducts recognition only on single-frame data, which means that the objects are not complete.

Previous Work
The recognition of 3D objects based on point clouds is an important research direction. Some scholars have proposed ways of extracting structural elements (doors, walls, roofs etc.) from 3D point clouds of outdoor structures acquired by 3D laser scanners, in order to realize the automatic 3D reconstruction of city buildings (Shi Pu; Zhuqiang Li et al.). Other scholars have paid attention to the recognition of small indoor objects such as cups, mice or furniture (Saurabh Gupta; Walter Wohlkinger and Markus Vincze et al.). However, different from outdoor architecture, an indoor scene is more complex and is observed with a narrower field of view, which makes research on point-cloud-based recognition of indoor objects richer. Two recognition methods are briefly introduced below.
Shi Pu et al. in 2009 proposed an automatic 3D modelling approach for city buildings based on prior knowledge, using point clouds acquired by a 3D laser scanner. First, they conducted plane segmentation of the point cloud data and treated each segment as a planar object. Then, they used various semantic relations to classify and recognize the objects; for example, a wall has a large area and intersects the ground. Finally, the recognized semantic objects were used to rebuild the geometric model of the structures. This method is applicable to large-scale outdoor buildings with simple structure, and the a priori semantics it adopts are not completely suitable for the recognition of indoor doors and walls.
Walter Wohlkinger and Markus Vincze used Kinect to collect 3D point clouds of small objects (bananas, car models, cups etc.) from different angles as experimental data. They proposed an Ensemble of Shape Functions (ESF) to classify 3D objects. This method applies the ESF descriptor to describe the attributes of a group of 3D points, after which weight learning is used to improve the classification accuracy. The descriptive power weakens when the number of categories increases or when the objects are described from only a few angles.

Presented Approach
IndoorGML is an indoor spatial data standard published by the Open Geospatial Consortium (OGC). Different from several 3D building modelling standards such as CityGML, KML and IFC, which deal with the interior space of buildings from geometric, cartographic and semantic viewpoints, IndoorGML aims at modelling indoor spaces for navigation purposes. It contains two modules: the indoor core module, which describes the topological connectivity and different contexts of indoor space, and the indoor navigation module for indoor location-based services. IndoorGML is a topological expression of the indoor environment. Because it lacks a definition and detailed description of indoor location information, its data structure is relatively simple, but it covers the geometric and semantic properties relevant for indoor navigation in order to support location-based services.
The navigation module regulated by IndoorGML includes the state information of indoor navigation elements and the topological connections (transitions) among them. State information includes each element's location, semantics, centroid position etc. IndoorGML focuses on setting up a common schema for indoor navigation applications. It models the topology and semantics of indoor spaces, which are needed for the components of navigation networks, and it divides the indoor environment into several space layers, such as a WiFi sensor space layer, an RFID sensor space layer and a topological space layer. The elements of the different space layers are described by 'states' with relevant geometric and semantic properties. An edge connecting two nodes represents a 'state' transition, which expresses the topological relationships in IndoorGML, such as adjacency and connectivity between 'states' (Figure 1). The aim of our method is to recognize doors, stairways and walls from the single frames collected by Kinect, and to acquire their 3D location and size so that part of the indoor navigation document (the states) can be generated automatically.
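The state information described above (semantics, centroid and size) can be illustrated with a minimal Python sketch. The `ElementState` record and the `make_state` helper are hypothetical names introduced here for illustration; they do not reproduce the IndoorGML schema, and the size is assumed to be the axis-aligned extent of the element's points.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class ElementState:
    """Hypothetical record for one recognized element (not the IndoorGML schema)."""
    label: str       # semantic class, e.g. "door", "wall" or "stair"
    centroid: Point  # mean position of the element's points
    size: Point      # axis-aligned extent along x, y and z (an assumption)


def make_state(label: str, points: List[Point]) -> ElementState:
    """Summarize the points of one recognized element as a state record."""
    if not points:
        raise ValueError("an element needs at least one point")
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    size = tuple(
        max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)
    )
    return ElementState(label, centroid, size)
```

A call such as `make_state("door", door_points)` would then give the values from which a state entry could be written.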
Figure 2 shows the process for indoor navigation elements, from data acquisition and recognition to the generation of navigation information. To begin with, we use Kinect and ORB-SLAM to acquire 3D point cloud data of the indoor scene, and pre-process the point cloud data because of its noise and registration deviation. Plane segmentation of the processed single-frame point cloud data is then conducted to acquire different

Data Preprocessing
The data acquired by ORB-SLAM have large registration deviations and high point cloud noise (as shown in Figure 3), so the data need to be pre-processed as follows:


The color image of each station collected by Kinect is processed with a mean filter to reduce the influence of Gaussian noise. The color images and depth images are then combined using the ORB-SLAM algorithm to generate color point cloud data.


Each station contains many repeated points and null points (caused by missing depth values), which need to be removed.


Then, a statistical filter and a moving least squares filter are used to remove outliers and to smooth the point clouds, respectively, in order to improve the quality of the data.


Finally, a multi-feature extended information filter, which covers point features and planar features, is applied to two stations at a time to reduce the registration deviation. The algorithm searches for neighboring points with ICP and takes them as corresponding points when building the point feature model. The least median of squares method is used to extract planar features from two adjacent frames of point cloud data. From the extracted point and planar features, the multi-feature information filter model is built and the registration deviation is then corrected.
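The outlier-removal step above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the statistical filter, so the sketch assumes the common rule of discarding points whose mean distance to their k nearest neighbors exceeds the global mean plus a multiple of the standard deviation. The parameters `k` and `std_ratio` are illustrative values, and the brute-force neighbor search is suitable only for small clouds.

```python
import math
from statistics import mean, pstdev


def statistical_outlier_filter(points, k=8, std_ratio=1.0):
    """Keep points whose mean distance to their k nearest neighbours is
    at most mean + std_ratio * std of that quantity over the whole cloud."""
    if len(points) <= k:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(mean(dists[:k]))
    threshold = mean(mean_dists) + std_ratio * pstdev(mean_dists)
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

For clouds of realistic size, a spatial index such as a k-d tree would replace the quadratic neighbor search.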

RANSAC Plane Segmentation
RANSAC was proposed by Fischler and Bolles. It eliminates the influence of outliers by random sampling to build a basic subset consisting of inlier data. When RANSAC is used for planar parameter estimation, the data are not all fed into the iteration without distinction; instead, a judging criterion is applied at each iteration to remove the outliers that are inconsistent with the estimated parameters, and the model parameters are then estimated. For a given confidence probability, the minimum number of samplings N of the basic subset and the probability P of obtaining at least one good sample subset satisfy the following relation:
$$P = 1 - \left(1 - (1 - \varepsilon)^{n}\right)^{N}$$
where $\varepsilon$ is the error rate and n is the minimum amount of data needed to calculate the model parameters. RANSAC and fundamental matrices are combined to calculate the planar model parameters of the point cloud. Points in the same plane satisfy the equation of a plane:
$$ax + by + cz + d = 0$$
If Q represents the point cloud to be studied, a fundamental matrix is built from it, and the plane equation is obtained from the fundamental matrix using three points extracted randomly from the data.
Then the Euclidean distance from each point to the plane is calculated. Theoretically, the distance between an inlier and the plane is zero. However, because of point cloud error and other factors, the results are only approximate, so a threshold has to be set to extract the inliers from the data. After the sampling and fitting have been repeated M times, the optimal planar model is the one that contains the largest number of inliers.
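As a minimal sketch of the procedure just described, the following Python code fits one plane by RANSAC and computes the number of samplings from the standard relation between P, the error rate and the sample size. It fits the plane directly from three sampled points by a cross product rather than through the fundamental matrix mentioned above. The distance threshold, iteration count and seed are illustrative assumptions.

```python
import math
import random


def ransac_iterations(p_success, outlier_ratio, sample_size=3):
    """Smallest N such that 1 - (1 - (1 - eps)^n)^N >= P."""
    if not 0.0 < p_success < 1.0 or not 0.0 <= outlier_ratio < 1.0:
        raise ValueError("need 0 < P < 1 and 0 <= eps < 1")
    good = (1.0 - outlier_ratio) ** sample_size  # chance of an all-inlier sample
    if good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - good))


def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with unit normal and ax + by + cz + d = 0,
    or None when the three points are collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    a, b, c = (x / norm for x in n)
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def ransac_plane(points, threshold=0.02, iterations=100, seed=0):
    """Return the plane with the most inliers and the inliers themselves."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        # With a unit normal, |ax + by + cz + d| is the point-to-plane distance.
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

To segment a frame into several planes, the inliers of the best plane would be removed and the search repeated on the remaining points.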
Because of noise and over-segmentation in the point cloud, 200-300 segmental objects may be obtained from one frame of point cloud. Small objects are therefore removed by setting an empirical threshold, which finally gives a satisfactory segmentation result. This process also further removes noise points. Figure 5 (b) shows the segmentation result of one point cloud frame.

FEATURE RECOGNITION
An indoor scene may include many frames of data, and each frame may contain several point cloud blocks after plane segmentation. However, we do not know what each block represents. Each semantic object has its own features, so the semantic objects in the scene can be extracted by calculating the features of each segmental object and then processing them with machine learning. Objects of similar shape may disturb the extraction of the navigation elements, so we finally use prior knowledge to evaluate the recognition result and increase its reliability.

Characteristic Vector Calculation
Usually, shape, texture and function are the three most important cues used to judge what an object is. So, this paper calculates ten features, forming a 95-d feature vector in total, including six color features, two orientation features and two shape features. These ten features are introduced in detail below.
Six color features: Different from the color diversity outdoors, indoor colors are usually more uniform. For example, the walls, doors and floors of a building may each belong to a single color family. So, we calculate the mean and variance of H, S and V of each segmentation block, after which objects with cluttered colors are filtered out, for example advertising boards on the walls, which disturb the recognition of doors.
First, the RGB value of each point is converted to HSV, using $\min = \min(R, G, B)$ and $\max = \max(R, G, B)$ (5). Then, the mean and variance of H, S and V of each segmentation object are calculated.
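The six color features can be sketched in Python with the standard-library `colorsys` conversion. The function name `hsv_statistics`, the 0..255 input range and the output order are assumptions made for illustration. Hue is treated here as an ordinary number, which is a simplification because hue is circular.

```python
import colorsys
from statistics import mean, pvariance


def hsv_statistics(rgb_points):
    """rgb_points: sequence of (r, g, b) tuples with channels in 0..255.

    Returns [mean H, var H, mean S, var S, mean V, var V], each channel
    being scaled to the range 0..1 by colorsys.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in rgb_points]
    if not hsv:
        raise ValueError("a segment needs at least one colored point")
    features = []
    for channel in range(3):
        values = [c[channel] for c in hsv]
        features.append(mean(values))
        features.append(pvariance(values))  # population variance
    return features
```

A segment with nearly uniform color gives variances close to zero, while a cluttered object such as an advertising board gives large variances.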


The intersection angle between the plane and the y axis: The y axis of the Kinect instrument is perpendicular to the ground. So, the elevations of walls, doors and staircases are parallel to the y axis, while the ceiling and the ground are perpendicular to it. All the disturbing segmental objects that are parallel to the ground can be removed through this feature.
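A minimal sketch of this orientation feature, assuming the plane is given by its normal vector (for example, the (a, b, c) coefficients of a fitted plane): the angle between a plane and an axis is the complement of the angle between the plane normal and that axis. The function name is hypothetical.

```python
import math


def plane_axis_angle_deg(normal, axis=(0.0, 1.0, 0.0)):
    """Angle in degrees between a plane (given by its normal) and an axis.

    0 means the plane is parallel to the axis (walls, doors, stair elevations);
    90 means it is perpendicular to the axis (floor, ceiling).
    """
    dot = sum(n * a for n, a in zip(normal, axis))
    norm_n = math.sqrt(sum(n * n for n in normal))
    norm_a = math.sqrt(sum(a * a for a in axis))
    if norm_n == 0.0 or norm_a == 0.0:
        raise ValueError("normal and axis must be non-zero vectors")
    # Clamp to guard against rounding slightly above 1.
    sine = min(1.0, abs(dot) / (norm_n * norm_a))
    return math.degrees(math.asin(sine))
```

Segments whose angle is close to 90 degrees are parallel to the ground and can be discarded before classification.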


Height: According to the literature on indoor scenes, the height of a wall is no less than 2.5 m, the height of a door is no less than 1.8 m, and the elevation of a staircase is no more than 0.2 m.
D2 shape function: This was proposed by Osada. The histogram of distances between two random points is calculated to describe the shape of objects. Different from the three-dimensional experimental objects used by Walter Wohlkinger, doors and walls are planar objects. So, this paper first conducts boundary extraction for the segmental objects and then calculates the D2 feature of the boundary. Experiments showed that this gives a better result than extracting the D2 feature directly. The Euclidean distance d between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
The planar convex (concave) polygon extraction algorithm in PCL is used to extract the boundary of a planar point cloud.
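The D2 shape function can be sketched as a normalized histogram of distances between randomly chosen point pairs. The bin count of 83 follows the D2 vector size reported in this paper. The number of sampled pairs, the normalization by the largest sampled distance and the fixed seed are assumptions made for illustration.

```python
import math
import random


def d2_histogram(points, bins=83, samples=2000, seed=0):
    """Histogram of distances between random point pairs (D2 shape function).

    Distances are divided by the largest sampled distance so that the
    descriptor does not depend on the object's absolute size (an assumption).
    Returns a list of `bins` frequencies that sum to 1.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    dists = []
    for _ in range(samples):
        p, q = rng.sample(points, 2)  # two distinct entries of the list
        dists.append(math.dist(p, q))
    d_max = max(dists)
    hist = [0] * bins
    for d in dists:
        # All-identical points give d_max == 0; everything goes to bin 0.
        idx = 0 if d_max == 0.0 else min(int(d / d_max * bins), bins - 1)
        hist[idx] += 1
    return [h / samples for h in hist]
```

In the pipeline described here, `points` would be the boundary points of a segment rather than all of its points.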

Doors: The height is greater than a certain threshold value H (for example, 1.8 m).
Walls: Parallel to doors; below and intersecting the ceiling. If not, the classification result is kept.
Stairs: The elevations of a staircase are parallel to each other and of the same height. Usually, there is more than one.

Machine Learning
This paper chooses a standard random forest classifier. In the combined feature vector, the size of the D2 feature vector is 83 and the size of the remaining features is 11. Each element of the feature vector is equally weighted in the classifier, which makes the D2 feature vector the dominant factor in the classification, so that the other 11-d features are effectively ignored and the classification accuracy decreases. So, this paper adopts hierarchical classification: objects are first classified using the 11-d features, and shape feature classification is then conducted based on the first-level classification result. Finally, a priori semantics are used to improve the reliability of the classification result. Table 1 lists the knowledge constraints. With this approach, the recognition accuracy in the experimental results is improved. The following is the pseudocode of classification.
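As an illustration only, and not the authors' own pseudocode, the hierarchical scheme can be sketched in Python. The two stage classifiers are passed in as callables that stand in for trained random forests. The sketch assumes that the first stage separates candidate elements from "others" and that the prior knowledge only flags a result as consistent or not; neither detail is fixed by the text. The height limits are the ones quoted in the feature description.

```python
# Height limits in metres, taken from the feature description in this paper.
HEIGHT_RULES = {
    "door": lambda h: h >= 1.8,
    "wall": lambda h: h >= 2.5,
    "stair": lambda h: h <= 0.2,
}


def classify_segment(basic_features, d2_features, height, stage1, stage2):
    """Two-level classification followed by a prior-knowledge check.

    stage1 and stage2 are hypothetical stand-ins for trained classifiers:
    stage1 maps the 11-d features to "others" or any other label, and
    stage2 maps the D2 features to "door", "wall", "stair" or "others".
    Returns (label, consistent_with_prior_knowledge).
    """
    # Level 1: color, orientation and height features reject obvious non-elements.
    if stage1(basic_features) == "others":
        return "others", True
    # Level 2: the shape (D2) features decide among the remaining classes.
    label = stage2(d2_features)
    # Prior knowledge: flag labels that contradict the height constraints.
    rule = HEIGHT_RULES.get(label)
    consistent = rule(height) if rule is not None else True
    return label, consistent
```

In practice, `stage1` and `stage2` could wrap the prediction call of any trained classifier, so the random forest is not tied to a particular library in this sketch.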

Results
The experiment was carried out in an indoor scene of a teaching building, where over one thousand frames of point cloud were acquired. Because the point clouds acquired by ORB-SLAM overlap to a great degree, only key frames were chosen for recognition, giving over two hundred frames of point cloud and about 2000 segmental objects. The experiment classifies the objects of indoor scenes into 4 categories: walls, doors, staircases and others (including floors, ceilings, accumulated obstacles, advertising boards etc.). Figure 6 and Table 2 show the experimental results of this paper. Table 3 outlines the recognition results obtained with the ESF descriptor proposed by Walter Wohlkinger and Markus Vincze. By comparison, the method put forward in this paper recognizes the navigable elements of indoor scenes better than the ESF algorithm. The reason may be that ESF requires multi-angle point clouds of the objects with uniform point density, whereas the data acquired by Kinect are noisy, of uneven density and lacking in diversity of perspective, which lowers the classification results of ESF. This paper extracts the boundary of each object and the semantic information of the interior elements, which can overcome these problems and improve accuracy.

CONCLUSION
This paper automatically extracts indoor navigation elements (doors, walls and staircases) by using machine learning on combined features calculated from point clouds. The node data (including the semantics, location and size of objects) in the IndoorGML navigation data files are calculated from the final extraction results. The method adopted in this paper can acquire the information of navigation elements, reduce manual intervention and save labor, which gives it research significance. The transition, another significant element of the IndoorGML navigation document, which connects different nodes, will be the subject of future research.

Figure 2: Recognition process of indoor navigation elements.
Figure 3: a) The original point cloud data of the first floor acquired by ORB-SLAM. b) The processed point cloud data.
The number of points of each segmental object.
Planar normal: All the doors on the same wall are parallel to each other and perpendicular to the ceiling. The elevations of each staircase are parallel to each other.
y axis and the position is the highest

Figure 4: The D2 feature graph of doors, walls and the elevations of staircases (legend: door, wall, stair).

Figure 5: a) The original image acquired by Kinect. b) The result of point cloud segmentation. c) The boundary extraction result of each segmentation block. d) The stacking result of boundary and original point cloud.

Figure 6: The segmentation results of 3 scenes (upper) and the recognition results (lower). In the upper pictures, random colors distinguish the different blocks. In the lower pictures, green represents others, white represents walls, magenta represents doors and red represents staircases.

Table 3. The confusion matrix of ESF results