LIDAR-BASED INITIAL GLOBAL LOCALIZATION USING IMPERFECT ARCHITECTURAL SKELETON INFORMATION

: Initial global localization of a mobile robotic platform is the foundation for its navigation and mapping, especially when the platform enters into unknown environments. In GNSS-denied indoor scenes, LiDAR is widely used for robot localization, especially in indoor scenes with poor lighting. In most existing LiDAR-based initial global localization methods, it is necessary to build the point cloud reference map in advance, which costs a large quantity of manpower and time. For this reason, a LiDAR-based initial global localization method using imperfect architectural skeleton information is proposed in this work. Firstly, we propose a lightweight management scheme for collected imperfect architectural information, which is convenient for efficient registration with real scans. Secondly, we extract architectural skeletons (stable man-made structures such as walls and columns) from both architectural information and real scans, and design them as line pairs feature patterns like P-LP, V-LP and C-LP. Thirdly, we propose a matrix descriptor for line pairs feature patterns description and initial matching. Finally, we construct error equations to estimate the pose by initial matching line pairs, and acquire the optimal localization results with the highest hit ratio on architectural grid map. A mobile robotic platform with the 16 beam LiDAR is experimented in typical indoor scenes such as rooms, corridors and undergrounding parking lots. Experiments show that the success rate of initial global localization reaches 80%, the average position error is about 0.10m and the running time is about 400ms per 1000 scans, which meet the requirements of indoor autonomous driving.


INTRODUCTION
Since autonomous driving plays a more important role in public transportation, logistics and other services fields, robot localization has become a focused issue. Initial global localization is used to estimate the global pose of the mobile robotic platform, which is one of the key steps in robot navigation and mapping, especially in SLAM (Simultaneous Localization and Mapping). In GNSS-denied indoor scenes, the camera and LiDAR are two main reliable sensors for robot localization. Compared with the camera, LiDAR has many advantages such as applicable to poor lighting even dark scenes, wide viewing range and high precision, so that it has been widely used for robot global localization (Chan et al., 2021). Most existing LiDAR-based initial global localization methods are based on point cloud reference map built in advance, which needs expert operators spend lots of time to ensure the consistency and usability of the generated maps. Meanwhile, the reference map usually takes up a lot of memory space during the process of localization, especially in large-scale indoor scenes. These greatly limit the application scenarios of existing LiDAR-based initial global localization methods.
With the development of digital cities, almost every building has equipped with its architectural skeleton information from CAD, BIM or architectural information of other sources (Lu et al., 2020). Architectural skeletons usually refer to stable manmade structures like walls, columns, etc. Using architectural skeleton information in LiDAR-based initial global localization has its advantages in easier acquisition and lower cost than using the point cloud reference map. However, there are also two major difficulties in LiDAR-based initial global localization using architectural skeleton information. One is that this kind of localization method is based on the matching and registration of point cloud data and DLG (Digital Line Graphic). It belongs to the heterogeneous data registration, which is more challenging than using point cloud reference map. The other is that lots of ubiquitous architectural skeleton information has the problems such as scale inconsistency and information missing or error due to the interior decoration and reconstruction of buildings. Therefore, it is difficult for most existing matching and registration methods to use this kind of imperfect architectural skeleton information in robot localization.
In this work, a novel LiDAR-based initial global localization method using imperfect architectural skeleton information is proposed. The architectural skeleton is extracted as line pairs feature patterns for the heterogeneous data registration, with line pairs matrix descriptor proposed for feature patterns matching and error equations constructed for pose solution. Without point cloud reference map, and only using the easily available imperfect architectural skeleton information, our method can still realize high-precision LiDAR-based initial global localization, which meets the requirements of indoor mobile robotic platform.

RELATED WORK
In GNSS-denied indoor scenes, LiDAR-based initial global localization has been widely studied for the mobile robotic platform, since there are many indoor scenes with poor lighting even dark situation which are not suitable for camera-based methods. Existing indoor initial global localization methods for UGVs (unmanned ground vehicles) with LiDAR can be divided into the following two categories: Integrated with external sensors: The external sensors for indoor robot initial global localization are IMU (Inertial Measurement Unit) (Karam et al., 2021), WIFI devices, Bluetooth module and UWB (ultra-wideband) antenna (Zhou et al., 2021). However, localization based on IMU will result in large deviation accumulation and unreliable position in longterm robot localization, while other external sensors need plenty of radiation sources and beacons layout in advance. Therefore, these methods usually have higher additional cost, which are only applicated in large venues like exhibition centres and difficult to be pervasive.

Using reference information:
For most existing LiDAR-based initial global localization methods, point cloud reference map is the main reference information used in robot localization with point cloud registration. Suitable feature patterns are beneficial to registration of real scans and point cloud reference map, which is the foundation of robot localization. Since there are many feature points and lines in most indoor scenes, corners and horizontal lines are used for point cloud matching and registration, which has improved the efficiency of initial global localization (Wang et al., 2018). Meanwhile, walls and columns are extracted as simplified structures from real scans and point cloud reference map, which is convenient for point cloud registration and LiDAR-based initial global localization (Shi et al., 2020). These mentioned LiDAR-based initial global localization methods using point cloud reference map all need robot mapping process in advance. Since unfamiliar indoor scenes usually without high-precision point cloud reference map, there are plenty of investigations using architectural information for initial global localization. Most researches have been done on indoor robot localization using monocular camera (Hile and Borriello, 2008), depth camera (Ito et al., 2014) and other optical cameras. As LiDAR becomes more widely used by indoor mobile robot, a novel LiDAR-based localization method is proposed (Boniardi et al., 2017), which has realized the registration of real scans and architectural floor plans. But it can only suit for the perfect architectural information which is accurate, complete and many just in the form of floor plans. However, the easily available architectural information of most buildings is usually imperfect as incomplete, inaccurate and cannot meet the demand of existing methods.
In summary, in the indoor scenes without GNSS signal and any indoor radiation sources, LiDAR-based methods are widely used for robot initial global localization. Most existing methods either rely on point cloud reference map built in advance, or need accurate and perfect architectural information as reference.
In this work, we use the imperfect architectural skeleton information, which is easily available, as reference for LiDARbased initial global localization.

METHODOLOGY
The process of most existing LiDAR-based initial global localization is to build the point cloud reference map, then estimate the pose by point cloud registration. On this basis, the steps of our method using imperfect architectural skeleton information are as follows: a) Architectural information processing; b) Real scans processing; c) Pose solution based on feature registration.

Architectural information processing
The common forms of architectural information can be divided into the following two types of files: a) model files. CAD, BIM and others in standard formats; b) blueprint files. Blueprints with specific instructions such as safety evacuation map, shopping guide map and others in raster form. Different methods are needed for architectural skeleton extraction from different forms of architectural information, as shown in Fig. 1. Since the model files record the architectural information with its accurate coordinates and geometric attribute values, architectural skeletons can be easily extracted by geometric analysis. The blueprint files are usually in the form of raster images, so that digital image processing algorithms are used for architectural skeleton extraction, such as edge detection and key point extraction.
In order to simplify the extracted architectural skeletons and improve the efficiency of initial global localization, a novel architectural information management scheme is designed. Since architectural skeletons are simple and regular in most indoor scenes, line features are usually used for their representation. In this work, the line constraint file is used to describe the architectural skeleton information as shown in Tab. 1, where the number of lines and the geometry attribution values of each line (such as slope, intercept, coordinate of start point and end point) are included. Meanwhile, as shown in Fig.  2, with a certain grid resolution as the line width, the architectural grid map is generated from different formats of the architectural information for the visualization of initial global localization and the final optimal pose solution.

Real scans processing
For the architectural skeleton extraction from real scans, a set of point cloud feature patterns for indoor simplified structures have been proposed (Ye et al., 2020), which include lines and circles features. Due to the symmetrical layout and regular geometric shape of architectural skeleton, three kinds of line pairs feature patterns are designed in our method for the description of architectural skeletons in the indoor scenes, including parallel line pairs (P-LP), vertical line pairs (V-LP) and collinear line pairs (C-LP), as shown in Fig. 3. Three steps of line pairs feature patterns extraction from real scans are as follows: Point cloud pre-processing: For the original point cloud data, firstly, the statistical filter with a local density threshold is used to filter out discrete points and some small targets which interference with the architectural skeleton extraction and matching. Then, with singular value decomposition (SVD) algorithm, three main orthogonal directions of real scan are adopted as the corrected coordinate axises ( ,, x y z ). Finally, 3D real scans are projected onto the horizontal plane ( xoy ).
Line pairs extraction: RANSAC is used for line extraction from projected point cloud. Since existing line features extraction methods can only be suitable for rectifying structures, to ensure the extraction ability for some curve architectural skeletons, we propose a novel line pairs extraction method with line segmentation and merging. Long lines need segmented into limited-length lines to ensure the ability for fitting curve structures, and the linear equations are calculated by the least square algorithm. However, the line segmentation process above may lead to segmentation errors that the same line splits into several pieces. Therefore, with the slope difference threshold and the distance threshold of adjacent points set, the line pairs which are approximately parallel and close to each other are merged into the same line. Meanwhile, with this merging process, it can also solve the above problem that there are line segments caused by information missing or error in imperfect architectural skeletons and connect them. According to the angle and distance of line pairs, the feature patterns are extracted as P-LP, V-LP and C-LP.

Feature patterns description:
For the effective description of line pairs feature patterns, a symmetric line pairs matrix descriptor is proposed. In our matrix descriptor, non-diagonal elements reflect the line pairs feature patterns corresponding to their rows and columns, including 0, -1 and positive elements. As shown in Fig. 4, element 0 reflects C-LP, element -1 reflects V-LP and the positive elements reflect P-LP with its value the distance between parallel ones. Meanwhile, for the scale inconsistency, there are different scales between imperfect architectural information and real scans. To solve this problem, the positive elements in our matrix descriptor are normalized by equation (1), which is conducive to the fast and accurate registration of heterogeneous data with different scales. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022

Pose solution
Since most of the indoor scenes are nearly flat and sensors are tightly set on the ground mobile platform, the global pose of UGVs can be simplified from 6 DOF (Degree of Freedom) to 3 DOF, the 2D position ( , xy) and the orientation (  ) included.
With the registration of architectural information and real scans, main steps of the pose solution are as follows: Feature patterns matching: In section 3.2, the line pairs matrix descriptors are generated from the architectural skeleton of both architectural information and real scans. On this basis, the feature patterns matching process is to select the submatrixes from the matrix descriptor of architectural information which seem to be similar with the matrix descriptor of real scans. With the same matrix size of the descriptor from real scans, potential matching sub-matrixes are selected by the distance between two matrixes which represents the matrix similarity. The Manhattan distance between matrixes is calculated by equation (2). The feature patterns are initially matched when the Manhattan distance is lower than the set threshold. As shown in Fig. 5, more than one potential matching results are obtained, since there are a large number of repeated scenes and imperfect architectural skeleton information may be incomplete. Among the matching feature patterns, the optimal pose can be obtained by following pose solution process.

Pose parameters estimation:
In the potential matched feature patterns, the relation between points from real scans and lines from architectural information has been established. The error equations of the distance between points and lines are constructed as the equation (3), and the pose parameters are calculated iteratively using least square algorithm. To improve the efficiency of error equation solution, the gradient descent method is used with an appropriate step length, which can also optimise the results of pose solution. Since architectural skeletons are usually symmetric in most indoor scenes, the orientation of mobile robotic platform is easy to misinterpret into the opposite direction. It leads to the fact that most of the existing LiDAR-based robot localization methods cannot obtain high-precision complete pose, only the robot positioning results, but the lack of orientation results. The covariance matrix is calculated to evaluate the uncertainty of three pose parameters. When the covariance of the orientation is much lower, the initial orientation needs updated into the opposite direction and estimate the pose parameters again.
Optimal pose acquisition: Using the imperfect skeleton information as reference, it is difficult for existing localization methods based on registration to acquire the only optimal pose among potential matching results. To solve this problem and get the optimal pose solution results, the hit ratio of point clouds is used in this work. The hit ratio we used here is to describe the points of real scans on the architectural grid map generated in section 3.1, where points are only from the feature patterns and the map is in the form of raster. The pose with higher hit ratio means the better results obtained in the registration of both heterogeneous data and the more authentic pose of initial global localization. A threshold like 90% has been set to determine whether the initial global localization succeeds. Among all the cases of successful localization initialization, the pose parameters with the highest hit ratio are selected as the final results of initial global localization. Figure 6. Experimental platform with a 16 beam LiDAR.

Experimental setup
Since there is usually no architectural information in public indoor point clouds benchmarks, we collect the experimental data in typical indoor scenes. As shown in Fig. 6, we use the The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France mobile robotic platform with a 16 beam LiDAR for our experimental point cloud data collection. Meanwhile, we collect the imperfect architectural information of our experimental scenes, with sources like CAD and safety evacuation map. A laptop with Intel Core i7-10750H CPU @ 2.60 GHz 5 GHz and a 16 GB of RAM has been applied in our experiment.

Dataset 1-office building:
The office building is a common scene for indoor robots. As shown in Fig. 7, the first floor covers an area of about 600 square meters and about 8000 scans of point cloud are collected. There are rooms and long corridors in this scene, including rich architectural skeleton information such as walls, doors and windows. Long corridors are challenging for initial global localization. Meanwhile, there are also many interference targets, such as pedestrians, potted plants and furniture. The architectural skeleton information of this scene is extracted from the safety evacuation map of this building.

Dataset 2-underground parking lots:
As the typical large-scale indoor scene, underground parking lots is the focused issue for indoor robot mapping and navigation. As shown in Fig. 8, the underground parking lots covers an area of about 3000 square meters and about 10000 scans are collected. In this dataset, there are not only a large number of architectural skeletons like walls and square columns, but plenty of interference targets like cars and indicators. Meanwhile, the architectural skeleton information of this scene is provided from the floor plans.

Results
Limited by the paper length, only two real scans are sampled in each dataset for the presentation of initial global localization results (the 4500 th and 7402 nd scan sampled from dataset 1, the 207 th and 1300 th scan sampled from dataset 2). Fig. 9 shows the process of initial global localization from dataset 1, where (a) and (b) show the two sampled real scans, (c) and (d) show the extracted feature patterns from the sampled real scans, (e) shows the feature patterns registration results and (f) shows the point clouds located in the architectural information. The same process of initial global localization of two sampled real scans from dataset 2 is shown in Fig. 10. A green circle with a red arrow is used to show the optimal pose of real scans, in Fig.9 (e) and Fig. 10 (e).
The experimental results show that using the architectural skeleton registration as the common feature is conducive to the initial global localization of real scans. In the scenes with lots non-structure objects (Fang et al., 2021) or newbuilt stable structures not included in existing imperfect architectural skeleton information, our method can still obtain accurate localization results, as there are many cars in Fig. 10 (f) and the newbuilt stairs and stake in Fig.9 (f).

Precision analysis
One of the classic LiDAR-based SLAM algorithms-LOAM (Lidar Odometry and Mapping) (Zhang and Singh, 2017) provides the true pose value for precision evaluation in our experiment. LOAM is currently the top three SLAM algorithm on KITTI benchmarks, with high-precision pose solution results.
To obtain more accurate true values, an IMU is also installed on the mobile platform for precision evaluation in our experiment. And in order to ensure the accuracy of IMU positioning, the long-term collection of experimental data is avoided in our experiment.
5000 scans are selected from our datasets for precision evaluation, including independent rooms from dataset 1 (855 scans), corridors from both datasets (1534 scans) and underground parking lots from dataset 2 (2611 scans). The success rate of initial global localization, the RMSE (Root Mean Square Error) of localization (position and orientation) and the running time of our algorithm are counted, as shown in Tab Table 2. Precision evaluation of initial global localization.
From the statistics, the success rate basically reaches 80%, the precision of initial global localization is about 0.10 m and 1.00 °, meanwhile, the running time of our algorithm is with 400 ms per 1000 scans. In the corridor scenes, the precision is lower than others but the success rate of initial global localization is the highest. And in the underground parking lots scenes, the success rate of initial global localization is lower than another two scenes, however, it shows the highest precision of the position and orientation. The reason of low success rate in underground parking lots scenes is that many architectural skeletons away from the platform are involved in the localization process, which are inaccurate and unreliable. Since there is difference between the number of real scans in the corridor scenes and underground parking lots scenes, 1000 scans are sampled randomly from each scene in order to explore the influence on the precision by the scans number. As shown in Fig. 11, each point represents the error of initial global localization and the green line shows the mean precision of positioning. Figure 11. The position error statistics of two scenes with the same scans number.
With the same number of scans, the precision of initial global localization in underground parking lots scene is still significantly higher than corridor scene. Meanwhile, results from Fig. 11 are similar to the overall experimental results from Tab. 2, which means the difference on pose precision is not caused by the scans number. Also, it can be seen in Fig. 11 that the distribution of position error shows more concentrated in underground parking lots scene than corridors, and there are much more big errors in the corridor scene.
Besides the scans number, there are significant differences with types and numbers of feature patterns in different scenes. Different from above experiments, to distinguish contributions of different feature patterns to initial global localization, we only use each single feature pattern (P-LP, V-LP or C-LP) to localize. The number of feature patterns, the success rate and the precision of position for initial global localization using single feature pattern are counted in Fig.12 and Fig. 13.  Compared with corridors, there are much more feature patterns in underground parking lots, especially plenty of P-LP and V-LP structures. For the success rate of initial global localization, P-LP shows the lowest rate with only 76% and 64% in both scenes, while V-LP and C-LP show higher success rate. For the precision of initial global localization, there are less position errors in V-LP and C-LP feature patterns, especially V-LP whose position errors are only 0.08 m and 0.03 m in two scenes.
However, P-LP shows the lower position precision, with 0.21 m mean error in corridor scene. The reason for the difference in accuracy is that, compared with P-LP and C-LP only provides constraint in one parallel direction, a single V-LP feature pattern can provide constraints in both orthogonal directions, so it shows the largest contribution in high-precision localization.
For the further analysis with the orientation of the architectural skeleton in experiment scenes, the reason that the localization precision of C-LP is slightly higher than P-LP can be that there are different directions of C-LP in the same scene, while PL has only a few fixed directions.
In summary, the above experiments confirm that our method can obtain high-precision LiDAR-based initial global localization results, even using imperfect architectural skeleton information. Through experimental analysis, we can draw the following conclusions:  Experimental results show that, with our method, the success rate of initial global localization reaches 80%, the average pose error is about 0.10m and 1.00 °, meanwhile, the running time is about 400ms per 1000 scans, which meet the requirements of indoor autonomous driving.
 The number of feature patterns are the critical factors for initial global localization in our method. It can reach higher success rate and position accuracy in the scenes with more architectural skeletons, so that our method is suitable for application in large-scale indoor scenes with rich architectural skeleton information.
 Among line pairs feature patterns extracted in our method, V-LP shows the highest localization precision, C-LP second and P-LP lowest. Meanwhile, V-LP and C-LP shows the higher success rate of localization initialization, while the success rate of P-LP is lower. Therefore, for indoor initial global localization, more V-LP feature patterns can improve the localization precision and success rate.

Discussions
Compared with existing LiDAR-based initial global localization method, the advantages of our method are as follows: No GNSS required. In GNSS-denied indoor scenes, our method can also obtain the high-precision initial global localization results.
No radiation source and beacon required. Compared with localization based on Bluetooth, UWB and WIFI, our method can decrease the arrangement workload and cost of these radiation sources and beacons.
No appropriate light required. Since there are many indoor scenes with poor illumination, localization using vision sensors like cameras in these scenes has poor effect. But our method is suitable for these scenes with weak illumination even darkness.

No point cloud reference map required. Our method avoids
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 the dilemma that existing LiDAR-based initial global localization methods are based on point cloud reference map, while mapping needs the global localization which is usually provided by initial global localization. Using architectural skeleton as reference makes our method more suitable for initial global localization in many large-sale indoor scenes.

CONCLUSIONS
In this work, a novel LiDAR-based initial global localization method using imperfect architectural skeleton information has been proposed, providing the global position for perception and mapping of mobile robotic platform. An effective management scheme is proposed for architectural information from various sources. Since the foundation of our localization method is heterogeneous data registration, line pairs feature patterns and the line pairs matrix descriptor are designed for fast matching and registration. To solve the problems caused by using imperfect architectural information, the highest hit ratio on the architectural grid map is used to obtain the optimal localization results among potential matching feature patterns, and the iterative solution of updating the Initial orientation is propitious to overcome the orientation unreliability in symmetrical scenes. Experimental results show that our method can effectively obtain high-precision pose, even in some challenging scenes such as corridors and underground parking lots.
Our future work will enrich the types of feature patterns for the application in some complex indoor scenes like stadiums and exhibition halls. Meanwhile, we will expand the application of our method into some outdoor scenes with poor GNSS signals, which are convenient for the seamless switching of indoor and outdoor localization and the modelling above LOD4 (Level of Detail).