A BOUNDARY-ENHANCED SUPERVOXEL METHOD FOR EXTRACTION OF ROAD EDGES IN MLS POINT CLOUDS

Road extraction plays a significant role in the production of high-definition maps (HD maps). This paper presents a novel boundary-enhanced supervoxel segmentation method for extracting road edge contours from MLS point clouds. The proposed method first uses a normal-feature judgment to obtain global geometric information about the 3D point cloud, then clusters the points with an existing supervoxel method, augmented by this global geometric information, to enhance the boundaries. Finally, it uses a neighbor spatial distance metric to extract the contours and remove outliers. The proposed method is tested on two datasets acquired by a RIEGL VMX-450 MLS system, which contain the major point cloud scenes with different types of road boundaries. The experimental results demonstrate that the proposed method provides a promising solution for extracting contours efficiently and completely. On both tested road datasets, when the recall value is 0, the precision of the proposed method is 1.5 times higher than that of one of the two existing methods and approximately equal to that of the other.


INTRODUCTION
Road transportation plays an important role in modern intelligent society. High-precision maps and autonomous driving systems require increasingly precise road information. Mobile Laser Scanning (MLS) systems, which can produce precise information from 3D point clouds, have become a mainstream technology in 3D computer vision. MLS has therefore been successfully applied to a wide range of road research, such as road surface and marking extraction and pavement crack detection. Many researchers have concentrated on the extraction of road contours and the corresponding subsequent applications in road environments.
State-of-the-art approaches for extracting road boundaries and contours usually detect road curbs and then extract the road lines.
A variety of works address the extraction of 2D and 3D features in point cloud scenes. (Weinmann et al., 2014) present an automatic four-step algorithm to extract various features and classify the whole point scene. The framework derives features from the eigenvalues and selects them based on Shannon entropy; the authors then use Gaussian maximum-likelihood and Random Forest methods for classification. (Weinmann et al., 2015) proposed an algorithm that uses neighbourhood information and geometric features for 3D point cloud classification, minimizing the measure of eigenentropy and the measure of symmetrical uncertainty. In (Niemeyer et al., 2015), a Conditional Random Field method was applied to classify the contexts. This methodology used local neighbourhood information and constructed segments for further classification. (Landrieu et al., 2017) constructed a framework that smooths semantic labels in 3D point clouds for point classification using graph-structured optimization methods. Beyond feature selection and classification methods based on neighbourhood information, line-based scene representation and road line detection have also been studied. Another work built a framework for semantic classification and contour extraction on large-scale point datasets. The approach was able to process large unstructured data and used machine learning methods to detect contours effectively. (Chen et al., 2016) presented the Lidar-histogram method to detect roads, obstacles and water hazards. The method projected the 3D road plane as a straight line segment with obstacles near the segment, converting the 3D detection problem into a line classification problem. Gu et al. (2018) fused geometric knowledge and monocular camera images to detect roads in 3D points. The algorithm used a histogram and a scanning method to estimate and improve the detection results. In (Jung et al., 2015), points were classified into two kinds of regions and road lines were detected with an expectation-maximization method.
Furthermore, contour extraction methods based on supervoxels have also been employed. Yang et al. (2015) used multi-scale supervoxels with color and intensity information and merged segments into meaningful regions. After obtaining semantic information, roads were extracted in a given order by moving a window operator to detect curb points. In (Guan et al., 2015), GeoReferenced Feature (GRF) images were used to segment road surface points. Trajectory data were used as assistance to divide the points into many blocks and to detect the curb points in each block. (Hackel et al., 2016) proposed a method to detect contours automatically in two steps: scoring each representative point with a binary classifier using a set of features of the point's neighborhood, and selecting an optimal set with a high-order Markov Random Field (MRF). In (Lin et al., 2017), data points were segmented into a number of facets by local k-means clustering. The algorithm then used an improved α-shape method to extract the boundary points and grouped lines with the proposed line-group and region-to-cylinder algorithms. (Zai et al., 2018) improved the facet segmentation method and extracted road boundaries using supervoxels, plane smoothness, a 3D α-shape algorithm and undirected graph energy minimization. A multi-feature algorithm has also been presented to extract road edges from LiDAR data.
It is difficult to use supervoxel methods to extract road contours without facets (Lin et al., 2017) or point attributes such as colors and intensities (Yang et al., 2015). The facet segmentation method considers the smoothness of each point, and without global information about the scene it extracts redundant contour points. If the points carry no information other than their location (such as colors and intensities), the point-attribute method cannot be used for subsequent processing. Furthermore, very few methods in the existing literature use supervoxels to extract contours. Hence, algorithms are needed that can strongly enhance the road boundaries and subsequently extract the lines as boundaries of supervoxels. Inspired by existing methods, including that of (Zai et al., 2018), we present a novel algorithm that extracts road contours efficiently, without generating facets that introduce extra contour points, and using only location information. The major contributions of the proposed algorithm are that it can be applied to laser scanning point clouds in road environments and that it greatly enhances the boundaries for road contour extraction via supervoxel segmentation, without generating any facets, without point attributes, and without any trajectory data. Our proposed method first generates the normal vectors of the points by employing the well-known iterative weighted least squares method (Zai et al., 2018).
Using geometric information, we improve Lin's method for clustering the points by changing how a point's label is merged into its neighboring representative points, which differs significantly from the original method. Owing to the consideration of global spatial structures, we obtain supervoxel boundaries that coincide with the road boundaries. Finally, we use a spatial distance algorithm to extract the boundary points and remove outliers.
The remainder of this paper is organized as follows. Section 2 describes our method in three steps. Section 3 presents several experiments and evaluates the performance of the proposed method. Section 4 concludes the paper.

METHOD
The goal of our method is to extract boundary contours from 3D mobile laser scanning road scenes and to filter out outliers. The proposed method consists of three parts: point normal generation, point clustering with geometric information judgment, and boundary contour extraction. Figure 1 shows the flowchart of our proposed method.

Normal vector generation
Normal vectors play an essential role in point cloud processing. In this paper, we use tangent planes to generate normal vectors for the subsequent clustering of points (Yang et al., 2015), which we call global spatial judgement. For the tangent plane of each point $p_i$, we use the three eigenvectors $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ corresponding to the three eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$ to generate the normal vector $\vec{n}(p_i)$ (Eq. 1), where $\times$ denotes the cross product operation. We then use the spatial judgement to cluster points by setting thresholds (Qiu et al., 2016) (Eq. 2), where $h_i$ is the height of $p_i$, $\theta_i$ is the angle between $\vec{n}(p_i)$ and $(0,0,1)$, and $HT$ and $LT$ are the high and low height threshold values, respectively. The angle threshold $\theta_0$ is set empirically to $10^\circ$.
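As a rough illustration of the quantities described above, the following Python sketch builds a unit normal from two tangent-plane vectors via the cross product, measures its angle to the vertical direction (0,0,1), and applies a threshold test. The function names, the assumption that the two supplied eigenvectors span the tangent plane, the sign-insensitive angle, and the exact form of the threshold rule are all hypothetical choices made for illustration; they are not taken from Eq. 1 or Eq. 2.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    """Scale v to unit length (v must be non-zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def normal_from_tangent(v1, v2):
    """Unit normal of the plane spanned by v1 and v2.

    Assumption: v1 and v2 are the two eigenvectors lying in the
    tangent plane; they must not be parallel.
    """
    return unit(cross(v1, v2))


def angle_to_vertical_deg(n):
    """Angle in degrees between unit vector n and (0, 0, 1).

    Assumption: the sign of the normal is ignored.
    """
    return math.degrees(math.acos(min(1.0, abs(n[2]))))


def passes_spatial_judgement(h, n, lt, ht, theta0_deg=10.0):
    """Hypothetical rule: the height h lies within [lt, ht] and the
    normal is tilted more than theta0 away from the vertical."""
    return lt <= h <= ht and angle_to_vertical_deg(n) > theta0_deg
```

Under these assumptions, a flat road patch with tangent vectors (1,0,0) and (0,1,0) has a vertical normal and is rejected, whereas a vertical curb face with tangent vectors (1,0,0) and (0,0,1) has a horizontal normal and is accepted when its height lies within the thresholds.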

Point clustering
In this paper, we improve Lin's supervoxel segmentation method to cluster points. This method treats supervoxel segmentation as a subset selection problem (Elhamifar et al., 2016) that requires no initialization of seed points, and its theoretical time complexity is optimized to logarithmic time. In addition, this clustering method performs better than traditional supervoxel segmentation methods at small supervoxel resolutions. To cluster the points near road edges, we use the global spatial structures from Section 2.1 to merge points into their neighboring regions.
For our improved supervoxel method, we adopt the same measure metric $D(m, n)$ between two points $m$ and $n$ (Eq. 3), where $\vec{n}_m$ and $\vec{n}_n$ are the normal vectors of points $m$ and $n$, respectively, $|\cdot|$ denotes the inner product operation, $\|\cdot\|$ denotes the Euclidean distance between two points, and $R$ is the supervoxel resolution. The metric considers only the geometric and local information between two 3D points, so it is well suited to scenes without color (Lin et al., 2017).
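For illustration only, a metric built from these ingredients can be sketched in Python as a normal-difference term plus a spatial term scaled by R. The additive form and the weight `w` are assumptions and are not values taken from this paper; only the ingredients (the absolute inner product of the normals, the Euclidean distance and the resolution R) come from the description above.

```python
import math


def measure_metric(p_m, n_m, p_n, n_n, R, w=1.0):
    """Hypothetical dissimilarity between points m and n.

    p_m, p_n: 3D positions; n_m, n_n: unit normal vectors.
    R: supervoxel resolution; w: assumed weight of the spatial term.
    """
    # Normal term: 0 when the normals are parallel, 1 when perpendicular.
    normal_term = 1.0 - abs(sum(a * b for a, b in zip(n_m, n_n)))
    # Spatial term: Euclidean distance relative to the resolution R.
    spatial_term = math.dist(p_m, p_n) / R
    return normal_term + w * spatial_term
```

With this form, two coincident points with identical normals have a dissimilarity of zero, and the value grows as the normals diverge or as the points move apart relative to R.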

Boundary contour extraction
To enhance the boundary contours of the point cloud scene, we design a boundary contour extraction algorithm that extracts contours in 3D supervoxel scenes by making full use of the supervoxel labels. After supervoxel generation, each point in the point cloud P has a corresponding label, and each supervoxel (containing several points) has one representative point with the same label. Hence, the number of representative points equals the number of supervoxels. Our contour extraction method contains two steps: (i) extracting the contours of dense supervoxels, and (ii) filtering out outliers (if they exist) and visualizing the results of the coarse extraction (Nurunnabi et al., 2015). First, for each point we obtain the labels of its neighbors by k Nearest Neighbor (kNN) search, in order to discard large-scale pieces and retain small-scale pieces for the coarse extraction. As shown in Figure 2, after applying the global spatial judgement of Section 2.1, the representative points of the generated supervoxels are denser near boundary outlines than in other regions. For a given k, we calculate the differences between the labels of neighboring points (which we define as the label distance) and judge whether the number of differing neighbor labels is greater than m (we set m = 3). In the second step, we extract the contours precisely and further filter out useless points. Given a fixed radius r, we use Radius Nearest Neighbor (RNN) search to count the neighbors of each point. If the number of neighbors is greater than n, we consider the point an extracted point.
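The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the brute-force searches stand in for proper kNN and RNN structures, the function names are hypothetical, and the "label distance" is read here as the number of distinct neighbor labels that differ from the point's own label, which is only one possible interpretation of the description.

```python
import math


def knn_indices(points, i, k):
    """Brute-force k nearest neighbours of point i, excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def coarse_candidates(points, labels, k, m):
    """Step (i): keep points whose k neighbours carry more than m
    distinct labels different from the point's own label (assumed reading)."""
    keep = []
    for i in range(len(points)):
        neighbour_labels = {labels[j] for j in knn_indices(points, i, k)}
        if len(neighbour_labels - {labels[i]}) > m:
            keep.append(i)
    return keep


def radius_filter(points, candidates, r, n):
    """Step (ii): keep candidates that have more than n other
    candidates within radius r; isolated candidates are dropped."""
    kept = []
    for i in candidates:
        count = sum(1 for j in candidates
                    if j != i and math.dist(points[i], points[j]) <= r)
        if count > n:
            kept.append(i)
    return kept
```

On a toy line of points with labels 0,0,0,1,1,1 and one distant point with label 2, calling `coarse_candidates` with k = 2 and m = 0 selects the two points at the label change plus the distant point, and `radius_filter` then removes the distant point as an outlier. The experiments in this paper use k = 20, m = 3, r = 0.5 and n = 6.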

RESULTS AND DISCUSSION
Our experiments use two public outdoor 3D point cloud benchmark datasets and two test datasets acquired with the RIEGL VMX-450 mobile mapping system.
The experiments are performed in three ways: 1) comparing different boundary-enhanced methods on the two public benchmarks and the MLS data; 2) using different supervoxel resolutions, written simply as R, with our proposed boundary-enhanced method on the MLS data; 3) comparing other contour extraction methods with our method on the MLS data, with evaluation by Precision-Recall curves. Our methods are implemented in C++, and the experiments are conducted on a PC with Ubuntu 18.04, an Intel Core(TM) i5-3470 3.2 GHz CPU and 16.0 GB of memory.

Test dataset
In our experiments, we use the two public outdoor 3D point cloud benchmarks to visualize the boundary-enhanced results in comparison with other existing supervoxel methods, and the two MLS datasets to measure Precision-Recall (PR) curves that demonstrate the effectiveness and accuracy of our method.

Method                         Number of supervoxels
VCCS (Jeremie et al., 2013)    105
VCCS-kNN                       193
Lin                            193
Zai (Zai et al., 2018)         5748
Proposed                       9676
Table 2. The supervoxel information on IQTM dataset.

Method                         Number of supervoxels
VCCS (Jeremie et al., 2013)    43
VCCS-kNN                       80
Lin                            80
Zai (Zai et al., 2018)         16512
Proposed                       80598
Table 3. The supervoxel information on Semantic3D dataset.

As shown in Figure 3, our experiments include two public outdoor 3D point cloud benchmarks: the IQmulus & TerraMobilita (IQTM) (Cassette) benchmark (Vallet et al., 2015) and the Semantic 3D (Station) benchmark. The IQTM dataset is an urban point cloud analysis benchmark. It contains 12 million manually labelled points covering a 200 m street in Paris. The Semantic 3D dataset is a large-scale point cloud classification benchmark. It has 15 manually labelled points. The tested scenes are described in Table 1. The average resolution of each scene is calculated as the average distance between two adjacent points, and its value for each dataset is listed in Table 1. Our proposed supervoxel method in Section 2.2 has five parameters: the supervoxel resolution, the seed resolution, LT, HT and k. To cluster points based on edges, we set the supervoxel resolution and the seed resolution to the same value R. LT and HT, which limit the range of a cluster, are defined in Section 2.1. k is the number of neighbourhood points found for each point by k nearest neighbour search. The details are discussed in (Lin et al., 2017). Because of limited computer memory, we down-sampled the datasets.
Since the public benchmarks have existing labels, we can better show the results of boundary enhancement by drawing the generated boundaries as black lines for comparison with the ground-truth scenes (Figures 4 and 5).
Subsequently, we down-sample the Ring Island Road datasets acquired by a RIEGL VMX-450 system, which includes two full-view RIEGL VQ-450 laser scanners, an inertial measurement unit (IMU), a global positioning system (GPS), and a distance measurement indicator (DMI) (Zai et al., 2018). We then manually select the road boundaries as ground truth (Figure 6 (b) and Figure 7 (b)).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition)
The supervoxel information is shown in Table 2 and Table 3 for a supervoxel resolution R of 5 m. Our goal is to generate more supervoxels and to make the supervoxel boundaries coincide with the road boundaries as much as possible. As shown in Table 2 and Table 3, our proposed method produces the largest number of supervoxels. Moreover, the global spatial judgement of Section 2.1 is used in our proposed supervoxel method, so that during clustering more supervoxel boundaries lie near the road edges. The method is therefore more accurate than the other supervoxel methods mentioned, which is significant for extracting the contours in the next step. The results of the ground truth, VCCS (Jeremie et al., 2013), VCCS-kNN, Lin's method, Zai's method (facet segmentation) (Zai et al., 2018) and the proposed method on the IQTM dataset are shown in Figure 4. The corresponding results on the Semantic 3D dataset are shown in Figure 5. As shown in Figure 5, the result of Lin's method shows no correlation with the ground truth; this method is not effective because of the high resolution setting.
We then measure how well the extracted points match the ground-truth points by calculating precision-recall (PR) curves (Arbelaez et al., 2011). The points fall into three groups: first, points that belong to both the ground-truth set and the extracted set; second, points that belong to the ground-truth set but not to the extracted set; third, points that belong to the extracted set but not to the ground-truth set. Hence, the definitions of precision and recall differ from those in Machine Learning (ML) theory (Luo et al., 2019). Precision is defined as the ratio of the true positive (TP) points to the ground-truth points. Recall is the ratio of the TP points to the extracted points. We use a distance threshold d (set to different multiples of the average point resolution) to determine whether a point belongs to the extracted regions and to produce the PR curves. To simplify the problem, we manually remove buildings, trees and all other obstacles from the scene and voxelize the MLS road scenes (Figure 6 (b) and Figure 7 (b)).
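A distance-thresholded matching of this kind can be sketched in Python as below. This is an illustrative assumption about how the matching might be done, not the evaluation code used in the experiments: the function name and the brute-force nearest-point test are hypothetical. The sketch returns two ratios, the share of ground-truth points that have an extracted point within d (the quantity the text above calls precision) and the share of extracted points that have a ground-truth point within d (the quantity the text calls recall); note that this naming is the reverse of the common machine-learning convention.

```python
import math


def match_ratios(extracted, ground_truth, d):
    """Return (ground-truth ratio, extracted ratio) for threshold d.

    A point counts as matched when some point of the other set
    lies within Euclidean distance d of it.
    """
    def has_near(p, others):
        return any(math.dist(p, q) <= d for q in others)

    gt_hits = sum(1 for g in ground_truth if has_near(g, extracted))
    ex_hits = sum(1 for e in extracted if has_near(e, ground_truth))
    gt_ratio = gt_hits / len(ground_truth) if ground_truth else 0.0
    ex_ratio = ex_hits / len(extracted) if extracted else 0.0
    return gt_ratio, ex_ratio
```

Evaluating `match_ratios` over a range of thresholds d, for example several multiples of the average point resolution, gives one pair of values per threshold, from which a curve can be drawn.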

Experiment in different supervoxel methods
According to our results, the precision and recall values are positively correlated. An ideal curve has both high precision and high recall, corresponding to efficient and complete extraction. The parameters LT and HT in Eq. 2 are set to the lowest and highest z-axis values after manual cutting (Figure 6 (b) and Figure 7 (b)), respectively. Figure 8 shows the PR curves of five different supervoxel methods on the two MLS road datasets. Our proposed supervoxel method greatly enhances the road boundaries compared with VCCS, VCCS-kNN, Lin's method and Zai's method (facet segmentation), as indicated by its PR curves in Figure 8.

Experiment in different supervoxel resolutions
In this section, we set five supervoxel resolutions (according to Eq. 3): R=1, R=1.5, R=2, R=2.5 and R=3. Figure 9 illustrates the PR curves for these five supervoxel resolutions on the two MLS road datasets. The plots reveal the effectiveness of our proposed boundary-enhanced supervoxel method at different supervoxel resolutions. As shown in Figure 9 (a) and (b), the PR curve rises as the supervoxel resolution increases, with the exception of R=2.5 and R=3. When the recall value is 1 on road 1, the precision at R=1 is 0.2 lower than at R=3. Meanwhile, on road 2 the precision at R=3 is 0.1 higher than at R=1. The reason is that, as the supervoxel resolution increases, the points are clustered near the boundaries rather than in other regions, owing to the global spatial structures proposed in Section 2.1. Thus, with our proposed supervoxel method, a higher resolution alone yields better results.

Contour-based road extraction
Figure 10 shows the visual results.
In this experiment, we set the parameter k = 20 in step one and r = 0.5, n = 6 in step two of Section 2.3. In Figure 10 (b), we observe some outliers on the ground surface. Figure 10 (c) shows that these outliers are filtered out by our proposed method. Furthermore, we compare the proposed method with a region growing method (Zai et al., 2016 and Chauve et al., 2010) and a facet segmentation method (Lin et al., 2017). We observe in Figure 7 (c) that the road edge contours extracted by the region growing method are incomplete. As shown in Figure 6 (d) and Figure 7 (d), the existing facet segmentation method extracts additional contours because of the small smoothness values of edge points, so this method misjudges the extracted points. In contrast, the proposed method efficiently and completely extracts the road contours in Figure 6 (e) and Figure 7 (e). Figure 11 presents the PR curves of the three contour extraction methods on the two MLS road datasets. The plots show the effectiveness of our proposed whole method compared with the region growing method and the facet segmentation method. As shown in Figure 11, the Precision-Recall curves demonstrate that our approach achieves consistent and promising performance compared with the two baselines. When extracting the contours, our proposed method reaches the same level as the region growing method and is 1.5 times higher than the facet segmentation method.

CONCLUSION
In this paper, we have presented a boundary-enhanced supervoxel method to extract road boundaries from MLS point clouds. After obtaining normal vectors for global information, we improved Lin's method to cluster points. We then designed a boundary contour extraction algorithm to further analyze the results of the extracted contours. The experimental results obtained on two MLS point cloud datasets demonstrate that the precision values of our method are 1.5 times higher than those of an existing region growing method when the recall value is 0 (Section 3.4). At the same time, the precision values of the proposed method are higher than those of the region growing method and approximately equal to those of the facet segmentation method over all recall values. Meanwhile, the PR curves rise as the supervoxel resolution increases. Our method also has a drawback: it can use the global spatial judgement of Section 2.1 because the roads we used have small vertical edges (Figure 6 and Figure 7). Without such borders, the proposed method cannot be applied. Our future work will focus on the extraction of road boundaries in more complex roadway environments with buildings, trees and all other kinds of obstacles.