ROBUST APPROACH FOR URBAN ROAD SURFACE EXTRACTION USING MOBILE LASER SCANNING 3D POINT CLOUDS

Road surface extraction is crucial for 3D city analysis. Mobile laser scanning (MLS) is the most appropriate data acquisition system for the road environment because of its efficient vehicle-based on-road scanning opportunity. Many methods are available for road pavement, curb and roadside way extraction. Most of them use classical approaches that do not mitigate problems caused by the presence of noise and outliers. In practice, however, laser scanning point clouds are not free from noise and outliers, and it is apparent that the presence of a very small portion of outliers and noise can produce unreliable and non-robust results. A road surface usually consists of three key parts: road pavement, curb and roadside way. This paper investigates the problem of road surface extraction in the presence of noise and outliers, and proposes a robust algorithm for road pavement, curb, road divider/islands, and roadside way extraction using MLS point clouds. The proposed algorithm employs robust statistical approaches to remove the consequences of the presence of noise and outliers. It consists of five sequential steps for road ground and non-ground surface separation, and road related components determination. Demonstration on two different MLS data sets shows that the new algorithm is efficient for road surface extraction and for classifying road pavement, curb, road divider/island and roadside way. The success can be rated in one experiment in this paper, where we extract curb points; the results achieve 97.28%, 100% and 0.986 of precision, recall and Matthews correlation coefficient, respectively.


INTRODUCTION
Understanding details and getting precise information about a road surface and its attached components (e.g., road curb, divider and roadside way) is a fundamental task of many applications of 3D (three-dimensional) city analysis including urban path planning, corridor mapping, accessibility diagnosis, road asset management, monitoring and reconstruction, autonomous vehicle navigation, mobile robotics, intelligent transportation, and overall, for ensuring road safety Wen et al., 2019;Feng et al., 2021;Xu et al., 2021;Zhao et al., 2021;Sholevar et al., 2022).
Road surface extraction and classification of related components has been under research for a long time (Auclair-Fortier et al., 2001). In recent years, LiDAR (Light Detection and Ranging) based laser scanning systems have been recognized as an efficient and cost-effective means of acquiring precise georeferenced point sets (point clouds) of our environment. Roads can be sampled at centimetre level using a mobile laser scanning (MLS) system. MLS is the most successful system for road corridor mapping due to its fast, accurate, continuous and cost-effective 3D data collection capability providing a large amount of point cloud data. Unlike images, laser scanning point clouds offer detailed 3D geometry including distance information of objects. But, analysis of point clouds is not very easy as point clouds are usually unstructured, have inhomogeneous point density and irregular data format, and are typically capture sharp features (e.g., edges and corners) and arbitrary surface shapes (Nurunnabi et al., 2022). Additionally, point clouds are not free from the presence of noise, outliers and occlusions. Nurunnabi et al. (2014Nurunnabi et al. ( , 2015 showed that even the presence of a very small portion of noise and outliers may produce non-robust and misleading results in point cloud processing. The objective of this paper, i.e., extraction of road surface and its related components (road pavement, curb, divider and roadside way) is highly challenging, because in addition to the above complexities, more difficulties occur, e.g., roads have varying spectral features due to the various kinds of material (e.g., asphalt, cement, concrete, and gravel) used, and are regularly covered by hindrances (e.g., vehicles, bush and dense vegetation) present in the scenes. Moreover, these structures are intricate in nature, due to the absence of clearly defined edges, corners, while differences in widths, varying heights of curbs, and the large and inhomogeneous changes of curvature may confuse the extraction of detailed information of various road components (Cira et al., 2020;Romero et al., 2021;Zhao et al., 2021).
Many methods are available in literature for road surface and related components extraction and analysis (Kumar et al., 2013;Rodriguez-Cuenca et al., 2015;Bai et al., 2021;Zhao et al., 2021). Bottelier et al. (2005) developed an approach for ground filtering of echo sounding data. One of the main tasks of road surface extraction is road curb detection. Ibrahim and Lichti (2012) proposed a five-step curb based street surface extraction method, in which density based segmentation was used to find the ground, and a Gaussian filtering was used for curb detection. Serna and Marcotegui (2013) developed a method for curb detection by using raster images with mathematical morphology. *Corresponding author Hervieu and Soheilian (2013) introduced a method for semiautomatic road/pavement modelling and reconstruction of road and sidewalk surfaces. This method adjusts a plane to a group of points and the angular distances between normal and elevation of the points. This method is successful for straight road sections, but can fail for curved and occluded areas. Kumar et al. (2013) developed a method to extract road edges by using a snake model based on the navigation information obtained from the mapping vehicles. Yang et al. (2013) proposed a scan line partition based approach, that uses a moving window operator to filter out nonground points, and curb points are detected based on curb patterns. Rodriguez-Cuenca et al. (2015) extracted curbs using the information of MLS trajectory, working on perpendicular sections to trajectory direction. Sun et al. (2019) proposed a road boundary detection algorithm for autonomous vehicles. The algorithm is successful for curved road sections, and sections with varying road widths, but struggles at road intersections.
Most recently, machine learning (ML) and deep learning (DL) approaches have been used for road surface and related components extraction. Sholevar et al. (2022) reviewed recently introduced ML based methods for road pavement condition assessment. Van der Horst et al. (2019) used a Random Forest approach to extract the road surface in areas affected by different types of damage. Zhang et al. (2018) employed deep residual learning and U-Net to develop ResUnet for road extraction by semantic segmentation that replaces the convolutional blocks with residual modules. Liu et al. (2018) proposed the RoadNet framework that composed of a modified version of the VGGNet (Simonyan and Zisserman, 2015) architecture, which is a standard Convolutional Neural Network (CNN), to analyse and predict the pavement, edges, and centrelines of roads in urban environments. Wen et al. (2019) introduced a two-step DL approach for road marking extraction, classification and completion. In this method, at first, a modified U-Net model is used for road marking extraction, and then a classification framework is developed by integrating multi-scale clustering and CNN for different types of road markings classification. Cira et al. (2020) proposed a DL-based solution to extract the surface area of secondary roads at a large scale. The method is based on hybrid segmentation models trained with high-resolution remote sensing orthoimage. In addition to the above methods, recently introduced point-based DL semantic segmentation algorithms such as Qi et al. (2017), Thomas et al. (2019) and Nurunnabi et al. (2021) that have potential for road surface extraction. Bai et al. (2021) showed the potential of RandLA-Net for road type classification. Romero et al. (2021) and Sholevar et al. (2022) presented comprehensive surveys of road surface and related components extraction methods.
Most of the existing algorithms have been proposed to extract very specific information to serve individual purpose of road pavement, road curb or roadside way extractions. They did not consider the problems of the presence of noise and outliers in the data, and they are reasonably non-robust. It is recognized that in the presence of noise and outliers, efficiently and accurately extracting road components is still a challenging task (Yang et al., 2020). Hence, the main concern in this paper is to develop an algorithm which is reliable and robust in the presence of noise and outliers. The proposed algorithm performs mainly in two parts; at first, separation of ground surface points from the nonground objects surfaces has been implemented by using a robust statistical filtering approach (Nurunnabi et al., 2016), which effectively removes non-ground points in the presence of noise and outliers. In second next part, road surface (pavement), curb, road divider/island and roadside way are extracted by developing a method using yet again robust statistical approaches. Scientific contributions of this paper are as follows. (i) This algorithm extracts road-ground surface, curb, footpath/sideway, and road island/divider, (ii) it performs well in the presence of steep slopes, sharp edges, and corners, (iii) it produces robust results in the presence of noise and outliers and (iv) it is successful for both straight and curved roads.
The rest of the paper is presented as follows. Section 2 briefly presents the basic ideas of robust locally weighted regression (RLWR), RLWR based ground surface extraction, and an existing curb extraction method that is used in this paper to compare the performance of our method. In Section 3, we propose the methodology of the new algorithm. Section 4 demonstrates the new algorithm through real-world MLS data. Section 5 concludes the paper.

RELATED PRINCIPLES AND METHODS
This section presents a brief discussion about related methods and principles that are used for the road surface extraction algorithm proposed in Section 3 and is used for comparison.

Locally Weighted Regression (LWR) and robust LWR (RLWR)
A standard regression model can be determined by fitting the following parametric function, where and are the observed values for the response and explanatory variable(s), respectively, and is the random error variable, which is assumed normally distributed with mean 0 and a variance 2 . Cleveland (1979) introduced local regression, which is a nonparametric statistical approach that has been used successfully to model regression functions between explanatory variable(s) and response variable without any prespecified functional relation between the variables. It is nonparametric as it does not define a functional form for the data set as a whole, but rather focusses locally around a point of interest. Local regression is a generalization of ordinary least squares (LS) methods for fitting smooth curves to empirical data (Jacoby, 2000). Locally Weighted Regression (LWR), also dubbed 'lowess' (LOcally WEighted Scatterplot Smoother) is a technique that determines a regression surface by fitting parametric functions locally in the space of the response variables using the well-known weighted LS principle (Rousseeuw and Leroy, 2003). LWR uses a local neighborhood ( ) of k points in x-space, and each point in the neighborhood is weighted according to its distance to the point of interest . Both linear and non-linear polynomial functions can be used to fit the model. The following 'tricube' weight function is the common choice for the LS fit within the local neighborhood.
where ( , ) is the Euclidean distance between xi and xj in xspace. Cleveland (1979) mentioned that the Tricube weight function can produce robust results in almost all situations. However, due to the use of the LS error minimization principle, LWR can be biased to noise and outliers, and may cause inaccurate non-robust estimations. A robust fitting process is necessary that guards against outlying (deviant) points that can distort the smoothed estimates based on the regular (inlying) points. The well-known Tukey's `bi-square` weight function is used to get robust LWR (RLWR). The `bi-square` weight function is defined as, , and MAD (median absolute deviation) is the median of the absolute values of the deviations | |. Finally, the estimates of the parameters of Eq. 1 minimize the Eq. 4.
The parameters from each local neighbourhood are used to estimate the fitted values for the whole data set at xi, ̂(xi). Nurunnabi et al. (2013) employed RLWR for the first time to develop an algorithm in a statistically robust way for ground surface extraction in the presence of noise and/or outliers in point clouds. The ground point filtering algorithm, first, slices the data set into manageable stripes to process, and then the results from every stripe are merged. For each stripe, the algorithm performs the same tasks on the two 2D x-z and y-z orthogonal profiles of point coordinates. The algorithm performs two main steps iteratively. In the first main step, RLWR is used to get a robust fit for the whole stripe. Robust linear fitting is used within a sufficiently small neighborhood to get an approximation of a nonlinear polynomial fit for the whole stripe. The next main step consists of four consecutive tasks: (i) calculation of residuals for the fit, (ii) classification of points into the points below or on the fitted (RLWR) line versus points above the fitted line, (iii) points above the fitted lines are down weighted using the bi-square weight function, and (iv) the new set of reweighted points is used to get the next fit. Task (i) to Task (iii) are repeated until the difference between the two consecutive fits is insignificant (i.e., less than a predefined small threshold, e.g., 0.005). The last (and the lowest) fit is considered as the ground level, and points within a band (± a threshold based on similar data) of the lowest level are considered as the ground points. Finally, the common points that are identified as the ground points from both x-z and y-z profiles are filtered as the ground points. The interested reader is suggested to consult Nurunnabi et al. (2013Nurunnabi et al. ( , 2016 for more details on robust ground surface points filtering.

RLWR based ground point filtering
In the proposed algorithm, we employ the RLWR based ground point filtering method proposed in Nurunnabi et al. (2016) to remove the non-ground points because, the RLWR based algorithm has the following benefits. (i) LWR satisfies many desirable statistical properties including its capability to adapt with bias problems at boundaries and in areas of high curvature, (ii) since significant variable point density is very usual for point cloud data, fitting within a local neighborhood enables to preserve fine point cloud details by smoothing, (iii) for areas with sharp edges and steep slopes, global parametric model fitting may lead to misclassification results while local fitting may avoid the problems of sharp edges, corners and steep slopes within a small local region. Moreover, (iv) to get rid of the influence of outliers/noise, the authors couple LWR with a regression diagnostic (i.e., assigning robust weight to the outlying points) approach or robust regression fitting (e.g., least median squares (LMS) and least trimmed squares (LTS); see Nurunnabi, 2008;Rousseeuw and Leroy, 2003) for each point with its local neighborhood.

An existing method to compare
We compare the proposed algorithm with a recently introduced method for street curb extraction in urban environments (Zhao et al., 2021). This algorithm performs in three steps.
Step 1 uses principal component analysis (PCA; Jolliffe, 2002) based multiscale dimensionality classification (Brodu and Lague, 2012) to remove vegetation (grass and trees), Step 2 does the main task of curb extraction that follows a filtering approach using three characteristics of objects' points: intensity, elevation differences between points, and slope changes between points.
To determine the intensity threshold for separating different objects such as road markings and buildings, Otsu (1979) thresholding, an image-based approach is employed because it does not require a user to define parameters for finding the threshold. It is assumed that the elevation values of road points are usually lower than the elevation of those points from buildings, trees and street lamps. Necessary thresholds for the elevation filtering have been fixed empirically based on the underlying data, study area and street conditions. The slope filtering is performed based on pseudo scan line generation and road edge detection. Finally, a curb refinement algorithm following quadratic Bezier curve spline (Bian, 2017) has been used to refine the curb edges and to fit the curb boundary line segment. The Radial Bounded Nearest Neighbor (RBNN) graph clustering (Klasing et al., 2008) algorithm is used for boundary points clustering. In this paper we did not include the last two tasks of curb line fitting and boundary line clustering as they are not necessary to serve the focus of our paper. The authors claimed that their algorithm is successful in the scenarios of vegetation covering curbs, curved curbs and occluded curbs. The reader is referred to Zhao et al. (2021) for more details on the curb extraction algorithm.

METHODOLOGY
The road ground surface and related components extraction algorithm proposed in this section, first employs a RLWR based ground surface filtering method (Nurunnabi et al., 2016) to remove non-ground points. Then road surface points are classified into pavement, curb, road divider, and roadside way. The new algorithm couples robust and diagnostic regression (Nurunnabi et al., 2008) as well as robust statistical approaches to produce robust results in the presence of noise and outlying points. It develops a 5-step process for road surface extraction (workflow is summarized in Fig. 1). It starts with slicing the whole data set into many parts (stripes) and then extracts road surfaces and its components stripewise (small portion across the road). Finally, all stripes are accumulated yet again to get full results of the given road point cloud data.

Step 1: Slicing the raw point clouds
At first, we slice the raw point cloud, a road data set into a number of road stripes with sufficient length perpendicular to the road direction, so that stripes pass the road from one side to the other (Fig. 2a). The following sequential steps (Step 2 to Step 5) are performed for all the stripes.

Step 2: Filtering ground and non-ground points
Removing non-ground objects/surface points can minimize time, cost, and labour, and increase the efficiency of the remaining works. Hence, in this step non-ground objects are separated from the ground surface (Fig. 2b). To avoid effects of the presence of noise and outliers, a RLWR based robust filtering method,  (Rousseeuw and Leroy, 2003). This process continues until reaching the lowest (i.e., ground) level (see Fig. 3 d, e; Experiment 1).

Step 3: Slicing the stripes from Step 2 into small patches
Road curbs can be geometrically defined as the road edges where the roadside suddenly is raised by a significant height from the main road surface (pavement). Since road curbs jump with a certain height within a very small area (along the road's width), the range of the z (elevation) values of the points of the road curbs will be rationally larger than the z range of the road pavement points (see Fig. 2b, c). To get the values, and to find the differences/changes within a small distance, we subdivide the stripes from Step 2 into many small patches (see Fig. 2d; portrayed in multiple colors) along the road's width (i.e., x) direction.

Step 4: Calculation of the range of the patches
We find the z values of the patches from Step 3 and calculate the values. An example of a bar diagram of the values versus x can be seen in Fig. 3h (Experiment 1). The diagrams show that most of the values (bars) are consistent and lower than some others, and very few of the bars are significantly higher. That indicates that some of the values for some of the patches are abnormal, and can be considered as outlying cases.

Step 5: Decision making to identify road surface categories (components)
We use the robust statistical principle (Hadi et al., 2009: Rousseeuw andHubert, 2018) to find the outlying cases of the values, which are treated as curb candidates (CC) and are defined in Eq. 5, where ( ) = 1.4826 * median| − median ( )|, and c (a constant) value can be fixed as 2, 3 or 4, or the user can define it based on their own data and the study area. In this paper, we use = 3. It is possible to get several CC patches, we will select the two patches as curbs that have the highest and similar length of Rz and are closer to the trajectory of the road. We define the points of the road pavement as the points for which the following two conditions are true.
i.e., mostly ≈ ( ), and (ii) points having a position between the two curbs, excluding road divider/island points.
Some of the bars based on the values can be significantly higher around the end of the roadsides, because they are wrongly filtered as ground points. This may happen in the presence of roadside non-ground objects such as bushes, trees, road furniture or buildings. We can simply eliminate them by considering their positions regarding the x values. Sometime, we can get few more bars significantly taller than the pavements height and located within the curbs that can be identified as road divider or traffic/road island. An example is given by the topmost outlying cases for , i. e., the three highest bars marked with red asterisks on the top as shown in Fig. 4h (Experiment 2). We can identify the middle one as road island/divider, and the other two as curbs. Thus, the respective patches will be classified as road curbs, road island and sideway/footpath. Points outside (i.e., on the left side of the left curb and on the right side of the right curb) of the road curbs having values almost similar to the road pavement points are points corresponding belonging to the roadside way. Significantly distant points from the road curbs can be eliminated as unwanted objects. The same above steps will be repeated for all the stripes of the full data set. Final results are found by merging all the results from different stripes as made in Step 1.

EXPERIMENTS, EVALUATION AND DISCUSSION
In this section, we demonstrate the proposed algorithm, and compare results with an existing method (Zhao et al., 2021). Two experiments are conducted on two MLS data sets. To quantify the performance of the methods, we use the following well- .
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France We consider MCC rather than the classical F1-score or overall accuracy, because MCC has some nice statistical properties, e.g., it is perfectly symmetric, i.e., every class gets equal importance.

Experiment 1.
In the first experiment, we use a data set acquired by an MLS system. This data set (Fig. 3a) covers around 53 m of road along an urban road corridor having 1,112,462 points of ground surface (road pavement, curbs, footpath/roadside ways), and non-ground objects (trees, buildings, and road furniture such as sign posts and traffic signs). We slice the data along the road length (trajectory) into 106 stripes of equal length. We perform the ground filtering algorithm (RLWR) for every stripe, results of one stripe (Fig. 3c) are shown in Fig. 3f. Now, we subdivide the stripe again into 100 smaller patches along the road width (x), results are shown in different colors in Fig. 3g. We calculate the values for all the patches and arrange them according to their x values. We draw a bar diagram based on the values in Fig.  3h. We find the median value and outlying cases for the values. Figure 3. Road surface extraction, Experiment 1: (a) road point cloud, (b) different stripes in different colors, (c) one selected stripe along the road length, (d) iterative fitting for ground filtering using RLWR on the x-z profile, (e) iterative fitting for ground filtering using RLWR on the y-z profile , (f) filtered ground points for plot (c), (g) patches along the road width, (h) bar diagram for the values for the patches of plot (g), (i) classified road surface points for plot (c), (j) ground truth curb surface, (k) curb extracted by the proposed method, (l) curb extracted by Zhao et al. (2021), and (m) classified road and non-ground surfaces for the full data set.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France Following the rules described in Section 3.5, we find the road pavement, road curb, and roadside ways/footpath. Results are in Fig. 3i. We perform the same process for all the stripes, and the results are merged together to get the final result (Fig. 3m) for the given point cloud in Fig. 3a. Results show that the new method is highly successful for road surface extraction and classification. To compare with Zhao et al. (2021), we perform their method and their results for the curb points are in Fig. 3l. Results clearly show that our algorithm extracts significantly better results than theirs. The main cause of their lower performance is as they mentioned, that their algorithm does not work for high/steep-slopes. Fig. 3i shows the presence of high and steep slopes of the ground surface. Although we have some points close to the curbs (within the black rectangle) that are miss classified (Fig. 3k), the method of Zhao et al. (2021) has many non-ground points belonging to surrounding objects that are falsely identified as curb points. For quantitative evaluation of the proposed algorithm, we manually labelled (ground truth; GT) the points by careful visual inspection. We calculate P, R and MCC for Zhao et al., (2021)

Experiment 2.
In the second experiment, we use another MLS data set (Fig. 4a) that covers a road section of around 31 m long with a total of 899,017 points covering road pavement, curbs, roadside way, road island, trees, building facades, vehicles, and road furniture: signposts, billboard and traffic signs. We slice the data along the road length into 62 stripes of 0.5m for each stripe. We perform the ground filtering algorithm for every stripe, results of one stripe Fig. 4c are shown in Fig. 4f. Now we subdivide the stripe again into 80 smaller patches along the road width (x), results are shown in Fig. 4g. We calculate the values for all the patches, draw a respective bar diagram (Fig. 4h), and arrange them according to their x values. We find the median value and outlying cases for the . Following the rules described in Section 3.5, we find the road pavement, road curb, road island, and roadside walkways. Results are in Fig. 4i. We perform the same process as for the previous experiment for all the stripes, and the results are merged together to get the final result (Fig.  4k) for the given point cloud. Results show that the method is successful for road surface extraction and classification. In Fig.  4j, we see some curb points are missed within the ellipses because of the presence of vehicle.
We manually labelled the ground truth (GT). Table 2 presents the quantitative results for the performance metrics. Results show road pavement classification accuracies are of 99.9% and 99.6% for P and R, respectively. In this experiment, we find road island with R of 99%. This time the results (P, R and MCC) for the curb points are slightly less accurate than the previous experiment mainly due to the curved road. Some pavement points are misclassified (FP) as curb points. Ground-truth; GT, and detected; D.

CONCLUSIONS
In this paper, a statistically robust algorithm for road surface and its related components (road pavement, curb, road divider/islands, roadside way) extraction in MLS data is presented. The algorithm can be considered as a split and merge approach that performs its workflow on a number of small stripes and then merges results to get the final and complete results. For each slice, it begins with a RLWR based approach to filter out non-ground points. Using the robust statistical filtering algorithm, it is possible to extract ground surface successfully in complex areas including large and tall buildings, high vegetation, and road furniture. RLWR based ground filtering is an iterative approach, hence this step takes significantly more time than the classification process in the next step. Although the proposed algorithm works in an iterative fashion, using local weights for the x-z and y-z profiles of the small slices for the proposed workflow makes the algorithm efficient and effective. Then three other sequential steps are performed in an iterative fashion to detect and delineate road pavement, curbs, road divider and roadside ways. Use of robust statistical approaches for small stripes and patches produces robust results in the presence of noise and outliers. Additionally, our method, unlike many of the methods that use local neighborhood based geometric features (e.g., normals and curvatures) typically influenced by outlier and noise, is robust in the presence of noise and outliers. This proposed algorithm is successful for both straight and curved roads, additionally it is independent on characteristics of road boundary (e.g., boundary type), because it divides data into many sufficiently small pieces (stripes and patches) in both along and across road directions (x and y). For the same advantage of using small and apposite sizes of stripes and patches; unlike, many existing methods, the new method performs well also for highslope areas. Our method does not require the conversion of 3D point clouds into 2D images or any structured regular format that could result in possible information loss.
The new method was demonstrated on two road sites in an urban environment. As the proposed approach is applied to small stripes of long road, it is scalable to a large extent as it is easy to run in parallel. Some parameters e.g., sizes of stripes and patches are data dependent, and need careful tuning, when dealing with long and curved roads point clouds. The new algorithm struggles when curb surface points are missing due to obstructions such as vehicles, human, etc. Future study will attempt fixing some of the above limitations by robustly fitting a curb line to the road edge.