AUTOMATIC ROAD SIGN INVENTORY USING MOBILE MAPPING SYSTEMS

The periodic inspection of certain infrastructure features plays a key role in road network safety and preservation, and in developing optimal maintenance planning that minimizes the life-cycle cost of the inspected features. Mobile Mapping Systems (MMS) use laser scanner technology to collect dense and precise three-dimensional point clouds that gather both geometric and radiometric information of the road network. Furthermore, time-stamped RGB imagery that is synchronized with the MMS trajectory is also available. In this paper, a methodology for the automatic detection and classification of road signs from point cloud and imagery data provided by a LYNX Mobile Mapper System is presented. First, road signs are detected in the point cloud. Subsequently, the inventory is enriched with geometrical and contextual data such as orientation or distance to the trajectory. Finally, semantic content is given to the detected road signs. As the point cloud resolution is insufficient for this task, RGB imagery is used: the 3D points are projected onto the corresponding images and the RGB data within the bounding box defined by the projected points are analysed. The methodology was tested in urban and road environments in Spain, obtaining global recall results greater than 95% and an F-score greater than 90%. In this way, inventory data are obtained in a fast, reliable manner, and they can be applied to improve the maintenance planning of the road network or to feed a Spatial Information System (SIS), so that road sign information is available for use in a Smart City context.


INTRODUCTION
Traffic signs are one of the most common visual aids in road networks. They provide useful information (warnings, prohibitions, etc.) to road users, and contribute actively to the safety of traffic environments (Koyuncu and Amado, 2008). The quality of a traffic sign is influenced by several factors (ageing, different forms of damage, loss of retroreflectivity properties) which are taken into account for maintenance activities. Periodic inspection of road facilities has to be ensured (European Commission, 2013) in order to keep the safety standards in the road network. These maintenance activities are typically conducted by qualified personnel of the transportation agencies, who draw up reports taking into account, among other features, the type, location, geometry or retroreflectivity of each traffic sign. The report is finally used for maintenance planning, considering all the elements that do not meet the quality standards. Normally, the whole inspection process is carried out manually, and therefore it may be biased by the knowledge and subjectivity of the inspection team. The automation of the inventory and maintenance planning tasks will reduce the inspection bias and the life-cycle cost of the traffic signs, consequently improving both the quality and the safety of the road network and saving public resources.
Mobile Mapping Systems are vehicles equipped with different remote sensing systems, such as light detection and ranging (LiDAR) laser scanners, RGB cameras, and navigation sensors. Laser scanners are capable of collecting dense and precise three-dimensional (3D) data that gather both radiometric and geometric information of the surveyed area. These data comprise a set of unorganized 3D points (point cloud) that can be processed and applied to road network inspection and analysis. In particular, the development of methodologies for the semantic labelling of road areas is an active research topic. Some works aim to detect and classify a relatively large number of objects; for example, Luo et al. (2015) distinguish seven categories of objects, including several forms of vegetation, using a patch-based match graph structure. Yang et al. (2015) extract urban objects (poles, cars, buildings…) by segmenting a supervoxel structure and classifying the segments according to a series of heuristic rules. Serna and Marcotegui (2014) classify up to 20 different objects using Support Vector Machines (SVM). Other works focus on the detection of a single object class within a point cloud. Street lights (Yu et al., 2015), curbs (Zhou and Vosselman, 2012; Wang et al., 2015) or trees (Reitberger et al., 2009) can be detected using LiDAR data. Regarding traffic signs, Pu et al. (2011) distinguish several classes of planar shapes that correspond to the possible shapes of traffic signs. In Riveiro et al. (2015), a linear regression model based on a raster image is used for classifying traffic signs based on their shapes. However, the resolution of a point cloud is not enough to distinguish the specific meaning of a traffic sign; therefore, the study of optical images is needed. Wen et al. (2015) detect traffic signs based on their retroreflectivity, and project the 3D data onto 2D images in order to classify the previously detected traffic signs.
There exists a vast literature regarding traffic sign recognition in RGB images. The German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012) gathered more than 50,000 traffic sign images and established a classification challenge. The best results were achieved by Cireşan et al. (2012), who combined various Deep Neural Networks (DNN) into a Multi-Column DNN, reaching a recognition rate of almost 99.5%. Sermanet and LeCun (2011) and Zaklouta et al. (2011) are other state-of-the-art algorithms that derive from the GTSRB.
Both laser scanners and optical sensors present advantages and disadvantages for the traffic sign inventory task. Laser scanners collect accurate geometric data but their resolution may not be enough for a semantic analysis, while RGB images are not as reliable for 3D analysis but can solve the semantic recognition problem.
In this paper, both sources of information are combined in order to detect and classify vertical traffic signs in urban and road environments, and to extract geometric and contextual properties which can be of interest for inventory purposes. In Section 2, the proposed method is detailed. In Section 3, the case study and the obtained results are shown, and a comparison with the methods of Riveiro et al. (2015) and Wen et al. (2015), the latter of which follows a similar workflow, is established. Finally, the conclusions are presented in Section 4.

PROPOSED METHOD
While driving a vehicle, drivers are capable of distinguishing traffic signs with ease in a relatively complex environment, as they typically have a previously learned schema for scanning the road. However, if factors like the position, the visibility and the condition of the signs do not conform to the drivers' expectations, the signs may be missed (Borowsky et al., 2008). This is one of the reasons that justify the need for optimal traffic sign maintenance planning in the road network. An automated inspection and inventory of the infrastructure will improve the efficiency and applicability of this maintenance planning. This section describes a method for the identification of geometric and semantic properties of traffic signs, using point cloud data and imagery acquired by an MMS that travels along the road network. The method is summarized in Fig. 1. First, a 3D point cloud is preprocessed and the ground points are removed. Then, the retroreflectivity of the sign surfaces is considered, and an intensity-based filter is applied to the cloud. Subsequently, the remaining points are clustered and further filtered by means of a planarity filter. Finally, each cluster of points is projected onto 2D images (which are synchronized with the point cloud) in order to classify each traffic sign.

Point cloud preprocessing and ground removal
The preprocessing procedure requires two main inputs, namely the trajectory of the vehicle and the 3D point cloud itself. The MMS comprises two LiDAR sensors with a maximum range of 250 m; however, the case studies are restricted to the road network. Therefore, points farther than 15 m from the trajectory are removed from the point cloud. With this step, unnecessary information (mainly buildings in urban environments and vegetation in highways) is effectively removed from the process.
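The distance-based cropping described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: it assumes that the trajectory is available as a densely sampled array of positions, that the distance is measured horizontally to the nearest trajectory sample, and the names `crop_to_trajectory` and `chunk` are hypothetical.

```python
import numpy as np

def crop_to_trajectory(points, trajectory, max_dist=15.0, chunk=10000):
    """Keep points whose horizontal distance to the nearest trajectory
    sample is at most `max_dist` (same length unit as the coordinates).

    ASSUMPTION: the trajectory is densely sampled, so the distance to the
    nearest sample approximates the distance to the trajectory itself.
    """
    points = np.asarray(points, dtype=float)
    traj_xy = np.asarray(trajectory, dtype=float)[:, :2]
    keep = np.zeros(len(points), dtype=bool)
    # Process the cloud in chunks to bound the size of the distance matrix.
    for start in range(0, len(points), chunk):
        xy = points[start:start + chunk, :2]
        d2 = ((xy[:, None, :] - traj_xy[None, :, :]) ** 2).sum(axis=2)
        keep[start:start + chunk] = d2.min(axis=1) <= max_dist ** 2
    return points[keep], keep
```

For large surveys a spatial index would normally replace the brute-force distance matrix; the chunked version is kept here only because it is short and has no dependency beyond NumPy.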
Once the point cloud has been preprocessed, the next step aims to remove the ground from the 3D cloud. For that purpose, the point cloud is projected onto the XY plane, where a raster grid is created. An index $i$ is assigned to every cell $c_i$ in the grid, so each point in the cloud is unequivocally related to a single cell. Subsequently, two features related to the height of the points are considered within each cell: the accumulated height $h_{acc}$ and the vertical variance $\sigma_z$. Both features are used for creating a raster image. For a cell $c_i$, the image $I_{height}$ is defined by Eq. (1) in terms of both features and of $n(c_i)$, the number of points in the cell $c_i$.
The resulting image is normalized to the range [0, 1] and binarized, mapping to 0 all pixels where $0 \le I_{height} \le 0.001$ and mapping to 1 the remaining pixels.
Analogously, a second image $I_{int}$ is created by computing the average intensity of the points in each cell. It is binarized using as threshold the mean of all the elements such that $I_{int} > 0$.
The binary image that results from the logical operation $(I_{int} \wedge I_{height})$ is used for filtering out cells whose points have small elevations and intensity values. Therefore, only off-ground points will be analysed in further steps (Fig. 2a). The most important parameter in this process is the raster grid size, which is directly related to the resolution of $I_{height}$ and $I_{int}$. An acceptable trade-off between processing time and ground removal results is obtained for grid sizes between 0.3 m and 0.5 m. The results in Section 3 have been obtained using a grid size of 0.5 m.
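The raster-based filtering can be illustrated with the sketch below. It is not the authors' implementation. In particular, the way the accumulated height and the vertical variance are combined into the height image is an assumption made only for this example (their product divided by the number of points in the cell), and the function name `off_ground_mask` is hypothetical. The grid size, the 0.001 threshold and the intensity threshold follow the text.

```python
import numpy as np

def off_ground_mask(points, intensity, cell=0.5, height_thr=0.001):
    """Boolean mask of the points lying in cells kept by the AND of the
    binarized intensity image and the binarized height image."""
    points = np.asarray(points, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Project onto the XY plane and assign each point to one grid cell.
    ij = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / cell).astype(int)
    cell_id = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]
    n_cells = cell_id.max() + 1

    # Per-cell statistics (empty cells get a count of 1 to avoid 0/0).
    z = points[:, 2] - points[:, 2].min()
    n = np.maximum(np.bincount(cell_id, minlength=n_cells), 1)
    sum_z = np.bincount(cell_id, weights=z, minlength=n_cells)
    sum_z2 = np.bincount(cell_id, weights=z * z, minlength=n_cells)
    var_z = np.maximum(sum_z2 / n - (sum_z / n) ** 2, 0.0)
    mean_i = np.bincount(cell_id, weights=intensity, minlength=n_cells) / n

    # ASSUMPTION: illustrative combination of accumulated height and
    # vertical variance; the expression used in the paper may differ.
    height_img = sum_z * var_z / n
    span = height_img.max() - height_img.min()
    if span > 0:
        height_img = (height_img - height_img.min()) / span  # normalize to [0, 1]
    height_bin = height_img > height_thr

    # Intensity image binarized with the mean of its non-zero elements.
    int_bin = mean_i > mean_i[mean_i > 0].mean()

    keep_cell = height_bin & int_bin
    return keep_cell[cell_id]
```

Because the mask is computed per cell, ground points that share a cell with a reflective, elevated object survive this step; they are handled by the finer intensity filter applied afterwards.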

Intensity filter and point clustering
The surface of any traffic sign incorporates sheeting made of retroreflective materials. Therefore, light is redirected from the sign surface to the source, and traffic signs remain visible at night (McGee, 2010). The intensity attribute of a 3D point cloud acquired with a laser scanner is proportional to the reflectivity of the objects, and it is a property that can be used to distinguish highly retroreflective objects from their surroundings (González-Jorge et al., 2013).
Once the ground is removed from a point cloud, a Gaussian Mixture Model (GMM) with two components is estimated using the intensity distribution of all the points in the cloud. The points assigned to the component with the largest mean are considered retroreflective points, whereas the remaining points are removed from the cloud.
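A two-component mixture on a one-dimensional intensity attribute can be fitted with a few lines of expectation-maximization. The sketch below is a simplified illustration under stated assumptions: the initialization at the minimum and maximum intensity, the fixed number of iterations and the function names are choices made for this example and are not taken from the method itself.

```python
import numpy as np

def responsibilities(x, w, mu, var):
    """Posterior probability of each component for each intensity value."""
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)

def fit_two_gaussians(x, n_iter=200):
    """Fit a 1D Gaussian mixture with two components by EM."""
    x = np.asarray(x, dtype=float)
    # ASSUMPTION: deterministic initialization at the extremes of the data.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = responsibilities(x, w, mu, var)          # E-step
        nk = resp.sum(axis=0) + 1e-12                   # M-step
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def retroreflective_mask(intensity):
    """True for points assigned to the component with the largest mean."""
    x = np.asarray(intensity, dtype=float)
    w, mu, var = fit_two_gaussians(x)
    resp = responsibilities(x, w, mu, var)
    return resp.argmax(axis=1) == np.argmax(mu)
```

The mask returned by `retroreflective_mask` can be used directly to index the point array, so that only the high-intensity component reaches the clustering step.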
This step performs a finer intensity filter than the previous one, where the intensity-based raster image was used. The point cloud that remains after the application of this filter may comprise objects made of retroreflective materials, such as traffic signs, license plates, metallic parts of buildings that are relatively close to the laser scanner, or even reflective clothing worn by pedestrians.
The points of these objects, however, still remain unorganized. It is necessary to group together points that belong to the same object in order to analyse them separately. For that purpose, the DBSCAN algorithm (Ester et al., 1996) is used. It clusters nearby points with a certain density and marks points in low-density areas as outliers. This way, the points in the cloud are organized and any noise remaining after the previous filtering stages is removed.
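For reference, the clustering rule of DBSCAN can be written compactly as below. This is a brute-force sketch intended for small point sets: it builds the full pairwise distance matrix, and the parameters `eps` and `min_pts` are generic DBSCAN parameters whose values are not specified in this paper.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise squared distances; memory grows as n^2, so small inputs only.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    neighbours = [np.flatnonzero(row <= eps * eps) for row in d2]

    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point: noise unless reached from a core later
        labels[i] = cluster
        stack = list(neighbours[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster          # core or border point of this cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbours[j]) >= min_pts:
                    stack.extend(neighbours[j])  # expand only through core points
        cluster += 1
    return labels
```

Points that keep the label -1 correspond to the outliers mentioned above and can simply be discarded before the geometric analysis of each cluster.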
Finally, it is possible to filter out those clusters that do not follow the geometric specifications for vertical traffic signs, which are known beforehand. Only planar clusters whose height is between 0.3 m and 5 m are kept. The planarity $a_{2D}$ of a cluster is defined by Eq. (2) in terms of $(\lambda_1, \lambda_2, \lambda_3)$, the eigenvalues of the covariance matrix of the points within the cluster. Following the criteria of Gressin et al. (2013), a cluster of points is considered planar if $a_{2D} < 1/3$. This removes non-planar clusters, while the height restriction filters out objects such as license plates that could be a source of false positives (Fig. 2b).
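The eigenvalue-based shape descriptors can be computed from the covariance matrix of a cluster as sketched below. The expressions used here are an assumption: they follow the dimensionality features commonly associated with Gressin et al. (2013), written with the square roots of the eigenvalues, and may not match the exact definition used in this work. The function only returns the descriptors; the decision threshold is left to the caller.

```python
import numpy as np

def dimensionality_features(points):
    """Return ((a1d, a2d, a3d), e1) for a cluster of 3D points.

    ASSUMED definitions, with s_k = sqrt(lambda_k) and lambda_1 >= lambda_2 >= lambda_3:
        a1d = (s1 - s2) / s1   (linearity)
        a2d = (s2 - s3) / s1   (planarity)
        a3d = s3 / s1          (scatter)
    e1 is the eigenvector associated with the largest eigenvalue.
    """
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts.T)                      # 3x3 covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    eigval = np.clip(eigval[::-1], 0.0, None)
    eigvec = eigvec[:, ::-1]
    s1, s2, s3 = np.sqrt(eigval)
    if s1 == 0:
        raise ValueError("degenerate cluster: all points coincide")
    return ((s1 - s2) / s1, (s2 - s3) / s1, s3 / s1), eigvec[:, 0]
```

The same decomposition also provides the linearity and the principal direction that are used for pole detection in the geometric inventory.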

Geometric inventory
At this point of the process, it is assumed that each cluster represents a traffic sign. Therefore, the organized point cloud data can be analysed and several features of interest can be retrieved.
Note that the traffic sign detection process relies on the intensity attribute of the 3D points and therefore on the retroreflective properties of the traffic sign surface. The detected clusters do not contain traffic sign poles, even though these should be taken into consideration for the inventory together with the sign panel, as they can be bent or inclined. Before obtaining any parameter, each cluster is analysed in order to determine whether the traffic sign is held on a pole. For that purpose, a region growing approach reconstructs the sign panel surroundings, and an iterative pole-like object detection is performed, searching for a group of points under the traffic sign panel whose linearity $a_{1D} > 0.5$, with $a_{1D}$ defined by Eq. (3). If a pole is found, its director vector is defined as the eigenvector $e_1$ corresponding to the largest eigenvalue $\lambda_1$ of the points that scored the largest $a_{1D}$ value in the iterative process, and the inclination angle of the pole is computed in both the front view and the profile view of the sign. Furthermore, and independently of the existence of a pole holding the sign, the following parameters are extracted from each cluster:
• Position: an (x, y, z) point defined as the centroid of the points of the traffic sign panel.
• Height of the traffic sign with respect to the ground.
• Azimuth: clockwise angle between the projection of the normal vector of the sign panel on the XY plane and the North.
• Contextual parameters, namely the distance and the angle between the traffic sign and the trajectory.
Finally, a vector containing these parameters (position, height, azimuth, distance and angle to the trajectory, and the two inclination angles of the pole) is assigned to each cluster of points and defines the geometry of the traffic sign (Fig. 3). These data may be used to update any existing database and to be compared with the information collected in previous surveys in order to detect changes in the geometry of the sign.
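Some of the inventory parameters listed above reduce to short geometric computations, sketched below. The axis convention (X pointing East, Y pointing North), the use of the horizontal distance to the nearest trajectory sample and the function names are assumptions made for this example. Note also that a plane normal is defined up to its sign, so the azimuth is ambiguous by 180 degrees unless the normal is oriented consistently, for instance towards the trajectory.

```python
import math

def centroid(points):
    """Position of the sign: centroid of the (x, y, z) panel points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def azimuth_deg(normal):
    """Clockwise angle, in [0, 360), from North to the XY projection of `normal`.

    ASSUMPTION: X points East and Y points North.
    """
    nx, ny = normal[0], normal[1]
    if nx == 0 and ny == 0:
        raise ValueError("vertical normal: azimuth is undefined")
    return math.degrees(math.atan2(nx, ny)) % 360.0

def distance_to_trajectory(position, trajectory):
    """Horizontal distance from the sign position to the nearest trajectory sample."""
    return min(math.hypot(position[0] - t[0], position[1] - t[1]) for t in trajectory)
```

For example, a panel whose normal points due East yields an azimuth of 90 degrees under this convention.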

Point cloud projection on images and traffic sign recognition
The geometric parameters defined in the previous step do not offer the complete meaning of a traffic sign. The specific class of each sign must be retrieved from the MMS data together with the geometric inventory. For that purpose, the 3D points of each traffic sign are projected onto 2D images taken during the survey.
Every 2D image has associated with it the position and orientation of the vehicle and the GPS time at the moment the image was taken, which is synced with the 3D point cloud. The usage of 2D images is motivated by the aforementioned insufficient resolution of laser scanners for the traffic sign recognition task.
The inputs for this process are the traffic sign clusters (which were obtained in Section 2.2 and contain the coordinates and time stamp for each point of the sign), the 2D images, the data associated with each image (mainly time stamp and vehicle frame for every image), and the orientation parameters of the four MMS cameras with respect to the vehicle frame.
Let $P = (x, y, z, t)$ be a traffic sign cluster. First, only those vehicle frames whose time stamp is in the range $\bar{t} \pm 5$ are selected, where $\bar{t}$ is the mean time stamp of the cluster. Subsequently, the points in the cloud are transformed from the global coordinate system to each vehicle coordinate frame. This way, both the 3D cloud and the coordinate system of each camera $i$ ($i = 1 \dots 4$) are measured from the same origin.
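The selection of candidate frames by time stamp can be sketched as follows. The function name is hypothetical, and the window is assumed to be expressed in the same unit as the time stamps of the points and of the frames.

```python
def frames_near_sign(point_times, frame_times, window=5.0):
    """Indices of the frames whose time stamp lies within `window`
    of the mean time stamp of the points of a traffic sign cluster.

    ASSUMPTION: `window` uses the same time unit as both lists.
    """
    t_mean = sum(point_times) / len(point_times)
    return [i for i, t in enumerate(frame_times) if abs(t - t_mean) <= window]
```

Only the frames returned by this selection need to be considered when the cluster is transformed to vehicle coordinates and projected, which keeps the number of candidate images per sign small.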
The camera plane $\pi_i$ and the principal point $pp_i$ of the camera $i$ are defined by Eqs. (4) and (5). This process is graphically described in Fig. 4a. With this projection, the region of interest (ROI) of one or several images that contain the same traffic sign can be precisely defined. In order to add some background to the ROI and to minimize the effect of possible calibration errors, a margin of 25-30% of the ROI size is added to the initial region (Fig. 4b).
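The projection and the enlargement of the ROI can be illustrated with an ideal pinhole model. The sketch below makes several assumptions that are not taken from this work: the points are already expressed in the camera frame with the Z axis along the principal axis, radial distortion is ignored, the image axes are aligned with the camera X and Y axes, and the margin is applied on each side as a fraction of the ROI width and height. All function and parameter names are hypothetical.

```python
import numpy as np

def project_to_pixels(points_cam, focal, pixel_size, principal_point):
    """Project 3D points (camera frame, Z along the principal axis) to pixels.

    ASSUMPTION: ideal pinhole camera without lens distortion.
    """
    pts = np.asarray(points_cam, dtype=float)
    pts = pts[pts[:, 2] > 0]                    # discard points behind the camera
    xy = focal * pts[:, :2] / pts[:, 2:3]       # metric coordinates on the camera plane
    return xy / pixel_size + np.asarray(principal_point, dtype=float)

def roi_with_margin(pixels, image_size, margin=0.25):
    """Bounding box of the projected points, enlarged and clipped to the image.

    ASSUMPTION: `margin` is added on every side as a fraction of the box size.
    """
    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)
    dx, dy = margin * (x_max - x_min), margin * (y_max - y_min)
    width, height = image_size
    box = (max(0.0, x_min - dx), max(0.0, y_min - dy),
           min(width - 1.0, x_max + dx), min(height - 1.0, y_max + dy))
    return tuple(int(round(float(v))) for v in box)
```

If every point of a cluster lies behind the camera, `project_to_pixels` returns an empty array and the corresponding image should be skipped before calling `roi_with_margin`.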
Once the images that contain a traffic sign are stored, a Traffic Sign Recognition (TSR) algorithm can be applied. Numerous TSR methods already exist and have achieved remarkable results, as mentioned in Section 1. Since this work is focused on 3D point cloud processing and not on image processing, the TSR method used is only briefly outlined.
For each image, two colour bitmaps (red and blue) are created, based on the pixel intensity of the image in the Hue-Luminance-Saturation (HLS) space. The shape of the binary image is classified into seven classes that represent traffic sign categories (prohibition, indication…) using the Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2004) as feature vector and Support Vector Machines (SVM) (Cortes and Vapnik, 1995) as classifier. Given the sign category, a second classification is performed. A colour-HOG feature is used in this case, which is created by concatenating the HOG feature of each colour channel of the image in the CIELab colour space (Creusen et al., 2010). This feature is again classified by means of SVM, and the type of the sign is finally retrieved.

RESULTS
This section presents, in the first place, the case study in which the method in Section 2 was tested. Then, the results for traffic sign detection and point cloud projection on images are shown, compared, and discussed.

Case study
The surveys for the case study were conducted using a LYNX Mobile Mapper by Optech Inc. (2012), which is equipped with two laser scanners, mounted at a 45 degree angle with respect to the trajectory and with 90 degrees between their rotational axes. The field of view (FOV) of the scanners is 360 degrees. An Inertial Measurement Unit (IMU) and a two-antenna measurement system (GAMS) compose the navigation system. Furthermore, it is equipped with four 5-MPix JAI cameras which are synced with the scanners (Fig. 5a). An analysis of the MMS can be found in Puente et al. (2013).
Two different scenarios were surveyed in Galicia, in the northwest of Spain: a central, crowded avenue in the city of Lugo, and a highway section that includes the Rande bridge (which is 1.5 km long) and sections of conventional roads (Fig. 5b).

Results and discussion
Traffic sign detection results for the case study are shown in Table 2. A precision of 89.7% and a recall of 97.9% are obtained. The recall value is especially remarkable, as only 2% of the traffic signs in the case study are not detected with this method. However, about 10% of the detections are false positives. Reflective clothing worn by pedestrians and planar, metallic objects are the main sources of false positives. The results are worse for the urban environment, as it is a cluttered scenario where the number of reflective objects is larger.
The proposed method is compared, first, with the method in Riveiro et al. (2015). The modifications in the methodology have significantly improved the recall, although the precision is slightly worse. Then, a comparison with Wen et al. (2015) shows a better global performance of this method. However, the case studies are different for both results; therefore, the comparison may not be totally accurate.
Finally, the overall quality of the point cloud projection onto 2D images is quantified. More than 200 images that contain a traffic sign detected in a 3D point cloud (both urban and highway case studies were considered) were manually cropped in order to generate a ground truth. Then, these images were compared with the bounding boxes obtained with the automatic projection. Precision and recall metrics were used for quantifying the results, defined in terms of two rectangles: $R_p$, the rectangle obtained after the projection of the traffic sign points on the image, and $R_{gt}$, the manually cropped rectangle.
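One common way of computing precision and recall between two rectangles is through their overlapping area, as sketched below. This is an assumed formulation given for illustration: precision is taken as the overlap divided by the area of the projected rectangle, and recall as the overlap divided by the area of the manually cropped rectangle. Boxes are given as (x_min, y_min, x_max, y_max) and are assumed to have non-zero area.

```python
def box_area(box):
    """Area of an axis-aligned box (x_min, y_min, x_max, y_max); 0 if inverted."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)

def box_precision_recall(projected, ground_truth):
    """Overlap-based precision and recall of a projected box against a reference box."""
    intersection = (max(projected[0], ground_truth[0]),
                    max(projected[1], ground_truth[1]),
                    min(projected[2], ground_truth[2]),
                    min(projected[3], ground_truth[3]))
    overlap = box_area(intersection)
    return overlap / box_area(projected), overlap / box_area(ground_truth)
```

Under this formulation, a projected box that lies entirely inside the reference box has a precision of 1 and a recall below 1, which is the situation that enlarging the projected box is meant to correct.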
A precision of 92.8% and a recall of 67.75% were obtained. These results justify the addition of a margin that increases the size of the projection, as introduced in Section 2.4. This way, the recall gets close to 90%.
Results for traffic sign recognition in 2D images are omitted in this paper as they are still far from the state of the art.

CONCLUSIONS
This paper presents a method for the automated detection, inventory and classification of traffic signs using 3D point clouds and imagery collected by a LYNX Mobile Mapper system. The radiometric properties of the traffic sign panels (made with retroreflective sheeting) are essential in this method, as the intensity attribute of the points in the cloud is the main feature used for the segmentation process. Existing algorithms (DBSCAN, GMM, PCA…) are combined in order to develop a methodology that groups together the points that belong to the same traffic sign. Several geometric properties that can be useful for the maintenance planning of the road network can be extracted from each sign detected in the point cloud. Furthermore, the 3D information is projected onto RGB images collected by the MMS, allowing the application of an automatic TSR algorithm and the consequent semantic definition of each traffic sign.
The results obtained in this work are promising. However, the TSR has to be improved in order to offer a robust, fully automated process. The lack of a Spanish traffic sign image database has been a drawback, as the number of instances for several sign classes was not enough to obtain robust classification models.
Together with an improvement in the TSR results, future work should focus on offering a more complete definition of the road network, that is, on detecting other relevant elements such as road markings or traffic lights. Finally, this information should be conveniently integrated in a road network database.

Figure 1. Flowchart which describes the proposed method.

Figure 2. (a) Ground segmentation. The points in the ground (painted in brown) are filtered from the point cloud. (b) Intensity filter. Points of reflective objects (painted in red) are selected.

Figure 3. Geometric inventory. Different parameters can be extracted using the point cloud data. (a) Front view of the traffic sign. (b) Lateral view of the traffic sign. (c) Contextual features using the trajectory.
Equations (4) and (5) define the camera plane and the principal point of each camera in terms of its principal axis, its focal distance and its optical centre. Finally, the points of the traffic sign cluster are projected onto the camera plane. The projection of each point in camera coordinates is then transformed to pixel coordinates, taking into account the distorted image point (considering radial distortion), the principal point of the camera and the pixel size.

Figure 4. Point cloud projection on images. (a) The points of a traffic sign panel are projected onto the camera plane. The distances of the projected point with respect to the principal point of the camera are transformed to pixel coordinates. (b) The bounding box of the projected points (red rectangle) is extended (green rectangle) in order to add some background to the detection, and the image is cropped.

Figure 5. Case study. (a) Mobile Mapper System used for the survey. (b) Map of the surveyed areas.

The survey areas were open to traffic during the circulation of the MMS, and the traffic was dense in both cases. The data associated with the survey can be found in Table 1.

Table 1. Case study data.

The parameters that measure the results, where TP are True Positives, FP are False Positives and FN are False Negatives, are defined as:
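The standard definitions of these detection metrics from TP, FP and FN counts are sketched below, under the assumption that the usual forms are the ones intended here, with the F-score taken as the harmonic mean of precision and recall. The function names are hypothetical.

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of existing signs that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```

Under this assumption, a precision of 89.7% and a recall of 97.9% correspond to an F-score of roughly 93.6%.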

Table 2. Assessment and comparison of the method.