A DEEP LEARNING APPROACH FOR THE RECOGNITION OF URBAN GROUND PAVEMENTS IN HISTORICAL SITES

Urban management is a topic of great interest for local administrators, particularly because it is strongly connected to smart city issues and can have a great impact on making cities more sustainable. In particular, for the management of the physical accessibility of cities, the possibility of automating data collection in urban areas is of great interest. In historical centres and historical sites, ground surfaces are generally characterised by a multitude of different pavements, so a comprehensive mapping of these pavements can be very useful to strengthen the management of such urban areas. In this paper, the survey of a historical city (Sabbioneta, in northern Italy) carried out with a Mobile Mapping System (MMS) was used as a starting point. The approach presented here exploits Deep Learning (DL) to classify the different pavings. Firstly, the points belonging to the ground surfaces were selected from the point cloud, and the point cloud was rasterised. Then the raster images were used to perform a material classification with a DL approach, implementing U-Net coupled with ResNet 18. Five different classes of materials were identified, namely sampietrini, bricks, cobblestone, stone, and asphalt. The average accuracy was 94%.


INTRODUCTION
Due to a growing interest in smart and sustainable cities, the management of urban areas is a relevant topic for urban administrators. A comprehensive understanding of the urban environment is fundamental for making coherent interventions. The proper detection of sidewalks and pedestrian pathways can help in planning routes within the city. Furthermore, for the proper management of physical accessibility, the identification of different pavements is important to plan accessible paths between Points of Interest.
Typical methods for the detection of sidewalks from point clouds are based on the search for an element that acts as a separator between roadway and sidewalk. Such an element could be a curb or a sudden change in point elevation. In both cases, the hypothesis is that sidewalks and roadways are clearly separated. Many approaches with this purpose have been proposed in the literature. The authors of (Serna and Marcotegui, 2013) used range images, height, and geodesic features to segment urban objects and detect curbs; they also performed an accessibility analysis. Curb detection and classification was also performed by the authors of (Ishikawa et al., 2018), who extracted curbstones from MLS data and classified whether or not they allow access to off-road facilities. A method to automatically classify urban ground elements from MLS data was proposed by Balado et al., 2018b; their method was based on a combination of topological and geometrical analysis, with element classification based on adjacency analysis and graph comparison. Road, tread, riser, curb and sidewalks were detected to provide useful data from an accessibility point of view. The authors of (Hou and Ai, 2020) proposed a deep neural network approach to extract and characterize sidewalks from LiDAR data through a stripe-based analysis method.
When dealing with a historical urban environment, typical methods to detect sidewalks may fail, due to the non-standardized organization of urban elements and the greater difficulty in finding replicable rules. For example, sidewalks and roadways tend not to be clearly separated either by curbs or by different elevations. However, one characteristic generally shared by historic urban areas is that ground surfaces are often paved with a great diversity of materials. In fact, the use of different pavements to identify different urban elements is frequent in historical sites (e.g., one material for the sidewalk and a different one for the roadway). In this article, the term 'material' denotes the material representing the macro area of the flooring. For example, a floor consisting of stone slabs placed side by side will be referred to simply as "stone", and a floor consisting of bricks placed side by side will be called "bricks".
The diversity of materials has already been exploited to perform a semantic segmentation identifying the urban elements of a historic city. In a previous work (Treccani et al., 2021), we segmented the roads and sidewalks of a historic site through a knowledge-based method. The case study of that work is the same as the one considered in this paper: the city of Sabbioneta. In that case, the method started by subdividing the point cloud into several sub-clouds along the trajectory; then unsupervised machine learning (the k-means algorithm) was used to split each sub-cloud into clusters according to features identified as representative of the different characteristics of the ground surfaces (Omnivariance, Sphericity, Roughness, Intensity of the signal returned to the laser scanner, and Z coordinate of the points). Finally, a voting system based on topological relations was used to detect roadways and sidewalks.
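The clustering step of that earlier work can be illustrated with a minimal k-means sketch in Python (NumPy), offered only as an illustration and not as the original implementation. It assumes that the five per-point features (Omnivariance, Sphericity, Roughness, Intensity and Z) have already been computed for one sub-cloud. The z-score standardization, the farthest-point initialisation and the number of clusters `k` are assumptions of this sketch, not the settings used in (Treccani et al., 2021).

```python
import numpy as np


def kmeans(features, k, n_iter=100, seed=0):
    """Lloyd's k-means on an (N, d) array of per-point features.

    Features are z-score standardized first and centroids are initialised
    with farthest-point sampling; both choices are assumptions of this
    sketch. Returns the cluster label of each point and the centroids
    (in standardized units)."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    X = (X - X.mean(axis=0)) / std

    # Farthest-point initialisation: random first centroid, then repeatedly
    # the point farthest from all centroids chosen so far.
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance
        d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mean of each cluster (keep the old centroid if empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Standardization matters here because Intensity and Z are measured in very different units from the eigenvalue-based features, and without it they would dominate the Euclidean distance.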
In this paper, we aim to reach a similar goal by following a different approach. The aim of the work presented here is the classification of the different urban pavings within the city from an MMS point cloud. The method consists of selecting the ground points in the point cloud, rasterizing them, and then classifying the resulting images according to their paving materials using Convolutional Neural Networks (CNN) through a DL approach. The result could be the basis for future work in which sidewalks are detected by considering the topology and position of the different pavings. In other words, once the point cloud has been classified into paving materials, a specific meaning could be assigned to each paving, also depending on the context, so that a second level of classification into roadway and sidewalk could be obtained.
The paper is organized as follows. Section 2 briefly reviews the scientific literature dealing with material recognition. Section 3 describes the method, organized into pre-processing and DL prediction. Section 4 presents the results and the parameters used in the approach. Section 5 discusses the results, also comparing them with the previously published method. Section 6 outlines the conclusions that can be drawn from the work presented in this article.

RELATED WORK
The classification of materials is a topic of great interest, which has been addressed in the literature at different levels and for different purposes. It is possible to find research works dealing with the detection of any kind of material, or works focused on specific types of materials, for example those used in buildings or in urban areas. The purposes also vary, from methods that detect materials from images or point clouds, to the generation of libraries of material images to be exploited in further works.
Regarding the identification of the materials of everyday objects from images, the authors of (Kalliatakis et al., 2017) proposed a comparison of different CNN-based approaches, tested on existing material databases. They found that the Medium CNN architecture generally performs best in combination with different data augmentation strategies. Other authors focused on the identification of specific types of materials, such as building materials, from point clouds. Yuan et al., 2020 identified 10 common building materials (including concrete, stone, mortar, wood, and plastic) by applying Machine Learning techniques (the One-class Support Vector Machine and Support Vector Data Description methods) to a Terrestrial Laser Scanning dataset, based on material reflectance, HSV colours, and surface roughness. Similarly, Zahiri et al., 2021 characterised building materials (concrete, mortar, and bricks) by analysing data obtained with multispectral imaging and light detection and ranging (LiDAR). They implemented Partial Least Squares Discriminant Analysis to classify the three materials and also to distinguish between subtypes of the same material. Also related to building materials is the work of Ilehag et al., 2019, which, focusing on imaging spectroscopy and hyperspectral remote sensing techniques, proposed an urban spectral library composed of building materials such as roof and façade materials.
Focusing also on ground surfaces, Degol et al., 2016 proposed a method that uses 3D geometry data (from photogrammetric point clouds) to improve the features of 2D images of building materials and grounds. They also contributed GeoMat, a material dataset composed of image and geometry data for isolated walls and ground areas and a large-scale construction site scene. Finally, targeting only materials in urban areas, Ilehag et al., 2020 dealt with the classification of urban materials using spectral samples from remote sensing data taken from the publicly available material library KLUM (Ilehag et al., 2019), exploiting Machine Learning models. They implemented three classifiers (Random Forest, Histogram-based Gradient Boosting Classification Tree, and Support Vector Machine) and tested the classification using spectral features, textural features, or a combination of both. From their results, they concluded that material classification relies heavily on spectral features.
From the papers mentioned, it can be concluded that the literature includes works that dealt with Machine Learning and Deep Learning for the classification of materials, some of them also focusing on point clouds; however, to the best of our knowledge, none of them focused on the types of materials used in the urban pavements of a historical city.

MATERIALS AND METHODS
The purpose of the method presented here is to segment the point cloud according to the different materials of the ground surface pavement. The raw point cloud was pre-processed to select only the points on ground surfaces; then geometric features were computed and the point cloud was rasterized. On the basis of the raster images, training, validation and test datasets were generated and a DL method was used to perform the classification. The workflow is schematized in Figure 1.

Case study
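The case study selected for these tests was the historic city of Sabbioneta, located in northern Italy, a UNESCO site together with the city of Mantova since 2008. The city was surveyed in 2020 with the Leica Pegasus:Two mobile laser scanning system. Almost the entire city centre was surveyed, for a complete length of 6000 linear metres of trajectory on the road framework. The survey was composed of 10 different acquisition tracks. One track (named Track-A) was selected and used for training and validating the DL network, as it was highly representative of all the materials of the ground surfaces of the city. All the other tracks were used for testing (Fig. 2). The training dataset covered almost 1000 m, while the testing dataset covered about 5000 m. Both datasets include ground surfaces (mainly roads and sidewalks) with various pavements.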
The pavings most used in the city centre are essentially five: sampietrini, bricks, cobblestone, stone, and asphalt. There are a few areas (less than 10 square metres) with pavings that are not identical, but very similar, to these five. For example, under some of the porches there are brick pavings similar to the ones used for sidewalks along the road. For the purposes of this test, and considering that this happens in limited areas, it was decided to use a single brick class in the DL model. The same approach was applied to other similar cases in the city.

Pre-processing
As a starting point, the point clouds were pre-processed to select only the points of the ground surface. The territory on which the city stands is predominantly flat, so an initial refinement consisted of selecting only the points located within a slice of the point cloud bounded by two horizontal planes, positioned respectively above and below the average height of the ground surface. Then, to remove all points lying on almost vertical surfaces, a threshold was applied to the Nz component of the normal vector of each point. The resulting point cloud included only points of the ground surfaces.
Expert architects made several on-site visits to the city of Sabbioneta to define all the materials of the ground surfaces. The result was a list of 12 classes, as shown in Figure 3. A manual classification of the point cloud was performed to generate the Ground Truth (GT). Analysing the distribution of points across the classes (see Tab. 1), it was deduced that some of the classes (6-12) were not sufficiently represented in the point cloud to be correctly identified by the DL model. In fact, the first five classes are the most present in the whole dataset, while classes 6 to 12 correspond to few points (summed together, only 7% of the whole dataset). Moreover, those classes referred to materials that could easily be assimilated to one of the first five classes: stone curbs (class 7) were considered as stone (class 4); brick layer in road (class 8) was considered as cobblestone (class 3); stone type 2 (class 9) was considered as stone (class 4); brick type 2 (class 11) was considered as bricks (class 2). The points of rural terrain (class 10) and gravel (class 12) were outside the city centre, which is the main interest of this paper, so those two classes were removed. Finally, the GT consisted of 5 classes: sampietrini, bricks, cobblestone, stone, and asphalt.
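The class merging described above can be sketched as a simple label remapping, here in Python (NumPy). The codes 1 to 5 follow the order in which the five retained classes are listed in the text (1 sampietrini, 2 bricks, 3 cobblestone, 4 stone, 5 asphalt). The text does not say how class 6 was handled, so dropping it is an assumption of this sketch, as are the function names.

```python
import numpy as np

# Codes 1-5 as implied by the text: 1 sampietrini, 2 bricks, 3 cobblestone,
# 4 stone, 5 asphalt; 6-12 are the under-represented classes.
REMAP = {
    7: 4,   # stone curbs -> stone
    8: 3,   # brick layer in road -> cobblestone
    9: 4,   # stone type 2 -> stone
    11: 2,  # brick type 2 -> bricks
}


def merge_classes(labels):
    """Map the 12 GT classes onto the 5 retained ones.

    Returns (merged_labels, keep_mask). Points whose class is still outside
    1-5 after remapping are dropped: this covers rural terrain (10) and
    gravel (12), and also class 6, whose treatment is not stated in the text
    (assumption)."""
    labels = np.asarray(labels)
    merged = labels.copy()
    for src, dst in REMAP.items():
        merged[labels == src] = dst
    keep = (merged >= 1) & (merged <= 5)
    return merged[keep], keep


def class_percentages(labels, classes=range(1, 6)):
    """Share of points (in %) belonging to each class, as in Table 1."""
    labels = np.asarray(labels)
    return {c: 100.0 * np.count_nonzero(labels == c) / labels.size for c in classes}
```

For example, `merge_classes([1, 7, 8, 10])` keeps three points, relabelled as `[1, 4, 3]`, and drops the rural terrain point.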
Then, specific geometric features of the point cloud were computed, and after a visual inspection of the results, the most representative ones were selected for the DL approach. The point cloud was then rasterized, using the selected features as the colour channels of the image pixels (Balado et al., 2018a). The rasterization consists of the following steps. First, a bounding box containing the point cloud is generated on the XY plane. Second, cells are generated according to the raster resolution, and points are assigned to each cell according to their XY coordinates. Third, the features of the points contained in the same cell are averaged to generate the pixel value. The reference raster image with the GT was generated from the mode of the point labels in each cell. The raster resolution was chosen according to the density of points in the input point cloud. To optimize the process, the rasterized point cloud was then split into several images, and the images that contained no ground data were discarded.
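The three rasterization steps can be sketched in Python (NumPy) as follows. This is an illustration under stated assumptions, not the implementation of Balado et al., 2018a: rows are indexed by Y (row 0 at the minimum Y), empty cells are set to 0 (black background, GT label 0), and ties in the label mode go to the lowest class code.

```python
import numpy as np


def rasterize(xyz, features, labels, resolution):
    """Rasterize a point cloud on the XY plane.

    xyz: (N, 3) coordinates; features: (N, F) per-point features;
    labels: (N,) positive integer class codes; resolution: cell size (m).
    Returns (image, gt, count, xy_min): image (H, W, F) holds the mean
    feature value per cell, gt (H, W) the most frequent label per cell
    (0 for empty cells, i.e. background), count (H, W) the points per cell
    and xy_min the lower-left corner of the bounding box."""
    xy = np.asarray(xyz, dtype=float)[:, :2]
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Step 1: bounding box on the XY plane
    xy_min = xy.min(axis=0)
    # Step 2: cell (column, row) of each point from the raster resolution
    col, row = np.floor((xy - xy_min) / resolution).astype(int).T
    H, W = row.max() + 1, col.max() + 1
    flat = row * W + col
    count = np.bincount(flat, minlength=H * W)
    # Step 3: average the features of the points falling in each cell
    image = np.zeros((H * W, features.shape[1]))
    for f in range(features.shape[1]):
        sums = np.bincount(flat, weights=features[:, f], minlength=H * W)
        image[:, f] = np.divide(sums, count, out=np.zeros_like(sums), where=count > 0)
    # GT raster: mode of the point labels in each cell
    votes = np.zeros((H * W, labels.max() + 1), dtype=int)
    np.add.at(votes, (flat, labels), 1)
    gt = np.where(count > 0, votes.argmax(axis=1), 0)
    return image.reshape(H, W, -1), gt.reshape(H, W), count.reshape(H, W), xy_min
```

Using `np.bincount` keeps the per-cell averaging vectorised, which matters for clouds of hundreds of millions of points; the vote matrix, however, grows with the number of cells times the number of classes, so a large cloud would have to be processed in blocks.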

Deep Learning classification
The network used for semantic segmentation was U-Net coupled with ResNet 18 as the feature extractor. U-Net (Chen et al., 2018) is a network designed for image segmentation based on an encoder-decoder architecture. ResNet (He et al., 2016) has proven to be one of the best-performing architectures in recent years; its residual connections, which preserve features through the hidden layers, give it great versatility. In this work, ResNet 18 was used due to its good efficiency.
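The idea of a residual connection can be shown with a deliberately simplified Python (NumPy) sketch. It is not ResNet 18: the two learned layers are plain matrix products instead of 3×3 convolutions with batch normalization. The sketch only shows how the input is added back to the learned residual, so that features are carried forward even when the residual contributes nothing.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def residual_block(x, w1, w2):
    """Dense simplification of a ResNet basic block: out = ReLU(F(x) + x).

    F(x) = ReLU(x @ w1) @ w2 is the learned residual. In ResNet 18 the two
    layers are 3x3 convolutions with batch normalization, not matrix products.
    w2 must map back to the input width so that F(x) and x can be added."""
    f = relu(x @ w1) @ w2
    return relu(f + x)  # identity shortcut: x is added back unchanged


# With zero residual weights, non-negative inputs pass through unchanged.
x = np.array([[1.0, 2.0, 3.0]])
w1 = np.zeros((3, 4))
w2 = np.zeros((4, 3))
print(residual_block(x, w1, w2))  # [[1. 2. 3.]]
```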
The classes used were the 5 previously described for the GT, with the addition of a further class, reserved for the (black) background pixels.

Pre-processing
The raw point cloud obtained from the survey of Sabbioneta consisted of 1.3 billion points. To select only the ground surface points, the point cloud was first cropped by two horizontal planes placed 2 m above and below the average Z level of the terrain.
The normal vectors of the points were computed in CloudCompare (with a neighbourhood radius of 0.05 m), and only points whose normal vector had a Z component Nz > 0.9 (i.e. points on nearly horizontal surfaces) were retained for further analysis. The resulting point cloud consisted of 264.8 million points.
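The two filters (horizontal slab and normal threshold) can be sketched in Python (NumPy) as a boolean mask. The normals are assumed to be precomputed, as was done in CloudCompare. The absolute value |Nz| is used because normal orientation (up or down) may be arbitrary. Taking the mean Z of the whole cloud as the reference ground level is a simplification of "the average Z level of the terrain". The function name and defaults are assumptions of this sketch.

```python
import numpy as np


def select_ground(xyz, normals, half_height=2.0, nz_min=0.9, z_ref=None):
    """Boolean mask of points in a horizontal slab around the ground level
    whose normals are close to vertical (near-horizontal surfaces).

    half_height: slab half-thickness in metres (2 m in the text).
    nz_min: threshold on |Nz| (0.9 in the text).
    z_ref: reference ground height; defaults to the mean Z of the cloud
    (a simplifying assumption)."""
    xyz = np.asarray(xyz, dtype=float)
    normals = np.asarray(normals, dtype=float)
    if z_ref is None:
        z_ref = xyz[:, 2].mean()
    in_slab = np.abs(xyz[:, 2] - z_ref) <= half_height
    horizontal = np.abs(normals[:, 2]) > nz_min  # drops walls, facades, poles
    return in_slab & horizontal
```

Points on a façade have normals lying almost in the XY plane (|Nz| close to 0), so they fail the second test even when they fall inside the slab.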
The geometric features were computed in CloudCompare with two neighbourhood radii (0.05 m and 0.1 m). After a visual inspection of the point cloud displayed in false colours according to each feature, the features that showed the most clearly distinct patterns were selected, specifically: Intensity, Roughness (radius 0.05 m), and Omnivariance (radius 0.05 m).
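CloudCompare computes these features internally. The following Python (NumPy) sketch only illustrates common definitions and is not CloudCompare's code. Roughness is taken as the distance from a point to the least-squares plane fitted to its neighbours within the radius. Omnivariance is taken as the cube root of the product of the covariance eigenvalues, normalised by their sum. Whether the query point takes part in the fit and how the eigenvalues are normalised are assumptions here and may differ from CloudCompare. The brute-force neighbour search is only suitable for small clouds.

```python
import numpy as np


def local_features(xyz, radius):
    """Per-point Roughness and Omnivariance from a spherical neighbourhood.

    Brute-force O(N^2) neighbour search; a k-d tree would be used in
    practice. Points with fewer than 3 neighbours get NaN."""
    xyz = np.asarray(xyz, dtype=float)
    n = len(xyz)
    roughness = np.full(n, np.nan)
    omnivariance = np.full(n, np.nan)
    for i in range(n):
        d = np.linalg.norm(xyz - xyz[i], axis=1)
        # Neighbours within the radius, excluding the query point (assumption)
        nb = xyz[(d <= radius) & (np.arange(n) != i)]
        if len(nb) < 3:
            continue  # a plane needs at least 3 points
        centroid = nb.mean(axis=0)
        cov = np.cov(nb - centroid, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        normal = evecs[:, 0]                # direction of least variance
        roughness[i] = abs(np.dot(xyz[i] - centroid, normal))
        evals = np.clip(evals, 0.0, None)   # remove tiny negative round-off
        total = evals.sum()
        if total > 0:
            omnivariance[i] = np.cbrt(np.prod(evals / total))
    return roughness, omnivariance
```

On a flat pavement the smallest eigenvalue is close to zero, so Omnivariance is low, while irregular surfaces such as cobblestone spread the variance over all three directions. This is consistent with the text's choice of these features to separate paving materials.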
The raster resolution was selected according to the point density of the point cloud and was set to 0.02 m. The values of the selected features were normalized and saturated before rasterizing the point cloud. Specifically, for each pixel, Intensity was computed as the mean of the points falling in the pixel, offset at 0 and saturated at 1000; Roughness was computed as the mean of the points falling in the pixel and saturated at 0.02; Omnivariance was computed as the mean of the points falling in the pixel.
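The normalization and saturation step can be sketched in Python (NumPy). The function works element-wise, so it applies equally to per-point values before rasterization or to per-pixel means. The saturation values 1000 (Intensity) and 0.02 (Roughness) come from the text. Interpreting "offset at 0" as clipping negative values to 0, and rescaling Omnivariance by its maximum (no saturation value is given for it), are assumptions of this sketch.

```python
import numpy as np


def features_to_image(intensity, roughness, omnivariance, i_max=1000.0, r_max=0.02):
    """Stack Intensity, Roughness and Omnivariance into 3 channels in [0, 1].

    Intensity is clipped to [0, i_max] and Roughness to [0, r_max] (values
    from the text), then divided by those limits. Omnivariance is divided
    by its maximum (assumption). NaN (empty pixels) becomes 0, i.e. black."""
    i = np.clip(np.asarray(intensity, dtype=float), 0.0, i_max) / i_max
    r = np.clip(np.asarray(roughness, dtype=float), 0.0, r_max) / r_max
    o = np.clip(np.asarray(omnivariance, dtype=float), 0.0, None)
    o_max = np.nanmax(o)
    if not np.isfinite(o_max) or o_max <= 0:
        o_max = 1.0
    img = np.stack([i, r, o / o_max], axis=-1)
    return np.nan_to_num(img, nan=0.0)
```

Saturation prevents a few very bright or very rough points (e.g. retro-reflective markings or edges) from compressing the useful range of the other values.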
After rasterization, the rasterized point cloud was split into several images of 500 × 500 pixels each. The training dataset was composed of 276 images (20% of which were used for validation). The test dataset consisted of 634 images.
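At the 0.02 m resolution, a 500 × 500 tile covers 10 m × 10 m of ground. The tiling and the train/validation split can be sketched in Python (NumPy) as follows. Zero-padding at the raster borders and the use of an occupancy mask (cells containing at least one point) to decide whether a tile contains ground data are assumptions. The random 80/20 split is also an assumption, since the text only states that 20% of the 276 training images were used for validation.

```python
import numpy as np


def tile_raster(image, gt, occupied, tile=500):
    """Split an (H, W, C) raster, its (H, W) GT and an (H, W) boolean
    occupancy mask into tile x tile patches. Borders are zero-padded
    (background) and tiles with no occupied pixel are discarded."""
    H, W = gt.shape
    Hp = -(-H // tile) * tile  # round H up to a multiple of tile
    Wp = -(-W // tile) * tile
    img = np.zeros((Hp, Wp, image.shape[2]), dtype=image.dtype)
    lab = np.zeros((Hp, Wp), dtype=gt.dtype)
    occ = np.zeros((Hp, Wp), dtype=bool)
    img[:H, :W] = image
    lab[:H, :W] = gt
    occ[:H, :W] = occupied
    tiles = []
    for r in range(0, Hp, tile):
        for c in range(0, Wp, tile):
            if occ[r:r + tile, c:c + tile].any():
                tiles.append((img[r:r + tile, c:c + tile],
                              lab[r:r + tile, c:c + tile]))
    return tiles


def split_train_val(n_images, val_fraction=0.2, seed=0):
    """Random split of image indices into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(n_images)
    n_val = int(round(val_fraction * n_images))
    return idx[n_val:], idx[:n_val]
```

With 276 images this gives 55 validation images and 221 training images.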

Deep Learning classification
DL computations were carried out in MATLAB. The training took 133 minutes on an Intel Core i7-3820 CPU at 3.60 GHz with 64 GB RAM and an NVIDIA GeForce GTX 560 Ti GPU. The overall accuracy on the test dataset was 94%. The confusion matrix is shown in Figure 4, while the computed performance metrics are presented in Table 3. Since the rasterized point cloud used for the test dataset was split into 634 images, it was not possible to prepare a meaningful merged image; to visualise the prediction results, some of those images were instead selected to compare the GT with the predicted classes. This is shown in Figure 5, where ground truth and predicted labels are compared in two portions of the test dataset.
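The overall accuracy and the per-class Precision, Recall and F1-score reported in Table 3 follow directly from the confusion matrix. A minimal Python (NumPy) sketch is shown below. It assumes true classes on the rows and predicted classes on the columns; the matrix should be transposed if the opposite convention is used.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Overall accuracy and per-class precision, recall and F1-score from a
    confusion matrix with true classes on rows and predictions on columns
    (assumed convention)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    overall_accuracy = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / cm.sum(axis=0)  # column sums: pixels predicted as the class
        recall = tp / cm.sum(axis=1)     # row sums: pixels truly in the class
        f1 = 2 * precision * recall / (precision + recall)
    return overall_accuracy, precision, recall, f1
```

For instance, for the two-class matrix `[[8, 2], [1, 9]]` the overall accuracy is 17/20 = 0.85 and the F1-score of the first class is 2·8/(2·8 + 1 + 2) = 16/19.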

DISCUSSION
The method proved to be effective, with an overall accuracy of 94%. However, a careful look at both the performance metrics and the confusion matrix shows that some classes were predicted more accurately than others: the two classes with the lowest values were Bricks and Stone. Both materials were present only on sidewalks (in the case of bricks, also under the city porches).
Although the result obtained is valid, some considerations are noteworthy. For the Stone class, the confusion matrix shows that the misclassified pixels were evenly distributed among the other classes. As Table 1 shows, the Stone class accounts for only 4.7% of the points in the dataset; its lower accuracy can therefore most likely be attributed to the low representation of this class in the dataset.
For Bricks, on the other hand, 20% of the pixels were wrongly classified as cobblestone. This can be explained by the fact that in many places in Sabbioneta the bricks are damaged. The main difference between a surface covered by bricks and one formed by cobblestone lies in the spatial distribution of the points in the point cloud: bricks tend to form a flatter surface than cobblestone. It can therefore be deduced that, where the bricks were badly damaged, they may have been incorrectly identified as cobblestone.
Concerning the other classes, and disregarding the Background class, which, as mentioned, exists only for technical reasons, the materials were correctly identified with high accuracy values. Figure 5, which compares GT and predictions, also shows that, although the predicted areas have less sharp and clean shapes, a large part of them was correctly identified.
As mentioned in the introduction, the Sabbioneta dataset was previously used to perform a semantic segmentation with a knowledge-based method (Treccani et al., 2021). In that case, the method was tested only on some portions of the dataset (not on the whole), with the aim of identifying the sidewalks and the road. In addition, the pavement materials of the sidewalks were identified through a threshold-based method, with a good success rate.
Comparing the two methods, it may be noted that the method presented in this paper subdivides the point cloud into materials and not into urban elements. To add information related to urban elements, a further step is necessary: for example, topological information can be used to define sidewalks and roads starting from areas with similar materials. It is therefore straightforward to add a second level of information later, based on the one obtained by the DL method.
A direct comparison of accuracy is limited, since the knowledge-based method was tested only on portions of the dataset; in terms of computation time and pre-processing overhead, however, the DL approach clearly requires less effort. In the previously published knowledge-based method, it was necessary to divide the point cloud into sub-clouds following the acquisition trajectories of the instrument, and also to compute some additional features. In conclusion, the DL method seems to be more convenient than the previously presented one.

CONCLUSIONS
In this paper, a method to recognize the urban ground pavements in a historic site was presented.The starting point of the method is a point cloud of a historical Italian city, Sabbioneta, acquired with a Mobile Mapping System.After an initial pre-processing, only the points belonging to the ground surface have been selected, the geometric features have been computed and the point cloud has been rasterized.The raster images obtained have been used to train, validate and test a DL method.
The results showed an average accuracy of 94%.
In this work, it was necessary to reason extensively about the material classes identified in the city, choosing the most representative ones, i.e. those most present in the whole dataset. Five classes were selected, and at the end of the procedure it was possible to predict, with high accuracy, the classes of a point cloud of more than 191 million points in less than 3 hours of computation.
The outcome of the presented Deep Learning method is a set of labelled images corresponding to the urban point cloud, in which most of the pavements are correctly classified. The labels can then easily be transferred from the pixels to the original point cloud by applying a back-projection, as in Paz Mouriño et al., 2021. The method presented in this paper can be used upstream or downstream of other methods. Downstream, it can characterize the material of a previously segmented ground surface: after a semantic segmentation dividing the point cloud into sidewalk and roadway, the points of the sidewalk can be selected and the DL model used to predict the material of the sidewalk surface. Upstream, as previously discussed, the points of different materials can be selected and, by exploiting topological relations, roadway pavements could then be distinguished from sidewalk pavements.
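A back-projection of this kind can be sketched in Python (NumPy), without claiming to reproduce the procedure of Paz Mouriño et al., 2021. Each point receives the label of the raster cell it falls in. The sketch assumes that the predicted tiles have already been placed back into a single label raster. It also assumes that `xy_min` and `resolution` are those used for rasterization, with rows indexed by Y and columns by X. Points outside the raster get label 0.

```python
import numpy as np


def back_project(xyz, label_raster, xy_min, resolution):
    """Assign to each point the label of the raster cell it falls in.

    Assumes rows are indexed by Y and columns by X, starting from xy_min.
    Points outside the raster receive label 0 (background)."""
    xy = np.asarray(xyz, dtype=float)[:, :2]
    col, row = np.floor((xy - np.asarray(xy_min, dtype=float)) / resolution).astype(int).T
    H, W = label_raster.shape
    inside = (row >= 0) & (row < H) & (col >= 0) & (col < W)
    labels = np.zeros(len(xy), dtype=label_raster.dtype)
    labels[inside] = label_raster[row[inside], col[inside]]
    return labels
```

All points of a cell share one label, so the spatial detail of the point-cloud labels is limited by the 0.02 m raster resolution.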
Finally, some considerations can be made regarding the confusion matrix and the two classes with lower accuracy. Further analysis and an improvement of the DL parameters, such as a different selection of the training dataset or the inclusion of further features, could lead to higher accuracy. It is also worth observing that the prediction errors of the Bricks class were mainly located in areas where the bricks were damaged; in the future, this kind of result could be exploited to quickly identify damaged areas of the urban pavement.
In conclusion, the results obtained with this method open the way to further analyses, such as the computation of personalized paths for people with special needs, the identification of damaged areas, or the prediction of slippery ground surfaces.

Figure 1. Workflow of the method presented in this paper.

Figure 2. Map of the city of Sabbioneta. The roads used as the training dataset are shown in red and those used as the test dataset in blue.

Figure 3. The 12 classes of materials identified on the ground surfaces of the city of Sabbioneta. Classes 1-5 (highlighted in blue) are the ones selected for the DL model.
The training cycle was set to 10 epochs, with 55 iterations per epoch. The point cloud selected for training the DL model (Track-A) was split into a training set (80%) and a validation set (20%). Both sets were used during the training cycles: the network was trained on the training set, and the validation set was evaluated every 50 iterations.
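The training schedule can be checked with a small arithmetic sketch in Python. The mini-batch size is not reported in the text. A value of 4 is assumed here because, with 221 training images (276 minus 55 for validation) and incomplete mini-batches discarded at the end of each epoch, it reproduces the stated 55 iterations per epoch. Treating the validation checkpoints as multiples of 50 iterations is also a simplification.

```python
def schedule(n_images, val_fraction=0.2, mini_batch=4, epochs=10, val_freq=50):
    """Training images, iterations per epoch, total iterations and
    validation checkpoints, assuming incomplete mini-batches are discarded
    at the end of each epoch (mini_batch=4 is an assumption)."""
    n_train = n_images - round(val_fraction * n_images)
    iters_per_epoch = n_train // mini_batch
    total = iters_per_epoch * epochs
    checkpoints = list(range(val_freq, total + 1, val_freq))
    return n_train, iters_per_epoch, total, checkpoints


n_train, per_epoch, total, checkpoints = schedule(276)
print(n_train, per_epoch, total, len(checkpoints))  # 221 55 550 11
```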

Figure 4. Confusion matrix of the DL prediction. The classes refer to the paving materials present in the city of Sabbioneta; the Background label refers to the black background pixels of each raster image.

Figure 5. Comparison between the Ground Truth and predicted labels (b,d,f) for two different areas of the road in the test dataset. The image shows three portions of roads with different road widths and different paving materials.

Table 2. Summary of the characteristics of the training and testing datasets.

Table 3. Performance metrics computed for each class of the DL predictions. Precision, Recall and F1-score were computed on the basis of the confusion matrix.