OBJECT-BASED AND SUPERVISED DETECTION OF POTHOLES AND CRACKS FROM THE PAVEMENT IMAGES ACQUIRED BY UAV

Roads are the basic element of the land transportation system. After construction, road quality decreases because of the aging and deterioration of the road surface, and distresses eventually appear on the pavement, the most common being potholes and cracks. To improve the efficiency of pavement inspection, new forms of remote sensing data that do not damage the pavement, such as digital images, LiDAR and Radar, are now widely used to detect pavement distresses. In our study, digital pavement images acquired by an Unmanned Aerial Vehicle (UAV) and four popular supervised learning algorithms (KNN, SVM, ANN, RF) were used to distinguish between normal pavement and pavement damage (i.e. cracks and potholes). Each learning algorithm was run with a series of different parameter settings, and two criteria of algorithm performance, classification accuracy and computational time, were calculated. Finally, the best model for each of the four learning algorithms was selected according to the criteria of highest accuracy and minimum running time.


INTRODUCTION
The quality of pavement is closely related to the lifetime of a road (Pan et al., 2017). In general, because of the combined effect of aging and deterioration of the road surface, distresses eventually appear on the pavement. Potholes and cracks are the two most common categories of road surface damage (Hajek et al., 1986). Traditionally, pavement distresses were detected and evaluated through time-consuming field investigations and manual measurements, many of which were also destructive to the road surface (Eriksson et al., 2008). Currently, with the support of computer and remote sensing technologies, non-destructive forms of remote sensing data, such as digital images, LiDAR and Radar, together with advanced pattern recognition algorithms, are being introduced into the detection of pavement damage (Mettas et al., 2015; Schnebele et al., 2015; Zhang & Bogus, 2014). A Pavement Management System (PMS) is a highly integrated system of sophisticated remote sensing sensors that most road departments mount on a mobile vehicle to collect remote sensing data for pavement monitoring (Schnebele et al., 2015). Digital pavement images are the most commonly used data type; spectral, geometric and texture features of pavement distresses can be extracted from them (Koch et al., 2015). These features are then fed into appropriate classification models (e.g. support vector machines) to determine the category of road surface damage (Mokhtaria et al., 2016; Xu et al., 2008). This is the basic procedure of pavement distress detection using digital pavement images. LiDAR technology can directly acquire elevation information of deteriorated pavement to measure the depth of pavement damage (Choi et al., 2016). Recently, a microwave device called Ground Penetrating Radar (GPR) has been widely used to detect pavement defects. GPR uses radar pulses to image the subsurface profile and detect subsurface objects, changes in material properties, voids and cracks, which is convenient and accurate (Loizos & Plati, 2007).
However, previous studies have some limitations. For instance, most studies focused on only one kind of distress, such as cracks or potholes, whereas more than one type of damage can exist on the pavement at the same time. A mobile vehicle carrying a PMS also poses a potential risk to traffic safety and cannot cover the full pavement of different lanes simultaneously. Given these problems, pavement images acquired by an Unmanned Aerial Vehicle (UAV) were used in this study, and four supervised learning algorithms, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN) and Random Forest (RF), were evaluated in terms of their performance in detecting potholes and cracks from UAV images of road pavement.

Image Acquisition and Segmentation
The asphalt pavement located in a rural area of Shihezi City, Xinjiang, China was selected for the study. According to the field investigation, the majority of the pavement was in poor condition with a variety of severe pavement distresses, such as potholes and cracks. A multispectral camera, the Micro-Miniature Multiple Camera Array System (MCA) designed by Tetracam Inc., USA, was mounted on a fixed-wing UAV to capture the pavement images. Theoretically, the MCA is configured with six bands spanning from blue to near infrared, i.e. Blue, Green, Red and three near-infrared channels (Kelcey & Lucieer, 2012). However, the images captured by the three near-infrared channels were insufficiently exposed, which results in lower contrast between non-distressed and distressed pavement. Therefore, only the images in the RGB channels were used in this study. The UAV flew along the road at 30 meters above ground level, so that one pixel corresponded to an area of about 13.54 mm × 13.54 mm on the pavement. In total, 126 pavement images were acquired with 70% overlap between sequential images. However, these pavement images contain no white traffic lines, which are also common objects on the road surface. To increase the generalization of this study, a sample UAV pavement image provided by the Airsight Company (https://demo.airsight.de/uav/index_en.html) was used to extract white traffic lines. This pavement image also has three RGB channels and was captured by a digital camera at a higher resolution (1 pixel = 5 mm).
Given the high resolution of the pavement images, the Multiresolution Segmentation (MS) algorithm integrated in the eCognition Developer 9.0 software was used to extract pothole and crack objects from the pavement images. MS starts from single-pixel image objects and merges them with their neighbours based on relative homogeneity criteria. The homogeneity criterion is a combination of spectral and shape criteria and is controlled through a comprehensive scale parameter: higher values of the scale parameter result in larger image objects, and smaller values in smaller ones (Darwish et al., 2003). However, it is difficult to choose a single scale parameter that extracts intact potholes and cracks simultaneously. Contrast, a texture feature calculated from the Grey-Level Co-occurrence Matrix (GLCM) (Su et al., 2008), was therefore selected to measure the variation within distressed and non-distressed areas. The contrast feature is calculated as:

$$\mathrm{Contrast} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)\,(i-j)^{2} \qquad (1)$$

where i and j are the row and column numbers of the GLCM, respectively, P(i, j) is the value in cell (i, j), and N is the number of rows or columns. To obtain intact pothole objects, a merge operation based on the contrast values of the objects was applied to the initial segmentation produced with a lower scale parameter. Namely, all image objects whose contrast values exceed a given threshold are merged into one image object.
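The GLCM contrast described above (the sum of P(i, j)(i - j)^2 over all cells of a normalized GLCM) can be illustrated with a short Python sketch. It is a minimal illustration, not the eCognition implementation: it assumes a single horizontal pixel offset, symmetric pair counting and an image already quantized to a small number of grey levels, whereas eCognition may, for example, combine several directions. The function names `glcm` and `glcm_contrast` are hypothetical.

```python
from collections import Counter


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for a single pixel offset (dx, dy).

    image: list of rows of integer grey levels in range(levels).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:  # count the pair in both directions
                    counts[(j, i)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    # P(i, j): relative frequency of grey level i co-occurring with level j
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def glcm_contrast(p):
    """GLCM contrast: sum over i, j of P(i, j) * (i - j)**2 for an N x N GLCM."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Homogeneous patch vs. a patch with an abrupt grey-level edge (e.g. a crack boundary)
flat = [[3, 3, 3], [3, 3, 3]]
edge = [[0, 0, 3], [0, 0, 3]]
print(glcm_contrast(glcm(flat, 4)))  # 0.0
print(glcm_contrast(glcm(edge, 4)))  # 4.5
```

Large grey-level jumps between neighbouring pixels are weighted by (i - j)^2, which is why distressed areas with sharp edges tend to score higher than smooth pavement.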

Dataset Preparation and Feature Selection
Sufficient sample data are necessary for training and validating machine-learning algorithms (Mokhtaria et al., 2016). Three classes were defined in this study, i.e. pothole, crack and non-distressed pavement, the last of which includes damage-free pavement and white and yellow traffic lines. However, the number of potholes and cracks on the studied pavement is limited. When two sequential images are compared, the pixel values at the same location differ because of illumination differences caused by the different solar incidence angles. Consequently, the segmentation results of the same target derived from different images differ to some degree. Hence, the dataset was prepared based on three rules: (a) the 126 pavement images are segmented individually following the procedure described in section 2.1; (b) the same target in two sequential images is treated as two different objects; (c) white traffic line samples are collected from the image provided by the Airsight Company. Finally, 1430 samples were collected: 221 potholes, 678 cracks and 531 non-distressed pavement samples, the latter comprising 299 damage-free pavement, 122 yellow traffic line and 110 white traffic line samples.
Feature selection has a great influence on the performance of learning algorithms. A reasonable number and choice of features can increase the accuracy of an algorithm while decreasing its computation time (Oliveira & Correia, 2009). Generally, three types of image features can be extracted from digital images, i.e. spectral, geometric and texture features. In this study, based on prior knowledge of the distribution of feature values for each kind of image object, 18 features, comprising 6 spectral, 6 geometric and 6 GLCM texture features, were used to train and validate the learning algorithms (Table 1). Furthermore, because the value distributions of the features differ, each feature was normalized using equation (2) below.
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (2)$$

where X' is the normalized feature vector, and X_max and X_min are the maximum and minimum values of feature X, respectively. Consequently, the values of all features lie in the same range, from 0 to 1, which should speed up the convergence of the learning algorithms. To verify the capability of each type of feature for detecting potholes and cracks, six combinations of the three feature types were supplied to each classification algorithm, i.e. spectral (C1), geometric (C2) and texture (C3) features; spectral and geometric features (C4); geometric and texture features (C5); and spectral, geometric and texture features (C6).
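The min-max normalization in equation (2) and the six feature combinations can be sketched in Python as follows. The grouping of the 18 features into index ranges (spectral first, then geometric, then texture) is an assumption made for illustration only; Table 1 defines the actual features. `FEATURE_GROUPS`, `COMBINATIONS` and the helper names are hypothetical.

```python
# Hypothetical layout of the 18 features (Table 1 defines the real ones):
# indices 0-5 spectral, 6-11 geometric, 12-17 GLCM texture.
FEATURE_GROUPS = {
    "spectral": range(0, 6),
    "geometric": range(6, 12),
    "texture": range(12, 18),
}

COMBINATIONS = {
    "C1": ("spectral",),
    "C2": ("geometric",),
    "C3": ("texture",),
    "C4": ("spectral", "geometric"),
    "C5": ("geometric", "texture"),
    "C6": ("spectral", "geometric", "texture"),
}


def min_max_normalise(column):
    """Equation (2): rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def normalise_samples(samples):
    """Normalise every feature (column) of a list of equal-length samples."""
    columns = [min_max_normalise(list(col)) for col in zip(*samples)]
    return [list(row) for row in zip(*columns)]


def select_features(sample, combination):
    """Keep only the features belonging to combination C1..C6."""
    idx = [i for group in COMBINATIONS[combination] for i in FEATURE_GROUPS[group]]
    return [sample[i] for i in idx]


print(min_max_normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(len(select_features(list(range(18)), "C5")))  # 12
```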

Detectors of Potholes and Cracks
Four supervised classifiers, K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN) and Random Forest (RF), were selected to detect potholes and cracks in this study. To examine the predictive accuracy of the learning algorithms and to protect against overfitting, the 1430 samples were randomly divided into 5 folds. For each fold, a model is trained using the out-of-fold observations, and its classification accuracy is calculated on the in-fold data. The average classification accuracy over all folds is then used as the indicator of model performance. The exception is Random Forest, whose performance was validated using the Out-of-Bag (OOB) error (Breiman, 2001) instead of the 5-fold validation procedure above. All algorithms were run on a PC with an Intel Core i7-6700HQ CPU @ 2.6 GHz, an Nvidia Quadro M1000M GPU and 16 GB RAM. The running time of each model was also recorded as an important indicator of algorithm performance.
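The 5-fold validation procedure can be sketched in pure Python as below, assuming the caller supplies `train_fn` and `predict_fn` callables for a given classifier (hypothetical names). The random interleaved fold assignment and the use of `time.perf_counter` to measure running time are illustrative choices, not details reported in the study.

```python
import random
import time
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, train_fn, predict_fn, k=5, seed=0):
    """Return (mean in-fold accuracy, elapsed seconds) for one model configuration."""
    start = time.perf_counter()
    accuracies = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on the out-of-fold observations ...
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        # ... and score on the in-fold observations.
        correct = sum(predict_fn(model, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies), time.perf_counter() - start


# Usage with a trivial majority-class baseline (hypothetical stand-in for KNN/SVM/ANN)
def train_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

With 1430 samples and k = 5, every fold holds 286 samples, and each sample is scored exactly once.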

K-Nearest Neighbours
K-Nearest Neighbours (KNN) is an instance-based, lazy learning algorithm that assigns an observation to the class most represented among its nearest neighbours (Zhang & Zhou, 2007). The parameter K determines the number of neighbours considered. The distance between the observation and the training samples can be defined by, for example, the Euclidean or Minkowski distance. Generally, the observation is assigned directly to the majority class of its neighbours. However, this can bias the outcome when a few very close neighbours belong to one class while a larger number of relatively distant neighbours belong to another. Therefore, distance weighting is often introduced to refine the KNN classification: nearer neighbours contribute more to the outcome than more distant ones.
A common weighting scheme gives each of the K neighbours a weight of 1/d², where d is the distance between the observation and that neighbour. Among these parameters, K has a great impact on the accuracy of KNN. In this study, a series of K values was tested to determine which value works best for this application. The Minkowski distance and the inverse-squared-distance weighting scheme were used in the experiment.
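A distance-weighted KNN classifier with the Minkowski distance and 1/d² weights might look like the sketch below. The Minkowski exponent p is not stated in the text; p = 2 (which reduces to the Euclidean distance) is assumed here, and returning the label of an exact match (d = 0) is an added convention to avoid division by zero.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance; p = 2 (assumed here) gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=4, p=2):
    """Distance-weighted KNN: each of the k nearest neighbours votes with weight 1/d**2."""
    neighbours = sorted(
        ((minkowski(x, query, p), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        if d == 0.0:
            return label  # exact match: infinite weight (added convention)
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)


# One close pothole outweighs three more distant cracks:
train_X = [[1.0], [3.0], [-3.0], [3.0]]
train_y = ["pothole", "crack", "crack", "crack"]
print(knn_predict(train_X, train_y, [0.0], k=4))  # pothole
```

An unweighted majority vote over the same four neighbours would return "crack" (3 votes to 1); with 1/d² weights the pothole contributes 1.0 against 3 × 1/9 ≈ 0.33 for the cracks.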

Support Vector Machine
Support Vector Machine (SVM) is a classification system derived from statistical learning theory.It separates the classes with a decision surface that maximizes the margin between the classes.The surface is often called the optimal hyperplane, and the data points closest to the hyperplane are called support vectors.
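To make the idea of a maximum-margin hyperplane concrete, the sketch below trains a linear SVM with a Pegasos-style stochastic sub-gradient solver on the regularized hinge loss. This solver is a swapped-in illustration, not necessarily the optimizer used in the study. The bias is folded into the weight vector as a constant feature (and is therefore regularized), and `lam` and `epochs` are assumed values.

```python
def train_linear_svm(X, y, lam=0.1, epochs=200):
    """Pegasos-style sub-gradient descent on lam/2*||w||^2 + mean hinge loss.

    y must contain -1/+1 labels. A constant 1 is appended to every sample so the
    last weight acts as the (regularised) bias term.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(Xa, y):  # cycle through samples in a fixed order
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularisation step
            if margin < 1.0:  # inside the margin or misclassified: hinge sub-gradient
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score; its sign gives the side of the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def predict(w, x):
    return 1 if decision(w, x) >= 0 else -1


def closest_to_hyperplane(w, X, y, n=2):
    """Indices of the n samples with the smallest functional margin y * f(x),
    i.e. the candidates for support vectors."""
    return sorted(range(len(X)), key=lambda i: y[i] * decision(w, X[i]))[:n]


X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

The two samples nearest the learned hyperplane, (2, 2) and (-2, -2), are the ones that end up defining the margin, which is the role the text assigns to support vectors.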

Artificial Neural Network
An Artificial Neural Network (ANN) mimics the way the human brain solves problems with a large number of neurons (Saar & Talvik, 2010). An ANN is typically composed of three kinds of layers, i.e. the input layer, the hidden layer and the output layer. Each layer comprises a certain number of nodes, analogous to the neurons in the brain. The number of nodes in the input layer is determined by the number of features in the example data, while the number of output classes determines the number of nodes in the output layer. The number of hidden layers and their nodes can vary between applications. Moreover, each node has an activation function that defines the output of that node given a set of inputs. Sigmoid, Softmax and Rectified Linear Unit (ReLU) functions are commonly used in ANNs.
Which of them should be used depends on the objective of the application. Back propagation is a widely used training procedure for ANNs that adjusts the weights and biases between the nodes. In this study, a three-layer feed-forward network with one input layer, one Sigmoid hidden layer and one Softmax output layer was constructed to classify potholes and cracks. The network was trained with the conjugate gradient method to minimize the difference between the output node activations and the target outputs. To find the appropriate number of nodes in the hidden layer for pavement distress detection, a series of numbers from 1 to 10 was evaluated based on the classification accuracy.
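The forward pass of a network like the one described above (18 inputs, one Sigmoid hidden layer, one Softmax output layer with three classes) can be sketched as follows. Only the prediction step is shown; the conjugate gradient training used in the study is not reproduced, and the random weight initialization in `init_network` is an assumption.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_network(n_in=18, n_hidden=10, n_out=3, seed=0):
    """Random weights in [-0.5, 0.5] and zero biases (assumed initialisation)."""
    rng = random.Random(seed)

    def layer(n_from, n_to):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_from)] for _ in range(n_to)]
        return weights, [0.0] * n_to

    return layer(n_in, n_hidden), layer(n_hidden, n_out)


def forward(net, x):
    """Input -> Sigmoid hidden layer -> Softmax output (one probability per class)."""
    (w1, b1), (w2, b2) = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    return softmax(logits)


CLASSES = ["pothole", "crack", "non-distressed pavement"]
net = init_network(n_hidden=10)
probs = forward(net, [0.5] * 18)  # one normalised 18-feature sample
print(CLASSES[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Varying `n_hidden` is the knob the study sweeps; each hidden node produces one "abstract feature" that the Softmax layer combines into class probabilities.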

Random Forest
Random Forest (RF) is an ensemble learning algorithm that combines a number of decision tree classifiers into a forest to predict the class of new examples (Breiman, 2001). Each tree in the forest is trained on a subset of the training data resampled from the original training dataset. The resampling is done with replacement following the bootstrap procedure, i.e. each subset contains the same number of examples as the original dataset. In addition to resampling the training examples for each tree, the features used to find the best split at each node are also randomly sampled from the original feature set. Every tree in the forest predicts the class of a new example, and the final class is assigned by majority vote. The number of trees has a significant effect on the computation time of RF, so forests of different sizes were evaluated to determine which performs best for pavement distress detection.
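A heavily simplified Random Forest sketch follows, showing bootstrap resampling, random feature selection and the Out-of-Bag (OOB) error used for validation. To keep it short, each tree is reduced to a single-split decision stump chosen by Gini impurity, so it illustrates the resampling and OOB logic rather than full-depth trees; using √d candidate features per split is a common default, assumed here. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Best single split (lowest weighted Gini impurity) over the candidate features."""
    best = None
    for f in feature_ids:
        for thr in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= thr]
            right = [lab for x, lab in zip(X, y) if x[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        label = majority(y)
        return feature_ids[0], math.inf, label, label
    return best[1:]


def predict_stump(stump, x):
    f, thr, left_label, right_label = stump
    return left_label if x[f] <= thr else right_label


def train_forest(X, y, n_trees=18, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(math.sqrt(d)))  # candidate features per split (assumed default)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = rng.sample(range(d), m)  # random feature subset for this split
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        oob = set(range(n)) - set(boot)  # samples this tree never saw
        forest.append((stump, oob))
    return forest


def predict_forest(forest, x):
    return majority([predict_stump(stump, x) for stump, _ in forest])


def oob_error(forest, X, y):
    """Each sample is voted on only by the trees for which it was out-of-bag."""
    wrong = counted = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = [predict_stump(stump, x) for stump, oob in forest if i in oob]
        if votes:
            counted += 1
            wrong += majority(votes) != label
    return wrong / counted if counted else float("nan")


X = [[i, i] for i in range(20)]  # toy 2-feature samples
y = ["crack" if i < 10 else "pothole" for i in range(20)]
forest = train_forest(X, y, n_trees=18)
print(oob_error(forest, X, y))
```

Because roughly a third of the samples are left out of each bootstrap draw, every sample gets scored by trees that never saw it, which is why the OOB error can replace a separate n-fold validation for RF.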

RESULTS AND DISCUSSION
Classification accuracy and computational time were selected as the two indicators of the performance of the four learning algorithms. Classification accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples. Figure 1 illustrates the classification accuracy of KNN trained and validated with different settings of K and the six feature combinations. As K increases, the accuracy of all models first increases slightly and then decreases gradually. The model trained with the combination of spectral, geometric and textural features (C6) always performed best, with the highest accuracy, whereas the spectral and the geometric feature sets on their own performed almost identically, with lower accuracy. Among the three individual feature types, the textural feature set contributes most to the accuracy of KNN (Figure 1(a)). Figure 1(b) shows the variation in running time of the different KNN models and indicates that, for every feature combination, the running time does not fluctuate significantly as K increases. In general, the more features used, the more time KNN takes: the model with combination C6 took the most time, while achieving the highest accuracy. Figure 1(c) shows the relationship between running time and classification accuracy for the best-performing model of each of the six feature combinations. As a compromise between time and accuracy, K = 4 with feature combination C5 (geometric and textural features) was the best choice, giving an overall accuracy of 98.81% and a running time of 0.65 s (Table 2).
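The accuracy definition above, together with one possible way of formalizing the time-accuracy compromise read from the time-accuracy diagrams, is sketched below. The study does not state a formal selection rule; the Pareto filter (keep only models that no other model beats on both running time and accuracy) is an assumption, and the final choice among the surviving models remains a judgement call.

```python
def accuracy(y_true, y_pred):
    """Ratio of correctly classified samples to the total number of samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pareto_front(models):
    """models: iterable of (name, running_time_s, accuracy) tuples.

    A model is dropped if another model is at least as fast and at least as
    accurate, and strictly better in one of the two. The survivors are sorted
    by running time.
    """
    models = list(models)
    front = []
    for name, t, a in models:
        dominated = any(
            t2 <= t and a2 >= a and (t2 < t or a2 > a) for _, t2, a2 in models
        )
        if not dominated:
            front.append((name, t, a))
    return sorted(front, key=lambda m: m[1])
```

Applied to the (running time, accuracy) points of one algorithm's feature combinations, the filter removes configurations that are both slower and less accurate than some alternative, leaving only the genuine trade-offs to compare.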
Figure 2 shows the performance of SVM configured with different types of kernel functions and the six feature combinations. Figure 2(a) shows that the SVM with a linear kernel had lower classification accuracy when trained and validated using only spectral features or only geometric features. Once texture features or more types of features were introduced, the four kinds of SVM models (linear, quadratic, cubic, Gaussian) performed almost identically on feature combinations C3, C4, C5 and C6, and the highest accuracy was achieved by using all three types of features together. Figure 2(b) shows the running time of the different SVM models. The SVMs with polynomial kernels (quadratic and cubic) took the most time on feature sets C1, C2 and C3, whereas for C4, C5 and C6 all SVM models had similar running times. Interestingly, the SVM models with linear and Gaussian kernels took almost the same time on each of the six feature combinations. The best performances for the six feature combinations are plotted in the time-accuracy diagram (Figure 2(c)). It shows that the SVM configured with a linear kernel and feature combination C6 achieved the highest classification accuracy (98.95%) with the least running time (0.59 s) (Table 3).
Figure 3 presents the variation in classification accuracy and running time of ANN with respect to the number of neurons in the hidden layer. When the number of hidden neurons was set to one, only one abstract feature in the hidden layer was used to classify the objects, which was not sufficient to distinguish between pavement and distresses (cracks and potholes); this configuration also took the most time to train and validate. As the number of hidden neurons increased, the classification accuracy benefited greatly from the additional abstract features learned by the ANN, and the running time generally decreased (Figure 3(b)). Figure 3(a) shows that the ANN models with more than one type of feature (C4, C5 and C6) and two or more hidden neurons always achieved higher accuracy. It can also be observed that once the number of hidden neurons exceeded two, the classification accuracy changed little. Taking account of the running time, as illustrated in Figure 3(c), the ANN with 12 hidden neurons and feature combination C4 was the best model for classifying pavement and distresses, with an overall accuracy of 98.81% and a running time of 0.35 s (Table 4).
Figure 4 shows the performance of RF with different numbers of trees in the forest. The accuracy of RF kept increasing with the number of trees until it levelled off. The feature combinations with more than one type of feature (C4, C5, C6) performed best, and similarly, once the number of trees exceeded about eight. Figure 4(b) shows the running time of RF and demonstrates that RF with feature combination C1 always took the most time compared with the other feature combinations. Moreover, running time is positively correlated with the number of trees. As Figure 4(c) shows, the RF with 18 trees in the forest and feature combination C4 was the best model for distinguishing between pavement and distresses (Table 4), with a calculation time of only 0.09 s.

CONCLUSION
Remote sensing technology, as a non-destructive method of road surface inspection, is now widely used by road departments. The UAV is a flexible platform that can be configured with different kinds of remote sensing sensors to monitor pavement condition. Compared with a conventional vehicle-based PMS, a UAV remote sensing system can acquire full pavement images of different lanes simultaneously and has no significant impact on normal traffic. Moreover, benefiting from full coverage of the pavement, different kinds of pavement distresses can be extracted from UAV images at the same time.
In this study, a set of digital pavement images acquired by UAV and four popular learning algorithms (KNN, SVM, ANN, RF) were used to identify road surface damage. It can be concluded that each learning algorithm, given a specific set of parameters and features, can achieve high classification accuracy (over 98%) with little computational time. Taking classification accuracy and running time together into account, the best model for each of the four learning algorithms was recommended for detecting pavement potholes and cracks: KNN with K = 4 and the combination of geometric and textural features; SVM with a linear kernel and the combination of spectral, geometric and textural features; ANN with 12 nodes in the hidden layer and the combination of spectral and geometric features; and RF with 18 trees and the combination of spectral and geometric features. Among these four models, RF performed best, with a higher classification accuracy and the minimum running time. In the future, more UAV pavement images should be used to further evaluate the performance of these models in detecting potholes and cracks. Other kinds of remote sensing data, including LiDAR and Radar carried by UAVs, also have great potential for pavement condition monitoring. Additionally, other advanced learning algorithms, such as convolutional neural networks, could be introduced into pavement distress detection.

Figure 1. (a) The classification accuracy and (b) running time of KNN with respect to different K; (c) the relationship between running time and classification accuracy of the best performance of each of the six feature combinations

of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W4, 2017 Tehran's Joint ISPRS Conferences of GI Research, SMPR and EOEC 2017, 7-10 October 2017, Tehran, Iran

Figure 2. (a) The classification accuracy and (b) running time of SVM over six feature combinations and four types of kernel function, i.e. linear, quadratic, cubic and Gaussian; (c) the relationship between running time and classification accuracy of the best performance of six feature combinations

Figure 3. (a) The classification accuracy and (b) running time of ANN with respect to different numbers of hidden neurons; (c) the relationship between running time and classification accuracy of the best performance of six feature combinations

Figure 4. (a) The classification accuracy and (b) running time of Random Forest over a series of numbers of trees; (c) the relationship between running time and classification accuracy of the best performance of six feature combinations

Table 1
The support vectors are the critical elements of the training set. SVM is a non-probabilistic binary classifier that assigns new examples to one category or the other, which means that a single SVM can only solve two-class problems. SVM can also handle multiclass problems by combining several binary SVM classifiers following a one-vs-one or one-vs-all classification procedure. A distinctive feature of SVM is the kernel function, which is introduced to deal with non-linear classification problems. The kernel function maps the original examples into a high-dimensional feature space, in which a problem that is not linearly separable in the original space may become linearly separable. Several types of kernels perform differently in different applications, such as the linear, polynomial and Gaussian kernels. In this study, the performance of four types of kernels for detecting potholes and cracks was evaluated, i.e. linear, quadratic, cubic and Gaussian.
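The four kernels and the one-vs-one voting scheme can be sketched as follows. The polynomial form (a · b + c)^degree with c = 1 and the Gaussian width `gamma` are assumed parameterizations, and `nearest_centroid_pair` with its `CENTROIDS` is a hypothetical stand-in for trained pairwise SVMs, used only to show how the votes are combined.

```python
import math
from collections import Counter
from itertools import combinations


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(degree, c=1.0):
    """(a . b + c) ** degree; degree 2 = quadratic, 3 = cubic (c = 1 is assumed)."""
    return lambda a, b: (linear_kernel(a, b) + c) ** degree


def gaussian_kernel(gamma=1.0):
    """exp(-gamma * ||a - b||^2); gamma is an assumed width parameter."""
    return lambda a, b: math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


KERNELS = {
    "linear": linear_kernel,
    "quadratic": polynomial_kernel(2),
    "cubic": polynomial_kernel(3),
    "gaussian": gaussian_kernel(),
}


def one_vs_one_predict(classes, binary_predict, x):
    """Each pair of classes has its own binary classifier (3 pairs for 3 classes);
    every pairwise classifier votes and the class with the most votes wins."""
    votes = Counter(binary_predict(c1, c2, x) for c1, c2 in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for trained pairwise SVMs: pick the nearer class centroid.
CENTROIDS = {"pothole": [0.0], "crack": [1.0], "pavement": [2.0]}


def nearest_centroid_pair(c1, c2, x):
    d1 = sum((a - b) ** 2 for a, b in zip(x, CENTROIDS[c1]))
    d2 = sum((a - b) ** 2 for a, b in zip(x, CENTROIDS[c2]))
    return c1 if d1 <= d2 else c2


print(one_vs_one_predict(list(CENTROIDS), nearest_centroid_pair, [0.9]))  # crack
```

With three classes (pothole, crack, non-distressed pavement), one-vs-one needs three binary classifiers; a kernel only changes how each binary classifier measures similarity, not how the votes are combined.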