A DEEP LEARNING APPROACH FOR URBAN UNDERGROUND OBJECTS DETECTION FROM VEHICLE-BORNE GROUND PENETRATING RADAR DATA IN REAL-TIME

Ground Penetrating Radar (GPR) is widely adopted in underground space surveying and mapping because of its fast data acquisition, convenience, high imaging resolution and non-destructive testing (NDT) capability. However, GPR data post-processing is still poorly automated, and the identification of underground objects requires expert interpretation. This heavy manual interpretation workload limits GPR applications in large-scale urban scenarios. According to the latest research, automatically detecting targets or defects in GPR data remains an unsolved problem that needs further exploration. In this paper, we propose a deep learning method for real-time detection of underground targets from GPR data. Seven typical targets in urban underground space are identified and labelled to construct the training dataset. The constructed dataset consists of 489 labelled samples, including rainwater wells, cables, metal/nonmetal pipes, sparse/dense steel reinforcement and voids. The training dataset is further augmented to produce more samples. A DarkNet53 convolutional neural network (CNN) is trained on the constructed dataset, comprising both real and augmented data, to extract features of the buried objects. The end-to-end YOLO detection framework is then used to classify and locate the seven categories of buried targets in the GPR data in real time. Experiments at the Shenzhen test site (a typical urban road scene) show that the proposed method can effectively detect buried objects in GPR images in real time.


INTRODUCTION
With rapid urbanization, the development and utilization of urban underground space has attracted great attention. Accurately and efficiently detecting urban underground targets, and identifying their types and distribution, is the key prerequisite for ensuring the safety of urban underground space. Urban underground targets include a large number of artificial structures (e.g. subways, pipeline corridors) and complex potential defects (e.g. voids). Traditional surveying and mapping methods based on photography and laser scanning cannot penetrate the ground, and therefore cannot effectively detect the positions and characteristics of underground targets. Ground Penetrating Radar (GPR) offers high efficiency, non-destructive operation, penetrability and high imaging resolution, which gives it an essential role in applications ranging from geophysical exploration (Enze Z, 2017), building quality inspection, road and bridge inspection and tunnel quality inspection to underground detection and classification (Lai W L and Dérobert X, 2017). Because underground targets have complex structures and potential defects are diverse, the analysis of GPR images still relies mainly on human-computer interaction to locate and detect underground targets, which cannot meet the needs of large-scale urban underground space exploration and census.
The main detection methods for underground pipeline targets in GPR images are based on hyperbolic feature extraction with the Hough transform (Windsor C and Capineri L, 2014; Li W and Cui X, 2016), which is limited by the heavy computation required to process and discretize a large number of parameters. Hyperbola detection based on template matching requires the manual design of a large number of parameters to describe different target features (Sagnard F and Tarel J, 2016; Terrasse G and Nicolas J, 2016). Feature-based algorithms using histograms of oriented gradients (HOG) and Haar-like features still need improvement in automation and accuracy to eliminate false alarms and missed detections (Torrione P and Morton K, 2014; Maas C and Schmalzl J, 2013).
In recent years, breakthroughs have been made in deep-learning-based target detection in optical images (Krizhevsky A and Sutskever I, 2012; Girshick R, 2015). For underground targets with hyperbolic echo characteristics in GPR images, some scholars have proposed a deep learning method based on Faster R-CNN (Pham M and Lefèvre S, 2018). However, because underground targets are varied and complex, it is impossible to detect them precisely and effectively by relying on a single feature. To solve these problems, this paper proposes a real-time deep learning method for underground target detection based on YOLO V3 target detection and the Darknet-53 convolutional neural network (Redmon J and Farhadi A, 2018), constructs a database of several typical underground targets, and verifies the method experimentally on vehicle-borne GPR images of urban roads in Shenzhen. The results show that both the average precision and the average recall of target detection exceed 85%.

DETECTION OF UNDERGROUND TARGETS IN GPR IMAGES BASED ON DEEP LEARNING
The flow chart of the deep-learning-based method for detecting underground targets in GPR images is shown in Figure 1. The three key steps of this method are as follows:
1. Constructing the sample data set of underground targets. Underground target samples are labelled through human-computer interaction and then augmented, yielding a standard data set of underground targets.
2. Using a joint training mechanism to train the convolutional neural network. The Darknet-53 convolutional neural network is trained with the ImageNet, COCO and PASCAL VOC data sets to obtain a set of pre-trained network parameters.

Construction of Underground Target Training Samples
The purpose of the initial phase is to obtain enough labelled data for training the convolutional neural network (CNN). We used the SIR-30 vehicle-borne GPR system to collect data from a typical city area at a frequency of 400 MHz, and manually labelled the targets in the data, including rainwater wells, cables, metal/nonmetal pipes and sparse/dense steel reinforcement. The albumentations library (Alexander B and Alex P, 2018) is used to augment the collected GPR data. Considering the resolution of the GPR images and the characteristics of the underground targets, the augmentation combines random cropping, small-angle rotation, blurring, mirror flipping, etc.
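A geometric augmentation has to transform the labelled box together with the image, otherwise the labels no longer match the data. The following is a minimal pure-Python sketch of that idea for a mirror flip followed by a crop on a toy image. It does not use the albumentations API; the function names and the (x_min, y_min, x_max, y_max) pixel box format are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]          # rows of samples (a toy B-scan)
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), max is exclusive


def mirror_flip(image: Image, box: Box) -> Tuple[Image, Box]:
    """Flip the image left-right and mirror the box with it."""
    width = len(image[0])
    x_min, y_min, x_max, y_max = box
    flipped = [row[::-1] for row in image]
    return flipped, (width - x_max, y_min, width - x_min, y_max)


def crop(image: Image, box: Box, left: int, top: int,
         width: int, height: int) -> Tuple[Image, Optional[Box]]:
    """Crop a window and clip the box to it; the box is None if it falls outside."""
    window = [row[left:left + width] for row in image[top:top + height]]
    x_min, y_min, x_max, y_max = box
    new_box = (max(x_min - left, 0), max(y_min - top, 0),
               min(x_max - left, width), min(y_max - top, height))
    if new_box[0] >= new_box[2] or new_box[1] >= new_box[3]:
        return window, None
    return window, new_box


# Toy 4 x 6 image with a target occupying columns 1-2 and rows 1-2.
image = [[0, 0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0, 0],
         [0, 1, 1, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]]
box = (1, 1, 3, 3)

flipped_image, flipped_box = mirror_flip(image, box)
cropped_image, cropped_box = crop(flipped_image, flipped_box,
                                  left=2, top=0, width=4, height=3)
```

In practice the crop window, rotation angle and blur strength would be drawn at random for each augmented sample; they are fixed here so that the effect on the box is easy to follow.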

Underground Target Detection Network
Darknet-53 takes both network complexity and detection accuracy into account.

Training and Inference
The network training in this paper follows the training method proposed for YOLO V3. The anchor boxes obtained by K-means clustering are used to assist the prediction of bounding boxes, and a logistic regression classifier is trained to predict the objectness score of each bounding box. Each bounding box can carry multiple predicted categories. The neural network convolves the GPR image directly to form a feature map, and then predicts the location and probability of underground targets grid cell by grid cell. The core of the algorithm is to transform the underground target detection problem into a regression problem, which enables end-to-end real-time detection.
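The anchor clustering step can be sketched as follows. This is an illustrative pure-Python version that assumes the distance d = 1 - IOU between box sizes, as used in the YOLO papers; the paper does not state its distance measure or initialisation, and the box sizes below are hypothetical.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of a labelled box


def iou_wh(a: Size, b: Size) -> float:
    """IOU of two boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes: List[Size], k: int, iters: int = 50) -> List[Size]:
    """Cluster box sizes with distance d = 1 - IOU and return k anchor sizes."""
    # Assumption: deterministic start from k boxes spread over the area-sorted list.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    step = len(ordered) / k
    centres = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Size]] = [[] for _ in range(k)]
        for s in sizes:
            # Smallest 1 - IOU is the same as largest IOU.
            best = max(range(k), key=lambda i: iou_wh(s, centres[i]))
            clusters[best].append(s)
        new_centres = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres, key=lambda s: s[0] * s[1])


# Hypothetical box sizes in pixels: compact targets and wide, flat targets.
sizes = [(10, 12), (12, 10), (11, 11), (60, 20), (62, 22), (58, 18)]
anchors = kmeans_anchors(sizes, k=2)
```

Using IOU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when small and large targets appear in the same data set.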
After the GPR image is input, the CNN convolutions produce a feature map of size N × N. Since YOLO V3 detects at multiple scales, N takes the values 13, 26 and 52. If the centre of a target falls in a grid cell, that grid cell is responsible for the inference of that target. In this method, three bounding boxes are predicted for each grid cell. As in the original framework, each bounding box outputs 12 prediction results: x, y, w, h, the confidence and seven conditional class probabilities. (x, y, w, h) are the absolute coordinates calculated from the raw prediction results, where x and y give the position of the centre of the bounding box relative to the image boundary, and w and h are the width and height of the bounding box.
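The bookkeeping described above can be illustrated with a short sketch: the size of the output at each scale, and the grid cell responsible for a target. The function names are hypothetical, and the centre coordinates are assumed to be normalised to [0, 1) of the image width and height.

```python
from typing import Tuple

GRID_SIZES = (13, 26, 52)               # N for the three detection scales
BOXES_PER_CELL = 3
NUM_CLASSES = 7
VALUES_PER_BOX = 4 + 1 + NUM_CLASSES    # x, y, w, h, confidence, class probabilities


def output_shape(n: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor for an N x N grid."""
    return (n, n, BOXES_PER_CELL * VALUES_PER_BOX)


def responsible_cell(cx: float, cy: float, n: int) -> Tuple[int, int]:
    """(row, column) of the grid cell that contains a normalised centre (cx, cy)."""
    col = min(int(cx * n), n - 1)
    row = min(int(cy * n), n - 1)
    return row, col


# A target centred at half the image width and a quarter of its height.
cells = {n: responsible_cell(0.5, 0.25, n) for n in GRID_SIZES}
shapes = {n: output_shape(n) for n in GRID_SIZES}
```

The same target is therefore assigned to one cell at each of the three scales, and each cell carries 3 × 12 = 36 predicted values.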
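The confidence is calculated as follows:

$$\mathrm{Confidence} = \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} \qquad (1)$$

If the centre of the ground truth does not fall in the cell, the confidence score is zero. Otherwise, the confidence score is equal to the Intersection over Union (IOU) between the prediction box and the ground truth. In addition, each grid cell predicts seven conditional class probabilities, expressed as $\Pr(\mathrm{Class}_i \mid \mathrm{Object})$, which are conditioned on the grid cell containing an object. The class-specific confidence of each box is calculated as follows:

$$\Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} = \Pr(\mathrm{Class}_i) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} \qquad (2)$$

In this paper, the sum-squared error between the output and ground-truth vectors, both of dimension N × N × (2 × 5 + 7), is used as the loss function for optimizing the parameters. To enhance the detection of small targets, $\lambda_{\mathrm{coord}} = (2 - \mathrm{truth}_w \times \mathrm{truth}_h)$ was introduced to weight the coordinate errors, while the other two weighting coefficients were set to 1, so that the model can converge in training.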
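As a small worked example of the quantities defined above, the following sketch computes the IOU of two boxes, the class-specific confidence of a box, and the coordinate weight. The box format (x_min, y_min, x_max, y_max) and the assumption that the ground-truth width and height are normalised to [0, 1] are illustrative choices, not details stated in the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_confidence(p_class_given_object: float, p_object: float,
                     iou_value: float) -> float:
    """Class-specific confidence: Pr(Class_i | Object) * Pr(Object) * IOU."""
    return p_class_given_object * p_object * iou_value


def coord_weight(truth_w: float, truth_h: float) -> float:
    """Coordinate weight 2 - w * h; assumes w and h are normalised to [0, 1]."""
    return 2.0 - truth_w * truth_h


truth = (0.0, 0.0, 4.0, 4.0)
pred = (2.0, 0.0, 6.0, 4.0)
overlap = iou(truth, pred)                    # intersection 8, union 24
score = class_confidence(0.9, 1.0, overlap)   # object present, class probability 0.9
```

With normalised sizes, a small target such as `coord_weight(0.1, 0.1)` receives a weight close to 2, whereas a large target such as `coord_weight(0.8, 0.5)` receives a smaller one, so coordinate errors on small targets are penalised more.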
The final loss function, Equation (3), is mainly divided into three parts: coordinate loss, confidence loss and classification loss. Its terms involve the maximum IOU between the prediction box and the ground truth of a specific cell; Thresh, a pre-set threshold of 0.6; the confidence of the predicted box; the coordinates of the real box and the predicted box; and the classifications of the real box and the predicted box. The loss value is still calculated using the sum of squared errors (SSE), in the same way as in YOLO V1.
In the design of the YOLO V3 algorithm, multi-label classification is used to allow a bounding box to carry multiple categories. Usually, a target falls in only one grid cell. However, in some circumstances a target can be predicted by multiple grid cells, for example a large target or a target near the boundary between several grid cells. YOLO versions before YOLO V3 can only predict one bounding box for each target, so they are likely to miss detections. Unlike YOLO9000 (Redmon J and Farhadi A, 2016), YOLO V3 abandons the softmax function, which limits network detection performance for multiple targets. Instead, it applies independent logistic classifiers to predict multiple labels. During training, the binary cross-entropy loss is used to calculate the category loss.
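The difference between softmax and independent logistic classifiers can be seen in a few lines. The sketch below uses hypothetical logits for three classes; it is an illustration of the two activation choices and of binary cross-entropy, not the training code of the paper.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function, applied to each class independently."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: List[float]) -> List[float]:
    """Softmax over all classes; the outputs compete and sum to one."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(targets: List[float], probs: List[float],
                         eps: float = 1e-12) -> float:
    """Sum of per-class binary cross-entropy terms."""
    return -sum(
        t * math.log(max(p, eps)) + (1.0 - t) * math.log(max(1.0 - p, eps))
        for t, p in zip(targets, probs)
    )


# Hypothetical logits: the first two classes are both strongly supported.
logits = [2.0, 2.0, -3.0]
independent = [sigmoid(z) for z in logits]
exclusive = softmax(logits)
loss_match = binary_cross_entropy([1.0, 1.0, 0.0], independent)
loss_mismatch = binary_cross_entropy([0.0, 0.0, 1.0], independent)
```

With independent logistic outputs both of the first two classes can score above 0.8 at once, whereas softmax forces them to share the probability mass and neither reaches 0.5.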
The method in this paper uses Adam (Diederik P and Jimmy B, 2014) instead of the traditional Stochastic Gradient Descent (SGD) optimization algorithm. As with other deep learning optimization algorithms, the weights of the neural network are updated iteratively after back-propagation on the training data. More precisely, Adam is an adaptive learning rate method.
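To make the adaptive learning rate concrete, here is a minimal single-parameter sketch of the Adam update rule with its usual default coefficients. The toy objective and the learning rate of 0.1 are assumptions chosen for illustration; they are not the settings used to train the network in this paper.

```python
import math
from typing import Callable, List


def adam_minimise(grad: Callable[[float], float], w: float, steps: int,
                  lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> List[float]:
    """Run Adam on a single parameter and return the parameter history."""
    m, v = 0.0, 0.0                      # first and second moment estimates
    history = [w]
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return history


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
history = adam_minimise(lambda w: 2.0 * (w - 3.0), w=0.0, steps=1000, lr=0.1)
```

The first step has a size very close to `lr` regardless of how large the gradient is, because the update divides by the running gradient magnitude. This per-parameter normalisation is what is meant by learning rate self-adaptation.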
Figure 2 shows the loss value as a function of the number of epochs. As can be seen from the loss curve, the loss value no longer changes after about 85 epochs, where it stabilizes at about 5, and a stable model is obtained.
To sum up, underground target detection extracts the feature map of the input GPR image and then determines, grid cell by grid cell, whether it contains underground targets. As with forward propagation during network training, a single prediction pass yields the bounding boxes of the possible targets together with the categories and probabilities for each box, and the prediction results are then refined according to pre-set thresholds. Based on the actual situation, the IOU threshold is set to 0.3 and the confidence threshold is set to 0.5. Pulse responses that do not belong to any of the mentioned classes are mostly filtered out because their confidence is below 0.5.
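The refinement of the raw predictions with the two thresholds can be sketched as a confidence filter followed by non-maximum suppression. This sketch assumes that the IOU threshold of 0.3 is the suppression threshold and that suppression is applied per class; the paper does not spell out either detail, and the detections below are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class Detection(NamedTuple):
    box: Box
    score: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def refine(detections: List[Detection], conf_thresh: float = 0.5,
           iou_thresh: float = 0.3) -> List[Detection]:
    """Drop low-confidence boxes, then suppress overlapping boxes of the same class."""
    candidates = sorted((d for d in detections if d.score >= conf_thresh),
                        key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        if all(d.label != k.label or iou(d.box, k.box) <= iou_thresh for k in kept):
            kept.append(d)
    return kept


raw = [
    Detection((10, 10, 50, 50), 0.9, "metal pipe"),
    Detection((12, 12, 52, 52), 0.8, "metal pipe"),   # duplicate of the first box
    Detection((100, 10, 140, 50), 0.4, "cable"),      # below the confidence threshold
    Detection((200, 20, 260, 60), 0.7, "void"),
]
final = refine(raw)
```

Here the second pipe box overlaps the first with an IOU of about 0.82 and is suppressed, the low-confidence cable box is discarded, and two detections remain.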

EXPERIMENTS AND ANALYSIS
To verify the effectiveness of this method, it was tested on GPR data acquired with the SIR-30 vehicle-borne GPR system along a round-trip route on Caitian Road, Shenzhen. Parameters such as the antenna centre frequency, acquisition length and scene category are shown in Table 1. Table 2 shows the number of samples interpreted by experts and the number of samples after augmentation. To ensure that the model converges without over-fitting, the number of samples after augmentation was chosen empirically within a suitable range.

Detection Results of Underground Targets
The test data in this paper were acquired in Shenzhen with the SIR-30 vehicle-borne GPR system at a frequency of 400 MHz. Figure 3(a)-(e) shows the recognition results for rainwater wells, sparse/dense steel mesh, bridges, metal/non-metal pipelines and cables in the GPR waveform image data, where bridges are represented by two parts: dense steel mesh and void. The recognition results show that the deep learning method proposed in this paper can accurately identify the type and locate the position of underground targets in GPR images.

Evaluation of Detection Results
In underground target detection, the performance of the network model is usually measured by three indicators: precision, recall and F1 score. Precision is the proportion of detected objects that are correctly identified, which reflects the ability of the model to distinguish targets from the background in the image. Recall is the proportion of the actual targets of a category that are detected and classified into that category, which indicates the retrieval ability of the model for targets in GPR images. Higher precision means that the detection model better distinguishes targets from the background, whereas higher recall means that the model detects more of the underground targets. The F1 score combines precision and recall, and a high F1 score indicates a more robust classification model. The formulas for the three indicators are as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (4)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5)$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (6)$$

where TP denotes the number of targets detected correctly, FP the number of targets detected incorrectly, and FN the number of targets not detected. The experimental results show that the average recall, precision and F1 score of the deep learning method designed in this paper all exceed 85%. Taken together, the three indicators show that the method is very effective for detecting underground targets and defects in GPR images.
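The three indicators can be computed directly from the counts of true positives, false positives and false negatives. The counts in this sketch are hypothetical and are not the results reported in Table 3.

```python
def precision(tp: int, fp: int) -> float:
    """Share of detections that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual targets that are detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical counts for one category.
tp, fp, fn = 90, 10, 15
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
```

Because the F1 score is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is reported alongside the two individual indicators.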

CONCLUSION
To achieve efficient and accurate detection of underground targets in GPR images, this paper proposed and designed a deep-learning-based detection method. Data augmentation was used to address the shortage of underground target samples in GPR images. The Darknet-53 network borrows the idea of ResNet and adds residual blocks, which avoids vanishing or exploding gradients even when the network is very deep. Moreover, a large number of open data sets were used to pre-train the network, ensuring that the network can extract the features of underground targets after training. The multi-branch prediction in the YOLO V3 framework also supports the detection of small underground targets.
Tests in a real scene show that this method can detect and classify underground targets in real time from vehicle-borne GPR images. The detection speed reaches 16 frames/s on a platform with two GTX 1080 GPUs, and the average precision and recall both exceed 85%. At present, the method only detects the category and location of the targets. Future research will incorporate structural information about underground targets, including buried depth, area and aspect ratio, to construct a feature space, and a random forest algorithm will be applied to these features to help identify target categories and improve detection quality.

3. Training and inference. Using transfer learning, the parameters of the first 50 layers of the network are first frozen. These layers are then unfrozen and join the training process, yielding the final model, which can predict the locations of multiple bounding boxes and multiple categories in real time. Finally, the trained model is loaded and inference is run to obtain the location and category of underground targets in the test data set.

Figure 1. Target detection from GPR imagery based on a convolutional neural network

Figure 2. Line chart of the loss value

Figure 3. Recognition results

Compared with VGG-16 (Simonyan K, 2014), Darknet-53 requires less computation, is a lighter model and has stronger feature extraction ability. It consists of consecutive 3 × 3 and 1 × 1 convolution kernels, which reduces the number of network parameters while still fully extracting image features. Drawing on the idea of ResNet, residual blocks are added to solve the problem of vanishing and/or exploding gradients caused by the deeper layers of the network, and the traditional layer-by-layer training of deep neural networks is changed to stage-by-stage training. In addition, a batch normalization (BN) layer (Ioffe S, 2015) and a LeakyReLU layer (Clevert D and Unterthiner T, 2016) are added after each convolution layer, which simplifies the calculation, accelerates convergence and helps prevent over-fitting.

Table 2. Number of samples in each category after augmentation

Table 3 shows the evaluation results of target detection in GPR images along a long-distance road in the Caitian experimental area of Shenzhen.

Table 3. Evaluation of the target detection results on GPR image data