DEEP LEARNING AND IMAGE PROCESSING FOR AUTOMATED CRACK DETECTION AND DEFECT MEASUREMENT IN UNDERGROUND STRUCTURES

This work presents the combination of Deep Learning (DL) and image processing to produce an automated crack recognition and defect measurement tool for civil structures. The authors focus on tunnel structures and surveys, and have developed an end-to-end tool for the asset management of underground structures. In order to maintain the serviceability of tunnels, regular inspection is needed to assess their structural status. The traditional survey method is visual inspection: simple, but slow and relatively expensive, and the quality of the output depends on the ability and experience of the engineer as well as on the total workload (stress and tiredness may influence the ability to observe and record information). As a result of these issues, over the last decade there has been a desire to automate monitoring using new inspection methods. The present paper aims to combine DL with traditional image processing to create a tool able to detect, locate and measure structural defects.


INTRODUCTION
1.1 Tunnel inspection: traditional methods and new technologies
This paper presents a novel pipeline for meaningful visual defect detection. The authors focus on tunnel surveys, as they pose several challenges (McKibbins, Elmer, & Roberts, 2010). In order to keep tunnels serviceable, regular inspections are needed to assess their structural status. At the time of writing, the standard industry monitoring procedure consists of regular visual inspections carried out by an expert operator with knowledge of the monitored structure and its materials. This kind of inspection is simple, but slow and relatively expensive, and the quality of the output depends on the ability and experience of the engineer as well as on the total workload (stress and tiredness may reduce the ability to detect defects). As a result of these issues, and of technological progress over the last decade, there has been a desire to automate the monitoring process using new inspection methods; the main purpose is to reduce the engineers' workload and to obtain reliable outputs. The reasons can be divided into several classes:
a) Limitations of visual inspection
• The reliability of visual inspections depends not only on the visibility of the defect but also on the subjectivity of the observations;
• To reduce the closure time of the infrastructure (particularly for transportation tunnels), investigations are often carried out during the night shift; consequently, the stress level of the operator may be a further limitation of visual inspection;
• Lack of continuity of inspectors or inspection methods can reduce the effectiveness of inspections and the confidence in their results.

b) Safety reasons
• Some underground structures, such as sewer pipes, are confined spaces, and operators need specific training to enter them;
• Several hazards may put the health of the operators at increased risk.
c) Speed and economic reasons
• Traditional visual inspection requires an engineer to go down the tunnel and record all the defects, which results in long shifts;
• The economic disadvantage is twofold: on one side the on-site cost of the engineer (mainly during night shifts), on the other the closure of the infrastructure.

All these factors might influence the results. Recently, many engineering companies have begun to use and develop sophisticated techniques to create high-resolution 360º photographic datasets of their project sites for inspection purposes (McDonnell & Devriendt, 2017). Although the present work focuses on tunnels, the process is applicable to more general scenarios. Immersive photographic tunnel surveys have the great advantage of speeding up the inspection process, resulting in cost savings and shorter shifts in the tunnel (with the further benefit of improved health and safety). These photographic surveys can replace the visual inspection: the immersive tours minimise the time spent on site (collecting pictures with an appropriate mount takes less than 1 min/metre) and allow the inspection to be processed and analysed from the office. To overcome other limitations, such as the subjectivity of the observations and the dependency of the output on the stress level of the engineer, the authors have developed a new processing and analysis pipeline. The idea is to collect photos and extract all the needed information a posteriori through image processing and image analysis. In detail, the proposed pipeline outputs a meaningful defect detection, providing the user with the metrics, location and extent of each defect. The authors have chosen cracks as the target defect because they are among the most prevalent detectable defects in underground structures, and monitoring them gives important information about the structural behaviour.
Table 1 gives a qualitative comparison between the traditional Visual Inspection (VI), the 360º Immersive View Inspection (360-IVI) and the novel pipeline developed by the authors (CrackDet). The comparison uses scores from 1 (non-functional) to 5 (fully functional). It can be seen that the repeatability of the new pipeline is as limited as that of 360-IVI. This restriction might be resolved by creating a 3D model with photogrammetric techniques and applying the crack detection to the textured mesh, which is left for future research. In chapter 2, several state-of-the-art methods for automated crack detection in underground structures or buried pipes are analysed, and their costs in terms of computing time and accuracy are compared against CrackDet. In chapter 3, the pipeline proposed by the authors is analysed step by step and the results are summarised. Chapter 4 contains the final considerations and the proposals for future research.

Introduction
Several works have been proposed for autonomous crack detection in underground structures or buried pipes. Three main workflows can be identified:
• Image processing only;
• Image classification only (image processing is limited to image tiling);
• Combination of image processing and machine learning.
2.1.1 Image processing based crack detection
(Sinha & Iyer, 2005) propose a high-level image processing approach based on: an initial contrast enhancement to highlight dark pixels; morphological transformations to clean the image of small connected objects; a Laplacian of Gaussian as an object detector (since the crack intensity profile in the image has a Gaussian shape, all connected objects with a different intensity gradient can be removed); and a combination of morphological transformations for a final cleaning. They also evaluate the probability of detection (Pd) and the probability of false alarm (Pfa), both of which depend on the value of the threshold.
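The dependence of Pd and Pfa on the threshold can be illustrated with a minimal pixel-wise sketch. It assumes a grey-level image in which cracks are darker than the background, a binary ground-truth mask, a plain global threshold (not the full morphological chain of Sinha & Iyer), and the common definitions Pd = detected crack pixels / ground-truth crack pixels and Pfa = false detections / ground-truth background pixels. The function name and the toy data are hypothetical, and the dilation-based tolerance of Figure 1 is not reproduced.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]


def detection_rates(image: Image, truth: Mask, threshold: int) -> Tuple[float, float]:
    """Return (Pd, Pfa) for the rule 'pixel is crack if intensity < threshold'.

    Pd  = true crack pixels detected / all true crack pixels
    Pfa = background pixels flagged as crack / all background pixels
    """
    tp = fp = n_crack = n_background = 0
    for row_img, row_truth in zip(image, truth):
        for value, is_crack in zip(row_img, row_truth):
            detected = int(value < threshold)  # dark pixels are crack candidates
            if is_crack:
                n_crack += 1
                tp += detected
            else:
                n_background += 1
                fp += detected
    pd = tp / n_crack if n_crack else 0.0
    pfa = fp / n_background if n_background else 0.0
    return pd, pfa


if __name__ == "__main__":
    # Toy 3x4 image: a dark vertical crack in column 1 plus one dark stain at (2, 3).
    image = [
        [200, 40, 210, 190],
        [205, 55, 200, 185],
        [198, 70, 202, 60],
    ]
    truth = [[c == 1 for c in range(4)] for _ in range(3)]
    for t in (50, 80, 120):
        pd, pfa = detection_rates(image, truth, t)
        print(f"threshold={t}: Pd={pd:.2f}, Pfa={pfa:.2f}")
```

Raising the threshold recovers the whole crack but also starts flagging the dark stain, which is the trade-off that a Pd/Pfa analysis exposes.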
(Dapeng Qi, Yun Liu, Qingyi Gu, & Fengxia Zheng, 2014) focus on the importance of making the whole process fast and real-time. For that purpose, the set-up consists of several linear CCD cameras mounted on the front of a train, and all the equipment needed for image processing and object classification is installed within the car itself. To make the process fast enough, the image processing has to be simple. Their proposed algorithm is depicted in Figure 3.

2.1.2 Image classification based crack detection
(Young-Jin Cha, Wooram Choi, & Oral Büyüköztürk, 2017) propose a vision-based method using a deep convolutional neural network (CNN) architecture for detecting concrete cracks without calculating defect features. As CNNs are capable of learning image features automatically, the method works without combining image processing techniques (IPTs) for feature extraction. The trained CNN is combined with a sliding-window technique to scan images of any size larger than 256 × 256 pixels.
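The sliding-window scan can be sketched as follows. The sketch only generates the 256 × 256 windows that would be passed to the classifier; the stride (here equal to the window size, i.e. non-overlapping tiles) and the border handling (the last window is shifted back so it stays inside the image) are assumptions made for illustration, not details taken from Cha et al., and the CNN call itself is omitted.

```python
from typing import Iterator, List, Tuple


def _starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last window is aligned to the image border."""
    if length < window:
        raise ValueError("image smaller than the window")
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:  # cover the remaining strip at the border
        starts.append(length - window)
    return starts


def sliding_windows(width: int, height: int, window: int = 256,
                    stride: int = 256) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) boxes that together cover a width x height image."""
    for y in _starts(height, window, stride):
        for x in _starts(width, window, stride):
            yield x, y, window, window


if __name__ == "__main__":
    boxes = list(sliding_windows(600, 300))
    print(len(boxes), boxes[:3])
```

A stride smaller than the window gives overlapping tiles, so a crack cut by one tile border is seen whole in a neighbouring tile, at the cost of more classifier calls.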

3.1 Introduction
The proposed workflow combines the advantages of image processing, such as the possibility of extracting information from the segmented images, with the advantages of DL, such as the possibility of developing a classifier without hand-crafting features. This last aspect is important because feature vectors are highly dependent on lighting conditions and on the filtering algorithm adopted. The present paper aims to create a more general-purpose tool. The proposed pipeline is summarised in Figure 4.

3.2 Image acquisition
At the time of writing, the full set-up for image acquisition is being established, with trials and testing under way. Considerations for the set-up include the final measurement in the metric system (not in pixels) and attention to the following parameters during acquisition:
• Average distance from the wall: in underground structures it is related to the diameter of the structure itself and is easy to keep within a small range;
• Focal length: in order to fully automate the process, it is preferable to set the focal length at the beginning and keep it fixed during the survey;
• Horizontal sensor size and number of pixels: the ratio of these values gives the dimension of a single pixel on the sensor.
Once these parameters are known, a triangulation can be performed to obtain a metric measurement of the detected cracks.
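A minimal sketch of the metric conversion, assuming a simple pinhole camera facing the wall: the pixel pitch on the sensor is the horizontal sensor width divided by the number of horizontal pixels, and by similar triangles one pixel then spans distance × pitch / focal length on the wall. This model is an assumption for illustration and not necessarily the exact triangulation adopted by the authors; the function names and the example set-up are hypothetical.

```python
def pixel_footprint_mm(distance_mm: float, focal_length_mm: float,
                       sensor_width_mm: float, image_width_px: int) -> float:
    """Size on the wall (mm) covered by one pixel, using a pinhole camera model."""
    pixel_pitch_mm = sensor_width_mm / image_width_px  # size of one pixel on the sensor
    # Similar triangles: object size / distance = image size / focal length.
    return distance_mm * pixel_pitch_mm / focal_length_mm


def pixels_to_mm(length_px: float, footprint_mm: float) -> float:
    """Convert a length measured in pixels to millimetres on the wall."""
    return length_px * footprint_mm


if __name__ == "__main__":
    # Hypothetical set-up: wall 2.5 m away, 24 mm lens, 36 mm wide sensor, 6000 px wide image.
    gsd = pixel_footprint_mm(2500.0, 24.0, 36.0, 6000)
    print(f"one pixel = {gsd:.3f} mm; a 400 px crack = {pixels_to_mm(400, gsd):.0f} mm")
```

Lengths scale with the footprint and areas with its square, which is why keeping the distance and focal length stable during the survey matters for repeatable measurements.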

3.3 Image processing and crops extraction
Several algorithms were tested. (Sinha & Iyer, 2005) gives the best results in terms of isolating the cracks, but it is highly time-consuming: it is based on several geodesic reconstructions, each of which has to iterate 18 times (a linear structuring element rotating in the range 0-180º with a step of 10º). Considering this, the authors decided to adopt a simple image processing algorithm based on:
• Low-level contrast enhancement to highlight dark pixels;
• Median blur, which is highly effective against salt-and-pepper noise;
• Adaptive Gaussian thresholding;
• Edge detection;
• Creation of the minimum-area bounding box for each contour;
• Cropping of the bounding boxes.
The proposed low-level algorithm is time-efficient. However, many misleading elements are segmented along with the true cracks; therefore, an image classifier is needed.
3.4 Image classification
(Samuel, 1959) defined Machine Learning (ML) as the "field of study that gives computers the ability to learn without being explicitly programmed". At its most basic, ML is the practice of using algorithms to parse data, learn from it, and then make a decision or prediction about something in the world. Rather than hand-coding software routines with a specific set of instructions to accomplish a particular task, the machine is "trained" using large amounts of data that give it the ability to learn how to perform the task. (Wenyu, Zhenjiang, Dapeng, & Yun, 2014) developed their algorithm based on feature vectors. The challenge in this case is that ML classifiers need to be fed with hand-engineered features, and it might be complex to find the combination of features that classifies the objects correctly. That task becomes even harder if the aim is to realise a general-purpose tool.
To overcome the aforementioned problem, Deep Learning (DL), a subfield of ML, is adopted for classification purposes.
Neural networks are inspired by our understanding of the biology of the brain, with its many interconnections between neurons. For example, an image can be chopped into a set of tiles that are fed into the first layer of the network, which triggers the layers of hidden units, and these in turn reach the output units. Each unit receives inputs from the units of the previous layer, and each input is multiplied by the weight of the connection it travels along. Every unit sums all the inputs it receives in this way and (in the simplest type of network) if the sum exceeds a certain threshold value, the unit "fires" and triggers the units it is connected to in the next layer.
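The simplest unit described above (weighted sum of the inputs followed by a hard threshold) can be sketched in a few lines. The weights, inputs and threshold are hypothetical values chosen for illustration; networks such as AlexNet and GoogLeNet use differentiable activations (e.g. ReLU) rather than a hard threshold, so that the weights can be learned by gradient descent.

```python
from typing import List, Sequence


def unit_fires(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Threshold unit: fire when the weighted sum of the inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input connection is required")
    total = sum(x * w for x, w in zip(inputs, weights))
    return total > threshold


def layer_outputs(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
                  threshold: float) -> List[int]:
    """Outputs (1 = fired, 0 = silent) of a layer of units sharing one threshold."""
    return [int(unit_fires(inputs, row, threshold)) for row in weight_rows]


if __name__ == "__main__":
    previous_layer = [1.0, 0.0, 1.0]              # outputs of the units in the previous layer
    weights = [[0.6, 0.9, 0.5], [0.2, 0.8, 0.1]]  # one row per unit in this layer
    print(layer_outputs(previous_layer, weights, threshold=1.0))
```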
In the present work, image processing is used to automatically crop the objects of interest out of the whole images. The crops feed the DL classification tool, which classifies them as Crack or No_Crack.
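The low-level segmentation that produces these crops can be sketched without external libraries as a 3 × 3 median blur followed by an adaptive threshold against a Gaussian-weighted local mean, keeping pixels darker than that mean minus a constant C. This is an illustrative re-implementation under stated assumptions (fixed 3 × 3 windows, replicated borders, C = 10, all foreground treated as one object); in practice OpenCV's cv2.medianBlur and cv2.adaptiveThreshold with ADAPTIVE_THRESH_GAUSSIAN_C would be used, and the contrast enhancement, edge detection and minimum-area box steps are omitted here.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Mask = List[List[bool]]
Box = Tuple[int, int, int, int]

GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16


def _pixel(img: Image, r: int, c: int) -> int:
    """Read a pixel, replicating the border for out-of-range coordinates."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def median_blur3(img: Image) -> Image:
    """3x3 median filter: removes isolated salt-and-pepper pixels."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            window = sorted(_pixel(img, r + dr, c + dc)
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            row.append(window[4])  # middle of the 9 sorted values
        out.append(row)
    return out


def adaptive_gaussian_mask(img: Image, c_offset: int = 10) -> Mask:
    """True where a pixel is darker than its Gaussian-weighted 3x3 mean minus C."""
    mask = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            weighted = sum(GAUSS_3X3[dr + 1][dc + 1] * _pixel(img, r + dr, c + dc)
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            row.append(img[r][c] < weighted / 16 - c_offset)
        mask.append(row)
    return mask


def bounding_box(mask: Mask) -> Optional[Box]:
    """Axis-aligned (row0, col0, row1, col1) box of the foreground, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows, cols = zip(*cells)
    return min(rows), min(cols), max(rows), max(cols)


def crop(img: Image, box: Box) -> Image:
    """Cut the box (inclusive corners) out of the image."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in img[r0:r1 + 1]]
```

Note that a 3 × 3 median erases features only one pixel wide, so hairline cracks need sufficient image resolution to survive the blur.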
Training a network from scratch requires hundreds of thousands of labelled images. Since only a small dataset is available (188 crack objects and 188 no-crack objects), pre-trained networks are adopted, a process called transfer learning that is commonly used in deep learning applications. Fine-tuning a network with transfer learning is much faster and easier than constructing and training a new network, because the pre-trained network has already learned a rich set of features that can be applied to a wide range of similar tasks. To use these networks for crack detection, only the last layers (the fully connected ones) need to be replaced to introduce the desired labels. The authors used two architectures, AlexNet and GoogLeNet, in order to compare their performance in terms of precision and accuracy as well as the time required for training and testing.
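Replacing and retraining only the final fully connected layer amounts to fitting a new linear classifier on top of the frozen features produced by the pre-trained network. The sketch below illustrates that idea with a logistic-regression output layer trained by stochastic gradient descent. The feature vectors are hypothetical stand-ins for what the frozen AlexNet or GoogLeNet backbone would output, and the learning rate, epochs and data are assumptions, not the authors' training settings.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector from the frozen backbone, 1 = Crack)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_new_head(samples: List[Sample], epochs: int = 500,
                   lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only the replacement output layer; the backbone features stay fixed."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = _sigmoid(z) - label  # derivative of the log-loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def classify(weights: Sequence[float], bias: float, features: Sequence[float]) -> str:
    """Apply the trained head: positive score means Crack."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return "Crack" if z > 0 else "No_Crack"


if __name__ == "__main__":
    # Hypothetical 2-D features; a real backbone such as AlexNet yields thousands per crop.
    data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.85, 0.7], 1),
            ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.15, 0.3], 0)]
    w, b = train_new_head(data)
    print(classify(w, b, [0.95, 0.9]), classify(w, b, [0.05, 0.1]))
```

Because only a handful of parameters are trained, a small labelled set can suffice, which is the practical reason transfer learning suits a dataset of a few hundred crops.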
According to (Culurciello, Canziani, & Paszke, 2017), AlexNet, the oldest architecture, should achieve the lowest top-1 one-crop accuracy relative to the number of operations required for a single forward pass, whilst GoogLeNet should provide a good compromise between these parameters (Figure 6). When the two architectures were compared for the authors' crack detection purposes, some unexpected results arose:
• With such a small dataset, the training time is around 10 seconds using a GPU and around 30 minutes using only the CPU, for both architectures;
• The results in terms of precision and accuracy appear to conflict with the comparison made by (Culurciello, Canziani, & Paszke, 2017): on a test dataset of 244 candidates, of which only 27 are cracks (the unbalanced test set was chosen to reproduce real conditions, since in a real survey crack elements are far fewer than no_crack elements), AlexNet achieved the higher accuracy and precision (Table 2).
The results are promising, considering the small and random dataset, with an average accuracy (proportion of correctly classified instances) over 98%, a precision (proportion of predicted cracks that are true cracks) over 92% and a recall (proportion of true cracks that are detected) of 94%.
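The three metrics quoted above follow directly from the confusion-matrix counts of the test set, with Crack as the positive class. The sketch below computes them; the class name is hypothetical and the counts used in the example are illustrative only, not the values reported in Table 2.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # cracks classified as Crack
    fp: int  # non-cracks classified as Crack
    tn: int  # non-cracks classified as No_Crack
    fn: int  # cracks classified as No_Crack

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        predicted_cracks = self.tp + self.fp
        return self.tp / predicted_cracks if predicted_cracks else 0.0

    def recall(self) -> float:
        actual_cracks = self.tp + self.fn
        return self.tp / actual_cracks if actual_cracks else 0.0


if __name__ == "__main__":
    # Hypothetical counts for an unbalanced test set of 244 candidates with 27 cracks.
    cm = Confusion(tp=25, fp=2, tn=215, fn=2)
    print(f"accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} recall={cm.recall():.3f}")
```

On such an unbalanced set, a classifier that labels every candidate No_Crack would still reach 217/244 ≈ 88.9% accuracy with zero recall, which is why precision and recall are reported alongside accuracy.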

3.5 Outputs
After the classification, the cracks can be ground-truthed in each image (Figure 7). In addition, information about the location, orientation and dimensions of each crack in the object space (Table 3) can be collected. This information is obtained as follows:
• Location: position of the centroid of the crack element in the image space;
• Orientation: assuming that the aspect ratio of the crack elements (length/height of the minimum-area bounding box) is high, the orientation is taken as the inclination of the longer side of the bounding box (0º is horizontal);
• Dimensions: image processing provides the area A and the perimeter P of the connected objects in the image space. To obtain the length L and width W of the crack, the structural defect is modelled as a thin, elongated polygon, so that its width has a negligible influence on the perimeter compared with its length. The length of the crack can therefore be evaluated as half of the perimeter and, knowing the length, the width can be approximated as the area divided by the length (Eq. 1):

$$L \approx \frac{P}{2}, \qquad W \approx \frac{A}{L} = \frac{2A}{P} \qquad (1)$$

To translate these values from the image space to the metric system, a triangulation has been considered (Eq. 2). At the time of writing, the precision and accuracy of these readings depend solely on the user's care in recording the parameters presented in 3.2.
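The dimension and orientation rules above can be sketched as follows: L = P/2 and W = A/L (Eq. 1), the orientation as the inclination of the longer side of the minimum-area bounding box, and a conversion to millimetres by a per-pixel footprint. The box is assumed to be given by its four corners in order (as returned, for instance, by OpenCV's cv2.boxPoints), with x to the right and y downwards; the function names and numbers are hypothetical, and the simple scaling used here stands in for, and may differ from, the authors' Eq. 2.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def crack_length_width(area_px: float, perimeter_px: float) -> Tuple[float, float]:
    """Eq. 1: thin elongated crack, so P ~ 2L and W ~ A / L (image units)."""
    length = perimeter_px / 2.0
    width = area_px / length
    return length, width


def orientation_deg(corners: List[Point]) -> float:
    """Inclination of the longer side of a rotated box, in [0, 180), 0 = horizontal.

    `corners` are the four box vertices in order (x to the right, y downwards).
    """
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    side_a = (x1 - x0, y1 - y0)
    side_b = (x2 - x1, y2 - y1)
    dx, dy = max(side_a, side_b, key=lambda v: math.hypot(*v))
    # Negate dy so that angles are counter-clockwise as seen on the image.
    return math.degrees(math.atan2(-dy, dx)) % 180.0


def to_metric(length_px: float, width_px: float, footprint_mm: float) -> Tuple[float, float]:
    """Scale image-space lengths by the size of one pixel on the wall."""
    return length_px * footprint_mm, width_px * footprint_mm


if __name__ == "__main__":
    length, width = crack_length_width(area_px=400, perimeter_px=400)
    angle = orientation_deg([(0, 100), (100, 0), (102, 2), (2, 102)])
    print(length, width, angle, to_metric(length, width, footprint_mm=0.625))
```

The half-perimeter rule slightly overestimates the length (it includes the width), which is acceptable while W is much smaller than L.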

CONCLUSIONS AND FUTURE RESEARCH
The pipeline is simple, fast and robust. It shows good accuracy in the total length of the detected and correctly classified defects. Notably, the dataset is not homogeneous, which indicates that the proposed tool can be considered a general-purpose crack detector, provided that a bigger dataset is used to train the DL algorithm. The robustness of the proposed pipeline might be improved by adding a depth channel to the processed images. To do this without losing the time efficiency given by single shots, an ad hoc camera network has to be developed. Lastly, to make the overall process faster, image classification should be replaced by semantic segmentation. This will make the process fully automated and independent of the human calibration of the image processing parameters. It will also enable the segmentation and measurement of any other kind of defect within underground structures, without the need for different scripts each time. Based on a trained fully convolutional network (FCN), the new process will detect the elements included in the training dataset and perform their segmentation.

Figure 1: Matching procedure for detection of true and false pixels: (a) original image, (b) detected cracks, (c) ground truth, (d) and (e) detected and true cracks dilated by a 5 × 5 structuring element, (f) good points of the filter, (g) false positives, (h) truly detected cracks and (i) missed cracks (false negatives).
(Sunil K. & Paul W., 2006) propose a two-step algorithm: the first step is local and uses statistical properties to extract crack features from the segmented image, which are treated as crack segment candidates; in the second step, global cleaning and linking operations merge the segments to form cracks.

Figure 2: Illustration of the procedure of matching the images for detection of true and false pixels.

Figure 7: After image processing (left), crack ground truth after image classification (right).

Table 1: Comparison between the inspection methods

Table 2: Comparison of different algorithms/networks in terms of prediction accuracy

Table 3: Example of output