TACK PROJECT: TUNNEL AND BRIDGE AUTOMATIC CRACK MONITORING USING DEEP LEARNING AND PHOTOGRAMMETRY

Civil infrastructures, such as tunnels and bridges, are directly related to the overall economic and demographic growth of countries. The aging of these infrastructures increases the probability of catastrophic failures that result in loss of lives and high repair costs; all over the world, these factors drive the need for advanced infrastructure monitoring systems. For these reasons, in recent years, different types of devices and innovative infrastructure monitoring techniques have been investigated to automate the process and overcome the main limitations of the standard visual inspections in use today. This paper presents some preliminary findings of an ongoing research project, named TACK, that combines advanced deep learning techniques and innovative photogrammetric algorithms to develop a monitoring system. Specifically, the project focuses on the development of an automatic procedure for crack detection and measurement using images of tunnels and bridges acquired with a mobile mapping system. Preliminary results are shown to investigate the potential of a deep learning algorithm for detecting cracks in concrete. The model is a CNN (Convolutional Neural Network) based on the U-Net architecture; in this study, we tested the transferability of the model, which was trained on a small available labeled dataset and tested on a large set of images acquired using a customized mobile mapping system. The results show that it is possible to effectively detect cracks in unseen imagery and that the primary source of errors is the false positive detection of crack-like objects (e.g., contact wires, cables and tile borders).


INTRODUCTION
Civil infrastructures, including tunnels, bridges, roads and dams, are aging all over the world. For this reason, detailed analyses and monitoring systems are required to determine the health and safety level of these infrastructures. Cracks are early indicators of damage; crack detection and measurement therefore provide key parameters for evaluating the safety and durability of structural components (Koch et al., 2015). Cracks should be detected as soon as possible and monitored over time to assess the condition of the infrastructure and to identify the necessary countermeasures.
Nowadays, large infrastructures are routinely inspected visually by trained workers to detect and measure cracks. However, this type of monitoring is time-consuming, labor-intensive and prone to human error. Furthermore, since the infrastructures must be closed during monitoring, inspections are normally carried out at night within a limited time interval to minimize the impact of tunnel or bridge downtime. These conditions, in combination with the length of the infrastructure system, often make it impossible to inspect every meter of bridges, tunnels and roads, increasing the risk that cracks are not detected. Moreover, the lack of light and fresh air, together with traffic noise, makes these procedures unhealthy for the inspectors. To overcome these drawbacks and preserve the safety of operators, in recent years inspections have also been carried out using a semi-automatic method in which mobile mapping equipment (usually mounted on a vehicle) captures the scene and reconstructs the 3D model of the infrastructure using a set of geomatics sensors (e.g., visible and infrared cameras, laser scanning, IMU). This digital representation of the infrastructure, the so-called "digital twin", is subsequently analysed manually by visual inspection to find cracks and mark their extent. However, due to the large amount of collected data, this approach is still time-consuming, inefficient and affected by errors.
To overcome the limitations of standard visual inspections, different approaches have been applied. First of all, the use of standard measurement instruments for deformation and crack measurement, such as strain gauges and Linear Variable Differential Transducers (LVDT), has been investigated. However, when dealing with large infrastructures, the use of conventional devices is limited since they can only provide local information and they need to be attached to the surface to be monitored. To overcome the limitations of pointwise sensors, Structural Health Monitoring (SHM) systems have been investigated by the scientific community to perform structural damage detection and integrity assessment using multiple sensors, such as accelerometers, fiber optic sensors, interferometric radar systems and camera-based sensors (Feng, Feng, 2018), (Brownjohn, 2007). Among camera-based techniques, the non-contact Digital Image Correlation (DIC) method has been widely applied (Küntz et al., 2011), (Mathieu et al., 2012), (Belloni et al., 2019). DIC can reconstruct the displacement and deformation of an object by comparing the position of corresponding pixels in different images acquired over time. The technique is easy to adopt and it can provide displacement and deformation fields without direct contact with the surface to be monitored. However, it requires a permanent setup (e.g., a fixed camera mounted on a tripod), which is not suitable for long-term monitoring and is difficult to keep stable due to potential vibrations, wind or ground instability. In general, even if SHM approaches can provide more complete measurements than standard methods, they can be complex to adopt since they require a large number of sensors to be installed and the integration of data coming from distributed sources.
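The DIC principle described above (finding where a small pixel subset of a reference image reappears in a later image) can be illustrated with a minimal sketch. It is not the algorithm used in TACK nor any specific published implementation: the function names, the zero-normalized cross-correlation score, the exhaustive integer-pixel search and the synthetic test image are all assumptions made for illustration. Practical DIC additionally uses sub-pixel interpolation and subset shape functions.

```python
import math
import random

def zncc(a, b):
    """Zero-normalized cross-correlation between two equal-length pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def patch(img, row, col, size):
    """Flatten the size x size window whose top-left corner is (row, col)."""
    return [img[r][c] for r in range(row, row + size)
            for c in range(col, col + size)]

def track_subset(ref, cur, row, col, size, search):
    """Return the integer (d_row, d_col) shift that maximizes ZNCC, and its score."""
    template = patch(ref, row, col, size)
    best_score, best_shift = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            # Skip candidate windows that fall outside the current image.
            if r < 0 or c < 0 or r + size > len(cur) or c + size > len(cur[0]):
                continue
            score = zncc(template, patch(cur, r, c, size))
            if score > best_score:
                best_score, best_shift = score, (dr, dc)
    return best_shift, best_score

# Synthetic check (hypothetical data): a random texture shifted by 2 rows, 3 columns.
rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(20)] for _ in range(20)]
cur = [[ref[r - 2][c - 3] if r >= 2 and c >= 3 else 0 for c in range(20)]
       for r in range(20)]
shift, score = track_subset(ref, cur, row=6, col=6, size=5, search=4)
print(shift, round(score, 3))
```

Repeating the search for a grid of subsets yields a displacement field; strains then follow from its spatial derivatives. The sketch also makes the fixed-camera limitation visible: any camera motion between the two images adds directly to the measured shift.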
Nowadays, image-based techniques seem to represent the most powerful alternative in the field of infrastructure monitoring. For this reason, in the last two decades, a significant number of studies have been conducted to understand the potential of these methods in detecting and measuring cracks. Among the methods developed for crack detection, edge-detection algorithms (Abdel-Qader et al., 2003), mathematical morphology (Sinha, Fieguth, 2006), high-speed percolation (Yamaguchi, Hashimoto, 2010), Principal Component Analysis (PCA) (Abdel-Qader et al., 2006), Extreme Learning Machine (Zhang et al., 2014) and Support Vector Machine (SVM) (Nashat et al., 2014) have been adopted. Furthermore, to improve the performance of image-based techniques and develop a method able to cover unexpected real-world situations, deep learning approaches based on Convolutional Neural Networks (CNNs) have also been recently investigated (Cha et al., 2017), (Gopalakrishnan et al., 2017), (Zou et al., 2018), (Liu et al., 2019), (Ren et al., 2020). Indeed, CNNs are very powerful techniques for automatic feature extraction and classification problems, and they have received considerable attention in the field of infrastructure monitoring thanks also to the spread of drones and other mobile mapping systems, which can acquire a large amount of data. Starting from a set of labeled images necessary to train the network, CNNs can be adopted to build a classifier for automatically detecting cracks in new images. Furthermore, these techniques can easily handle a large amount of collected data. In recent years, three different deep learning approaches have been adopted to perform crack detection: object detection, image classification (Cha et al., 2017) and semantic segmentation (Ren et al., 2020). The first one aims only at detecting the location of the cracks using bounding boxes, without providing additional information such as the width and the shape of the cracks.
The second approach performs image classification using sliding windows to scan the images. This method can provide more detailed information compared to the previous one but the accuracy of the detection highly depends on the area division. The last approach provides pixel-level classification by detecting all the pixels which belong to the crack. It represents, therefore, the most accurate method for this specific task (Liu et al., 2019).

TACK PROJECT
TACK (TACK, 2020) is an ongoing research project carried out by KTH Royal Institute of Technology, Sapienza University of Rome and the company WSP Sweden under the INFRASWEDEN2030 program funded by VINNOVA. The project aims at the development of a methodology for the automatic detection and measurement of cracks on a tunnel lining or other infrastructures, combining advanced deep learning approaches and innovative photogrammetric algorithms. The main idea is to apply deep learning algorithms (CNNs) to detect cracks in data collected using a mobile mapping system, as accurately as visual detection. In the project, the system is provided by WSP Sweden and consists of six LiDAR scanners, which can produce a point cloud with an average density of 5000 points per square meter, and two panoramic cameras used to capture the complete view of the infrastructure (with a particular focus on tunnels). Furthermore, nine high-resolution IR cameras combined with LED light and IR flashes are adopted to obtain photos of the roof and walls in order to accurately assess the condition of the infrastructure. Specifically, overview photos are used to determine the frequency of cracks in specific areas, and detailed images (pixel resolution < 1 mm) are used to measure the crack width and understand the cause of cracking. Then, once the cracks are detected, an innovative and recently developed photogrammetric algorithm is applied to the raw images of each detected crack to estimate its geometric characteristics (i.e. crack length and crack width) and the deformation of cracks over time. The algorithm, developed through the collaboration between KTH Royal Institute of Technology and Sapienza University of Rome, can compute the deformation of an object using a time series of images (as in standard DIC methods) captured with a moving camera, overcoming the main limitation of requiring a fixed setup to measure the deformation pattern (Sjölander, 2019).
Finally, to assess the risk associated with different types of cracks, numerical simulations based on the finite element method can be used. Specifically, the data acquired from monitoring can be adopted to model existing cracks. Then, non-linear material models that are able to describe the behaviour after failure initiation are used to simulate the structural behaviour for different load cases. For tunnels built using shotcrete, the condition of the shotcrete can be used as input for finite element simulations in which the behaviour of the rock support is simulated for different load cases to assess its structural response and risk of failure. Furthermore, if possible, data from the construction, such as the thickness of the shotcrete and the quality of the rock, can be used to get a deeper understanding of where and why cracks form. An overview of the method workflow is reported in Figure 1. The proposed technique can automatically detect and measure cracks from the imagery acquired using a customized mobile mapping system, which leads to highly efficient monitoring that can increase the overall safety of infrastructures. With a complete digital model of the infrastructure, all the information is stored and can be easily accessed for further in-depth investigations; the detected critical areas can also be double-checked using other methods. Furthermore, the developed procedure can be applied to infrastructure imagery acquired in different epochs to monitor the evolution of the cracks over time. The method can thus enable continuous and automatic monitoring of infrastructures, increasing the efficiency of the monitoring process and decreasing the risk that cracks are not found. In addition, the detailed mapping of cracks and the possibility to measure their geometric characteristics give a highly efficient basis to assess the need for maintenance of infrastructures.
This can improve knowledge regarding tunnel or bridge conditions and facilitate maintenance planning, which will reduce infrastructure downtime and the costs of monitoring and maintenance. Finally, with an automatic procedure to collect and process the data, the inspection and monitoring of infrastructures can instead take place in an office during normal working hours, increasing the safety of inspectors.

DATASETS
Two different datasets were used in this study. The first one, provided by Ren et al. (Ren et al., 2020), includes a total of 409 RGB images of cracks (4032 × 3016 pixels) acquired in a tunnel under different light conditions and then cropped into 919 small images (512 × 512 pixels). The small images are divided into a training set and a test set at a ratio of 4:1. Specifically, 735 images are used for training and 184 images for testing the network. For this dataset, crack annotations of all the images are available in binary format. The second dataset includes images and LiDAR data of three tunnels acquired in the Stockholm area using a mobile mapping system developed by WSP. Among them, one metro tunnel is particularly affected by cracking and therefore represents a very interesting case for investigating the reliability of the proposed approach. The metro tunnel dataset is composed of more than 34000 images (2448 × 2048 pixels) acquired using nine high-resolution IR cameras and more than 2000 overview images (8000 × 4000 pixels) captured with the two panoramic cameras. For each image, the mobile mapping system records the position and attitude of the cameras, so that the location of each image along the infrastructure is known exactly. Furthermore, LiDAR data are available to reconstruct the 3D model of the tunnel and to produce a digital twin of the infrastructure.
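The figures quoted for the first dataset can be checked with a short sketch. The helper names, the fixed random seed and the crop identifiers are hypothetical, and the shuffling strategy is an assumption: the source states only the 4:1 ratio and the resulting 735/184 counts, not how the crops were selected or assigned.

```python
import random

def tile_origins(width, height, tile=512):
    """Top-left (x, y) corners of the non-overlapping full tiles that fit in an image."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def split_train_test(items, train_ratio=0.8, seed=0):
    """Shuffle reproducibly, then split into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# A 4032 x 3016 image holds 7 x 5 = 35 full 512 x 512 tiles.
origins = tile_origins(4032, 3016)

# Hypothetical identifiers standing in for the 919 cropped images.
crop_ids = [f"crop_{i:04d}" for i in range(919)]
train, test = split_train_test(crop_ids)

print(len(origins), len(train), len(test))  # 35 735 184
```

An exhaustive grid would give 409 × 35 = 14315 crops, far more than the 919 reported, which suggests that only selected regions were retained; this is an inference from the numbers, not something stated in the source.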

PRELIMINARY RESULTS ON CRACK SEGMENTATION
To investigate the potential of CNN approaches, a semantic segmentation architecture based on U-Net (Ronneberger et al., 2015) was considered for a first test. The aim of this preliminary study is to investigate the generalization capability of a U-Net based segmentation model trained on a small labeled dataset and tested on a large set of unseen imagery depicting cracks in concrete. Specifically, the deep fully convolutional network named CrackSegNet (Ren et al., 2020) was adopted to perform pixel-level classification of cracks. CrackSegNet is an end-to-end crack detection architecture that combines a backbone network, dilated convolution, spatial pyramid pooling and skip connection modules. In this study, the model proposed by Ren et al. was trained using 80% of the imagery available in the small labeled dataset. Specifically, the model was trained for 30 epochs using a combination of dice and focal loss functions and the Adam optimizer. The remaining 20% of the imagery was used as the test set to assess the model performance. During testing, a fixed threshold of 0.5 was applied to the probability maps computed by CrackSegNet to obtain binarized segmentation images. The standard binary classification metrics were computed according to the following formulas:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

$$\mathrm{Intersection\ over\ Union\ (IoU)} = \frac{TP}{TP + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$

where:
• TP is the number of true positive pixels;
• TN is the number of true negative pixels;
• FP is the number of false positive pixels;
• FN is the number of false negative pixels;
• P is the Precision;
• R is the Recall.
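The source states that training used a combination of dice and focal loss functions but does not give the weighting or the hyperparameters. The following is therefore only a sketch under assumptions: an unweighted sum of the two terms, the focal loss parameters gamma = 2 and alpha = 0.25 that are commonly used as defaults, and plain Python lists of per-pixel probabilities in place of framework tensors. All function and parameter names are illustrative.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """Soft dice loss: 1 - 2|P∩T| / (|P| + |T|), computed on probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (gamma and alpha are assumed values)."""
    loss = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)        # clip to avoid log(0)
        p_t = p if t == 1 else 1.0 - p         # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        loss += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss / len(probs)

def combined_loss(probs, targets, dice_weight=1.0, focal_weight=1.0):
    """Weighted sum of dice and focal loss (equal weights are an assumption)."""
    return (dice_weight * dice_loss(probs, targets)
            + focal_weight * focal_loss(probs, targets))

# Four pixels, two of which belong to a crack (label 1).
labels = [1, 0, 0, 1]
good = combined_loss([0.9, 0.1, 0.1, 0.9], labels)
poor = combined_loss([0.1, 0.9, 0.9, 0.1], labels)
print(good < poor)
```

Both terms are commonly chosen for strongly imbalanced segmentation problems such as crack detection, where crack pixels are a small fraction of each image: the dice term measures overlap rather than per-pixel agreement, and the focal term down-weights pixels that are already classified confidently.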
The trained model achieved an IoU of 51%, a Precision of 67%, a Recall of 71%, an F1 of 68% and an Accuracy of 99% on the test set.
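The evaluation described above (binarization at a threshold of 0.5 followed by pixel-wise counting) can be sketched as follows. The function names and the toy probability map are hypothetical, and treating a probability exactly equal to the threshold as a crack pixel is an assumption, since the source does not specify it.

```python
def confusion_counts(probs, truth, threshold=0.5):
    """Count TP, TN, FP, FN over all pixels of a probability map and its ground truth."""
    tp = tn = fp = fn = 0
    for p_row, t_row in zip(probs, truth):
        for p, t in zip(p_row, t_row):
            pred = 1 if p >= threshold else 0   # assumption: >= counts as crack
            if pred == 1 and t == 1:
                tp += 1
            elif pred == 0 and t == 0:
                tn += 1
            elif pred == 1 and t == 0:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn

def segmentation_metrics(tp, tn, fp, fn):
    """Accuracy, IoU, Precision, Recall and F1 from pixel counts (0.0 if undefined)."""
    def ratio(num, den):
        return num / den if den > 0 else 0.0
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Toy 2 x 4 probability map and ground-truth mask (hypothetical values).
probs = [[0.9, 0.8, 0.2, 0.1],
         [0.6, 0.4, 0.7, 0.3]]
truth = [[1, 1, 0, 0],
         [0, 1, 1, 0]]
counts = confusion_counts(probs, truth)
metrics = segmentation_metrics(*counts)
print(counts)   # (3, 3, 1, 1)
print(metrics)
```

Because accuracy also counts true negatives, it is dominated by background pixels when cracks occupy only a small part of each image. This is consistent with an Accuracy of 99% alongside an IoU of 51%, and it is why IoU, Precision, Recall and F1 are the more informative figures here.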
The obtained results are shown in Figure 2. The transferability of the model provided by Ren et al. was investigated in this study; the trained model was used to detect cracks using the unseen tunnel imagery acquired with the WSP mobile mapping system. Some examples of the images and the detected cracks using the CrackSegNet model are shown in Figure 3.
The results, although preliminary, demonstrate the capability to detect cracks in unseen images using a U-Net based model trained on a completely different dataset, highlighting the model's good transferability. However, it is worth noting that, even if real cracks are correctly detected, other objects with a similar shape can be erroneously classified as cracks, compromising the overall reliability of the model. It is, therefore, necessary to investigate the problem of similar-looking objects, such as tile borders, signaling cables, contact wires and joints, that are normally found in images acquired in tunnels.

CONCLUSIONS AND FUTURE PROSPECTS
TACK is an ongoing research project carried out by KTH Royal Institute of Technology, Sapienza University of Rome and the company WSP Sweden. The aim of this project is to investigate and develop a new methodology for the automatic detection of cracks using an integrated approach based on deep learning and photogrammetry. Specifically, cracks are automatically detected and measured from the imagery acquired using customized mobile mapping systems, leading to higher efficiency and accuracy of the overall monitoring process and providing detailed information regarding the condition of the whole infrastructure under analysis. Preliminary results demonstrate the potential of deep learning algorithms to detect cracks in imagery acquired in a tunnel using a mobile mapping system. However, the tested architecture can erroneously segment pixels belonging to cables, wires and tile borders as cracks. For this reason, other kinds of architectures will be investigated in order to increase segmentation performance.