BUILDING DAMAGE ASSESSMENT WITH DEEP LEARNING

Global warming modifies the climate balance. Warming parameters are observed by many Earth Observation satellite systems, and the huge amount of data changes the way it is processed. This paper presents several studies on the detection of building damage caused by natural disasters. Recent deep learning techniques, such as EfficientNet networks, are used for building detection. Additional networks, such as Siamese models, are used to evaluate the damage level from pre- and post-event images. Different techniques for merging detection masks are described and compared to a multiclass segmentation network. Results are presented and the performances of the different solutions are compared.


INTRODUCTION
This paper presents several studies on the detection of building damage caused by natural disasters. Recent deep learning techniques are used for building detection, and additional networks are used to evaluate the damage level from before/after event images. Several neural network architectures have been tested, and their evaluation is presented. Such techniques lead to promising results for building damage assessment.

Natural disasters problem
Global warming modifies the climate balance. Warming parameters are observed by many Earth Observation satellite systems. Warming consequences are also monitored: among them is a clear increase in the number of natural disasters. The population on Earth has also grown over the last century, with major urbanization; as a consequence, the number of vulnerable areas has increased. To respond to the rising number of natural disaster events, the International Charter "Space and Major Disasters" (Charter) was created: many space agencies provide an important source of satellite imagery to respond to major natural and man-made disasters worldwide. Monitoring major disasters and their impacts has become a key issue.

Data volume problem
The number of events has increased. The growing number of Earth Observation satellites (in particular VHR imagers) has also led to an increase in the amount of data to be downloaded and processed. Building damage assessment is important to organize rescue operations. For now, damaged buildings are analyzed by photo-interpreters, which remains complex and time-consuming. For this reason, automatic solutions to detect and classify damaged buildings are welcome. Recent advances in object recognition with deep learning algorithms offer new opportunities to do so.

How to define a damage?
In order to offer a consistent multi-scale building damage assessment, the first step is to define a standard damage nomenclature. Several references, such as HAZUS, FEMA's Damage Assessment Operations Manual, the Kelman scale and the EMS-98, provide common ground on which a damage scale of four classes can be built: no damage, minor damage, major damage and destroyed. This nomenclature has been used to label the xView2 database.
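For clarity, the four-class nomenclature (plus a background class for non-building pixels) can be encoded as a simple label mapping. This is an illustrative sketch; the names and index order are assumptions, only the class set comes from the text above.

```python
# Damage nomenclature used to label the xView2 database, plus a background
# class for non-building pixels. Index order is illustrative.
DAMAGE_CLASSES = {
    0: "background",
    1: "no-damage",
    2: "minor-damage",
    3: "major-damage",
    4: "destroyed",
}

def encode_label(name: str) -> int:
    """Return the integer index of a damage class name."""
    lookup = {v: k for k, v in DAMAGE_CLASSES.items()}
    return lookup[name]

print(encode_label("minor-damage"))  # → 2
```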

Building segmentation
Building detection may be addressed as a semantic segmentation problem, for which several convolutional network architectures can be used. (Maggiori et al., 2017) compares several architectures and shows that a Unet architecture, with an encoder, a decoder and skip-connections, is the best way to obtain precise semantic segmentation.
Recent works based on this approach try to refine the detection results by focusing on building outlines. Some approaches, such as (Marmanis et al., 2017), combine images and digital surface models (DSM): building outlines are first extracted and then introduced into the segmentation network. Another approach uses a multi-object loss function, evaluating both the segmentation and the building outline maps, provided by two distinct networks. (Bischke et al., 2019) proposes an approach using SegNet with a VGG16 encoder, where the masks are replaced by distance masks generated by computing, for each pixel, the distance to the outline. An interesting idea is also to add constraints on intermediate decoder layers to make them look like the final segmentation at lower resolution, which allows multi-scale descriptors to be optimized. A similar multi-scale approach is used in (Ji et al., 2018), which introduces Siamese networks, each working at its own resolution. Building segmentation can also be enhanced by modifying the skip-connections: (Li et al., 2018) proposes to add a compression step into the skip-connections in order to reduce the network weights, and (Yang et al., 2018) proposes to add an attention module into the skip-connections, to focus the network on the most informative parts of the image, and to add bottleneck layers into the decoder.
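The distance-mask idea of (Bischke et al., 2019) can be sketched as follows. This is a minimal example assuming scipy; the signed convention and truncation value are illustrative choices, not necessarily the paper's exact variant.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_mask(building_mask: np.ndarray, truncate: int = 20) -> np.ndarray:
    """Replace a binary building mask by a per-pixel distance to the outline.

    Pixels inside buildings get positive distances, pixels outside get
    negative ones (a common signed convention; the published variant may
    differ), truncated to +/- `truncate`.
    """
    inside = distance_transform_edt(building_mask)       # distance to background
    outside = distance_transform_edt(1 - building_mask)  # distance to buildings
    signed = inside - outside
    return np.clip(signed, -truncate, truncate)

# Toy example: one 4x4 building in an 8x8 tile.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
dm = distance_mask(mask)
```

The network is then trained to regress (or classify bins of) this map instead of the raw binary mask, which puts more emphasis on outline accuracy.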
Several neural networks architectures have been selected in this study, and trained for building segmentation purpose.

Damage classification
Damage evaluation may be considered as a classification task. A first idea is to directly use the outputs of two building segmentations (image masks) to identify remaining, new or destroyed buildings between two satellite acquisitions. Other approaches are possible. There are few references on damage classification, in particular on optical images. (Fujita et al., 2017) defines a CNN (VGG or AlexNet) that classifies building patches as destroyed or not, using several architectures, either as a single network or as Siamese networks, and integrating both pre- and post-event images. (Ji et al., 2018b) proposes the same destroyed/not-destroyed classification with a SqueezeNet architecture, and (Cao et al., 2018) with a VGG16 architecture. (Gupta, 2019) introduces multi-class damage classification, first using a Unet-type network to identify buildings (segmentation), and then an architecture combining a ResNet50 and a shallow network in parallel to classify damages.

PROPOSED METHOD
Our method is based on deep learning algorithms. Several satellite images with ground truth are used for the training. Two datasets are considered.

Used databases
3.1.1 xView2 dataset: The xView2 dataset has been freely available since the end of 2019. This dataset, described in (Gupta, 2019), offers a wide variety of situations with 6 types of disasters: earthquake, flood, hurricane, fire, tsunami and volcano. These natural events occurred in eleven countries. It includes both pre- and post-disaster Very High Resolution images (Ground Sampling Distance between 0.3 and 3 meters) for each disaster. Most importantly, this dataset provides a realistic situation, with images taken from various satellites (GeoEye-1, WorldView-2 and 3), sometimes old pre-event images (maximum gap of 3 years) and a wide range of nadir angles (from 5 deg to 36 deg). This dataset appears perfectly suited to test the processing chains considered in this work. The classification ground truth contains 4 labels: no damage, minor damage, major damage, destroyed buildings.

For the second dataset, the ground truth associated with the images is a vector file, retrieved from Copernicus EMS R&R 050, containing manual labelling of building footprints and a damage classification into 3 classes: no damage, damaged, destroyed. The distinction between minor and major damage is not available, but this dataset is a great application case for our study.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France

Unet:
The semantic segmentation is done with a Unet architecture (Ronneberger, 2015), a multi-resolution segmentation network with an encoder part and a decoder part. The Unet network has proven its strength in a large number of deep learning image processing configurations. The encoder extracts features (convolutional layers) while reducing the spatial resolution. The decoder sequentially increases the resolution, merging information from the previous layer and from the skip-connection coming from the encoder layer at the same resolution. Several Unet architectures with different encoders have been compared.

MobileNet-V2:
Its basic block consists of, firstly, a 1x1 convolution allowing feature-space expansion; secondly, an activation such as ReLU; thirdly, a depthwise convolution (one convolution per feature); then another ReLU activation; and finally a 1x1 convolution to reduce the number of features. In addition, a skip connection can link the input and output blocks (of the same size), to reduce the loss due to the ReLU activation.
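The inverted bottleneck block described above can be sketched in a few lines. This is a shape-level illustration with random weights (no training, no normalization layers), assuming numpy and scipy; it only shows the expand / depthwise / reduce / skip structure.

```python
import numpy as np
from scipy.ndimage import convolve

def inverted_bottleneck(x: np.ndarray, expand: int = 4) -> np.ndarray:
    """Sketch of a MobileNet-V2-style inverted bottleneck on a (H, W, C)
    feature map: 1x1 expansion, ReLU, 3x3 depthwise convolution, ReLU,
    1x1 reduction, plus a residual skip connection. Weights are random."""
    rng = np.random.default_rng(0)
    h, w, c = x.shape
    ce = c * expand
    w_expand = rng.normal(0, 0.1, (c, ce))       # 1x1 conv = per-pixel matmul
    w_reduce = rng.normal(0, 0.1, (ce, c))
    dw_kernels = rng.normal(0, 0.1, (ce, 3, 3))  # one 3x3 kernel per channel

    y = np.maximum(x @ w_expand, 0)              # expansion + ReLU
    y = np.stack([convolve(y[..., i], dw_kernels[i], mode="nearest")
                  for i in range(ce)], axis=-1)  # depthwise convolution
    y = np.maximum(y, 0)                         # ReLU
    y = y @ w_reduce                             # 1x1 reduction
    return x + y                                 # skip connection (same size)

out = inverted_bottleneck(np.ones((8, 8, 16)))
```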

SE-ResNet:
Among the interesting solutions, we consider the SE-ResNet network. (He, 2016) introduces the ResNet architecture, which models convolution filters close to the identity in order to transfer information between layers without suffering from small loss gradients during the learning phase. To do so, it uses skip-connections that allow information to bypass the convolutions. (Hu, 2019) introduces Squeeze-Excite blocks, which aim to rebalance the weights between channels and to increase the receptive field size. (Xie, 2017) improves the ResNet network by introducing Squeeze-Excite blocks, defining the SE-ResNet architecture. We can note that a Squeeze-Excite block essentially averages over the features, which can lead to smoothing effects. As it introduces a dependence on a large part of the image, if this model is used to infer an image with tile-based processing, discontinuities near the tile borders become visible, depending on the margins.
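A minimal numpy sketch of the Squeeze-Excite mechanism (weights random, shapes illustrative) makes the global dependence explicit: the per-channel gate is computed from an average over the whole tile, which is why tiled inference can produce border discontinuities.

```python
import numpy as np

def squeeze_excite(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Sketch of a Squeeze-Excite block on a (H, W, C) feature map:
    global average pooling ("squeeze"), a small two-layer gating network
    ("excite"), then channel-wise rescaling of the input."""
    s = x.mean(axis=(0, 1))                  # squeeze: (C,) channel statistics
    e = np.maximum(s @ w1, 0)                # reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(e @ w2)))   # expansion + sigmoid, in (0, 1)
    return x * gate                          # reweight channels

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
w1 = rng.normal(size=(8, 2))   # reduction ratio 4: 8 -> 2 channels
w2 = rng.normal(size=(2, 8))
y = squeeze_excite(x, w1, w2)
```

Since the gate lies in (0, 1), the block can only attenuate channels; its value depends on every pixel of the tile.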
3.2.4 EfficientNet: The EfficientNet architecture integrates mobile inverted bottlenecks, Squeeze-Excite blocks and depthwise convolutions into an architecture that is as light as possible. It also comes with a method to adapt the architecture hyperparameters in order to find the best compromise between performance and weight; this adaptation consists in jointly modifying the resolution, the depth and the number of features. This high-performance network is considered in this study.

Damages classification with deep learning
In this second approach, damages are classified with a neural network. Three options have been compared: direct use of the building segmentation combined with an image classifier; a Siamese network achieving segmentation with 5 classes (4 damage classes + background); and several binary networks that generate 4 semantic segmentations, which are then merged to build the 5-class map.

Decision based on building semantic segmentation:
After the building semantic segmentation step, building masks are computed. For each detected building, tiles centered on this object are extracted from the pre- and post-event images. Then, a classifier is used to decide the type of damage. This approach has a major drawback: it requires a perfect building segmentation that distinguishes buildings individually. UNet-EfficientNet architectures, which achieved the best performance for building segmentation, have been chosen. Their performance is good enough for building detection, but not precise enough to separate each building individually. Therefore, the building ground truth is used to extract separate tiles, and the classification network is then trained.
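The tile-extraction step can be sketched as follows: buildings are separated with a connected-component labelling of the ground-truth mask, then a fixed-size tile centred on each building is cropped from the pre- and post-event images. This assumes scipy; the function names and tile size are illustrative.

```python
import numpy as np
from scipy.ndimage import label, find_objects

def building_tiles(pre, post, mask, size=64):
    """Extract (pre, post) tile pairs, one per building in a binary mask.

    Buildings are separated by connected-component labelling; each tile is
    `size` x `size` pixels, centred on the building and clamped to the
    image borders. Tile size is an illustrative choice."""
    labelled, n = label(mask)
    tiles = []
    for sl in find_objects(labelled):
        cy = (sl[0].start + sl[0].stop) // 2   # building centre (row)
        cx = (sl[1].start + sl[1].stop) // 2   # building centre (col)
        half = size // 2
        y0 = max(0, min(cy - half, mask.shape[0] - size))
        x0 = max(0, min(cx - half, mask.shape[1] - size))
        tiles.append((pre[y0:y0 + size, x0:x0 + size],
                      post[y0:y0 + size, x0:x0 + size]))
    return tiles

# Toy example: two separate buildings in a 128x128 image.
mask = np.zeros((128, 128), dtype=np.uint8)
mask[10:20, 10:20] = 1
mask[80:95, 70:90] = 1
pre = np.zeros((128, 128)); post = np.zeros((128, 128))
tiles = building_tiles(pre, post, mask)
```

Each tile pair is then fed to the damage classifier; with a predicted mask in which buildings are fused, the labelling step would merge them into a single component, which is the drawback noted above.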
3.3.2 Siamese segmentation network: It consists in implementing two parallel encoders: one to process the pre-disaster image, and one to process the post-disaster image. These two encoders are identical and share the same weights. The two feature outputs are then merged (by concatenation) and used as input to the decoder part. All parallel outputs of the skip-connections are also concatenated and used as inputs to the decoder (Figure 5. Architecture of Siamese network).

3.3.3 Binary Siamese segmentation networks: Is it more efficient to train one multiclass network or to train several binary networks? In order to answer this question, several binary Siamese networks "all vs one" (one per class) have been implemented with EfficientNet-B0 as encoder. A post-processing step then retrieves all the results and defines the final multi-class result, following a merging schema.

Neural networks are trained following the classical backpropagation procedure, using Stochastic Gradient Descent. Tiles of 224x224 pixels are sent to the network, with a batch size of 32. Each class is represented in the batch with an approximate ratio: 1 tile with background and no building, 2 tiles with no damage, 9 with minor damage, 9 with major damage, and 9 tiles with destroyed buildings. The global loss combines a balanced cross-entropy and a dice loss.
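The post-processing that merges the binary "all vs one" outputs can be sketched as follows. The paper defines its exact schema in a figure, so the rule below (per-pixel arg-max over the four scores, with a background fallback when no score is confident) is only one plausible implementation.

```python
import numpy as np

def merge_binary_outputs(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Merge four binary 'all vs one' damage maps into one 5-class map.

    `scores` has shape (4, H, W) with per-pixel probabilities for
    no-damage, minor, major and destroyed. Pixels where every score is
    below `threshold` are assigned to class 0 (background). The threshold
    and the arg-max rule are illustrative assumptions."""
    best = scores.argmax(axis=0) + 1          # damage classes 1..4
    confident = scores.max(axis=0) >= threshold
    return np.where(confident, best, 0)

# Toy example: one confident "major damage" pixel on a 2x2 tile.
scores = np.zeros((4, 2, 2))
scores[2, 0, 0] = 0.9
out = merge_binary_outputs(scores)
```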

RESULTS
In this part, results are given for the building semantic segmentation and the damage detection.

Building segmentation
This study focused on U-Net-type architectures for the building segmentation part. Evaluations have been run on the xView2 test database. As usual, the test database was not used during the learning phase; however, it is related to the same disasters and is extracted from the same original dataset as the learning database. Among the different architectures, Unet-EfficientNet-B5 achieves the best performance, with an F1-score up to 83%. The performance of Unet-EfficientNet networks is confirmed on this use case. We can note that Unet-EfficientNet-B0 performed better than Unet-SE-ResNet-101, with only 8.6M parameters.

Model Parameters Precision Recall
The largest studied network, EfficientNet-B5, gives the best results, but its training requires more computation time and occupies more memory on the Graphical Processing Unit. Its inference time and computational cost are also larger than those of EfficientNet-B0 and MobileNet-V2.

Damages classification
The results presented here are computed over all pixels of the images, whereas in the xView2 challenge, results are computed at pixel level on building areas only. Despite its lower number of parameters (5.3M compared to 9.2M), the EfficientNet-B0 network gives better results than the EfficientNet-B2. In order to evaluate the performance of the classification network, the evaluation assumes a perfect building detection: consequently, the building ground truth is used to create each tile. The building ground truth contains a map of well-separated buildings, contrary to the output map of the MobileNet or EfficientNet semantic segmentation, in which buildings are fused. Global performance results should also take into account the errors of the preceding building segmentation model.
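The two evaluation conventions mentioned above (all pixels vs building areas only) can be made concrete with a small routine. This sketch computes per-class pixel-level precision, recall and F1, optionally restricted to a boolean region; the function name and toy values are illustrative.

```python
import numpy as np

def pixel_f1(pred, truth, cls, region=None):
    """Pixel-level precision/recall/F1 for one class, optionally restricted
    to a boolean `region` (e.g. the building areas, as in the xView2
    challenge convention)."""
    p = (pred == cls)
    t = (truth == cls)
    if region is not None:
        p, t = p[region], t[region]
    tp = np.sum(p & t)
    precision = tp / max(p.sum(), 1)
    recall = tp / max(t.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1

# Toy 2x2 example: class 1 has one true positive and one missed pixel.
truth = np.array([[1, 1], [0, 2]])
pred  = np.array([[1, 0], [0, 2]])
prec, rec, f1 = pixel_f1(pred, truth, cls=1)
```

Passing `region=(truth > 0)` would reproduce the building-areas-only convention.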

Validation
Building detection validation tests on the Haiti data have been carried out with the 5-class segmentation model UNet-EfficientNet-B5. Figure 8, figure 9 and figure 10 present an example. The building detection output has good quality and may help image analysts to achieve rapid mapping operations.
The evaluation of building damages is also interesting. Some areas with major damage, in red, contrast with the other buildings. It gives a preliminary idea of where the hurricane created the most damage, with an automatic processing chain.

CONCLUSION
In this work, several neural network architectures have been trained to evaluate damages between pre-disaster and post-disaster images. The class "minor damage" appears to be the most complicated to learn, so an approach with specialized networks is a potential solution to alleviate this problem. Some improvements have been studied, such as merging the results of damage-specific binary networks.