FLOOD DETECTION IN NORWAY BASED ON SENTINEL-1 SAR IMAGERY

After large flood incidents in Norway, the Norwegian Water Resources and Energy Directorate (NVE) is responsible for documenting the flooded areas. So far, this has mainly been done by visual interpretation of aerial images. Satellite images are a valuable source of additional information, as they cover vast areas in each satellite pass. In this paper, a fully automated system for detecting and delineating floods using Synthetic Aperture Radar (SAR) images from the Sentinel-1 satellites is presented. In SAR images, wet areas and water bodies usually show lower backscatter than dry areas. The flood detection system is thus based on comparing a reference image acquired before the flood with the flood event image. A Sentinel-1 training dataset from three flood events in Norway has been obtained and manually annotated by NVE. This training set has been used to train a random forest (RF) classifier, which outputs a score for each pixel in the SAR image. This score image is thresholded in order to obtain a crude flood detection. Unfortunately, changes in the backscatter may also be triggered by other events, such as melting snow and harvested crop fields. To mitigate such “lookalikes”, several techniques have been implemented and tested. These include masking based on size, slope and “height above nearest drainage” (HAND). The experiments presented show that the system performance is very good. Of the 179 manually labelled flood objects, 168 are detected. The system is being applied operationally at NVE.


INTRODUCTION
Every year thousands of lives are affected and billions of dollars are lost in disasters caused by flood events around the world. Accurate monitoring systems have the potential to greatly reduce casualties and economic losses by providing up-to-date information for disaster managers and the population at large.
Satellite Earth observation (EO) missions currently offer a unique capability to observe the Earth's surface in a spatially distributed and temporally repetitive fashion at global, regional and local spatial scales. Europe, and Norway in particular, have invested substantially in a new European Earth observation program, Copernicus, wherein emergency management is a critical component. This service is based on timely and accurate geospatial information derived from satellite remote sensing. The major benefits of including analysis of satellite data in emergency management are the huge spatial coverage of satellite data, which provides a better overview of an ongoing emergency situation, and the assessment of potential damages. The information provided by satellite data may be utilised for, e.g., resource allocation in the early phases of clean-up work.
Manual inspection of satellite data for detection and delineation of flooded areas is time-consuming, and during flood events the personnel who could perform it are heavily occupied. Hence, in order for a satellite-based flood monitoring system to be useful, automatic analysis of satellite data is a prerequisite. However, the development of reliable flood detection and mapping systems is inherently challenging, and is an active research area [Serpico et al., 2012, Martinis et al., 2009, Giustarini et al., 2013, Twele et al., 2016, Martinis et al., 2015, D'Addabbo et al., 2016, Refice et al., 2014].
Remote sensing technology has played an important role in flood monitoring in recent years. Synthetic aperture radar (SAR) based systems provide all-weather capability, in contrast to optical satellite sensors. Detection of floods in SAR images usually relies on the hypothesis that water bodies have very low backscatter intensity values (σ0) compared to dry land, in particular if the wind level is not too high and the surface of the flood water is calm. The underlying principle for flood detection is a pixel-wise comparison of the backscatter intensities of two SAR images: an event image (the one with floods) and a reference image. Often such a comparison is realized by computing the difference between the event and the reference images.
Change-detection techniques have been widely applied in the literature to solve the problem of flooded-area extraction from SAR images [Giustarini et al., 2013, Pirrone et al., 2016, Brisco et al., 2013, Long et al., 2014]. While Martinis et al. (2009) only considered a single SAR flood image to extract pixels corresponding to open water via image thresholding and a region-growing algorithm, Giustarini et al. (2013) added change-detection information with respect to a non-flood reference image to improve the algorithm's performance. Traditional change-detection methods, such as log-ratio, were applied to identify changed regions from flood images. In Pirrone et al. (2016), multi-temporal SAR data were employed to compute a polarimetric log-ratio used to highlight changes in terms of magnitude and direction. This information was thereafter analysed in order to separate non-changed and changed samples. An extension of the curvelet-based change-detection approach to polarimetric SAR data for monitoring flooded vegetation was proposed by Brisco et al. (2013). First, the Freeman-Durden decomposition was used to classify the SAR backscatter into double bounce, surface scattering and volume scattering. Then, a change-detection algorithm was applied to all three channels separately, based on the hypothesis that the presence of water due to flooding is reflected by an equal increase in all three channels, whereas the change of a special scattering event only appears in the dedicated scattering mechanism intensity. Long et al. (2014) applied a three-step procedure (calculation of a difference image, thresholding and segmentation) in order to predict flooded areas in Namibia from ENVISAT/ASAR and Radarsat-2 data.
Interferometric SAR (InSAR) has also been explored for flood detection [D'Addabbo et al., 2016, Refice et al., 2014]. Refice et al. (2014) showed that the use of InSAR coherence information may help in recognizing flooded areas exhibiting little change in the backscatter intensity. Typically, such areas are associated with the presence of vegetation. D'Addabbo et al. (2016) also demonstrated the usefulness of InSAR coherence information to complement SAR intensity data, in particular for analysis of areas showing an increased value of the backscattering signal during the flood.
A major drawback of many of the techniques based on change detection or multi-temporal SAR or InSAR data is that it is challenging to determine precisely the nature of a change in the SAR data, and to decide whether it is due to the disaster impact or originates from other events.
The use of ancillary data for improving the flood detection performance has been investigated in several research studies [D'Addabbo et al., 2016, Twele et al., 2016]. Bayesian networks have also been considered for integrating information from SAR and InSAR time series, and ancillary data [D'Addabbo et al., 2016]. Their results showed that a Bayesian network data fusion approach made it possible both to mitigate false alarms and to correctly identify flooded areas in events characterized by complex land-cover conditions and temporal evolution. Twele et al. (2016) investigated if and how the processing chain of the X-band-based TerraSAR-X flood service [Martinis et al., 2015] can be adapted to C-band Sentinel-1 data. In order to improve the robustness to challenging environmental conditions (e.g. prevalence of water lookalikes, such as wet snow or radar shadow in mountainous areas), the flood processor was improved through the integration of an exclusion mask derived from the height above nearest drainage (HAND) index [Rennó et al., 2008, Nobre et al., 2011].
In this paper, a fully automated operational system for detection and delineation of flood objects is presented. The system runs on a daily basis and uses a national flood alert service to determine which regions of Norway to process for flood detection. All Sentinel-1 SAR data covering Norway are downloaded and preprocessed every day. The data covering regions with a flood warning are processed with machine-learning methods to detect and map the outline of flooded areas, whereas data covering regions without a flood warning are stored in a database for use as reference images. The aim has been faster and better handling of flood events, as well as better utilisation of available resources.
However, as with many automatic detection systems, a great challenge is to obtain a high detection rate while keeping the false detection rate low. For this project, wet snow and agricultural areas were a potential source of many false detections. To handle such situations we filter the flood detections using several features, including the size of the potential flood object, a water mask, and HAND.

FLOOD DETECTION SYSTEM
This section gives a brief overview of the system in which the algorithms are implemented. The flood-detection processing chain (Figure 1) is fully automatic, and downloads and processes Sentinel-1 SAR data covering the Norwegian mainland every day. The input SAR data to the algorithm are Level-1 Interferometric Wide Swath (IW) Ground Range Detected (GRD) products with VV-polarization, which have been detected, multi-looked and projected to ground range using an Earth ellipsoid model. The pixel spacing of the IW GRD products is 10 × 10 m. The pre-processing of the SAR data is standard, and may be summarized as follows:
• Radiometric calibration to normalized radar cross-section σ0 ("sigma-naught") using attached metadata.
• 3 × 3 multi-looking to reduce speckle noise.
• Geocoding to selected map projection (UTM33/WGS84, 20 × 20 m pixel spacing) using the Range-Doppler geocoding algorithm. The geocoded SAR images are sampled to the same grid as the DEM. Other geocoding algorithms may also be applied.
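The calibration-to-decibel and multi-looking steps listed above can be illustrated with a minimal NumPy sketch. It assumes the input is an already calibrated σ0 array in linear power units; the function names `multilook` and `to_db` and the small floor value are illustrative assumptions and are not taken from the operational system.

```python
import numpy as np

def multilook(sigma0_linear, looks=3):
    """Average non-overlapping looks x looks blocks (done in linear units, not dB)."""
    rows, cols = sigma0_linear.shape
    rows -= rows % looks   # crop so that the shape is divisible by `looks`
    cols -= cols % looks
    blocks = sigma0_linear[:rows, :cols].reshape(
        rows // looks, looks, cols // looks, looks)
    return blocks.mean(axis=(1, 3))

def to_db(sigma0_linear, floor=1e-6):
    """Convert linear backscatter to decibels, clipping to avoid log of zero."""
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))

rng = np.random.default_rng(0)
sigma0 = rng.exponential(scale=0.05, size=(9, 12))  # synthetic speckled backscatter
sigma0_db = to_db(multilook(sigma0, looks=3))
print(sigma0_db.shape)
```

Averaging before the logarithm is a deliberate choice in this sketch: speckle is multiplicative in the linear domain, so block means are taken there and converted to dB afterwards.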
Then, for each flood-alert region and satellite pass (repeat cycle), the geocoded images are stitched together into a mosaic product using the knowledge about the geographical position of each image pixel.
National flood alerts are fetched from the flood alert application programming interface (API) served by the Norwegian Water Resources and Energy Directorate. Based on the status of the flood alert for each region, the acquired SAR image mosaics are processed either for reference or for detection. If there is a flood alert for a given region, the corresponding SAR mosaic is scanned for flood events; otherwise the reference database is updated with the SAR mosaic. The flood detection is based on the change-detection principle, i.e. the system compares the SAR image containing potential flood events with the latest reference image. Potential flood events are filtered with masks for water areas and flood hazards, and with other features such as the size of the potential flood objects. The outputs of the system are a Shapefile containing the geographical extent of each flood object, a false-colour image modulated by a DEM hill-shade image, and a raster image displaying the coverage of the SAR product.
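The alert-driven branching described above can be summarised in a few lines of Python. This is only a sketch of the control flow: the class `FloodSystem`, its method `handle`, and the string return values are hypothetical, and the real system fetches the alert status from the NVE flood alert API rather than receiving it as an argument.

```python
from dataclasses import dataclass, field

@dataclass
class FloodSystem:
    """Toy router: mosaics with an alert are scanned, others update the reference store."""
    references: dict = field(default_factory=dict)

    def handle(self, region, mosaic, alert_active):
        if alert_active:
            reference = self.references.get(region)
            if reference is None:
                return "skipped: no reference available"
            # Here the event mosaic would be compared with `reference`.
            return "detect"
        # The newest flood-free mosaic becomes the reference for this region.
        self.references[region] = mosaic
        return "reference updated"

system = FloodSystem()
print(system.handle("region_a", "mosaic_day1", alert_active=False))
print(system.handle("region_a", "mosaic_day2", alert_active=True))
```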

Dataset
From previous flood events, corresponding Sentinel-1 mosaic images were generated. Then, using SAR images, elevation models, aerial photos, and existing knowledge about floods and water streams, flooded areas were manually delineated in the SAR images by the Norwegian Water Resources and Energy Directorate. In total, 179 flooded areas were delineated from three major flood events (Table 1). It should be emphasised that this manual annotation is not perfect. It does not include all the flooded areas covered by the images from the time of the event. There are several areas that, based on the satellite images, look like they are most likely flooded, but where this has not been confirmed by other sources of information; these areas are therefore not annotated and not included in the database. In addition, the delineation of the included floods is an estimate based on expert knowledge.
For each annotated flood object, we created an image crop with a corresponding pixel label mask. Each image crop included both the flood object and its nearby surroundings. The set of image crops constituted the database we used for training and validating the flood detection algorithm, and consisted in total of 13,426 pixels of flood and 879,400 pixels from non-flooded areas (Table 2). Note that by using these crops during training, we remove most of the flooded areas that have not been manually annotated, and thereby alleviate the issue of such flooded areas influencing the training.

Reference Image Database
In addition to the images from the event, the flood detection algorithm described in Section 4.1 requires a reference image captured at a time without flooding. The Sentinel-1 satellites are well suited to such a change-detection algorithm, as the two-satellite constellation has a repeat cycle of 6 days. This means that images with the same coverage and geometry are acquired every 6th day.
It is possible to construct a variety of different types of reference image products. For instance, an average image may be computed from a period of time without flood events, or one could compute a reference image as a linear combination of the previous N passes without flooding. However, in this work we used the images acquired during the previous pass not affected by flood. In practice, this is done by storing the images from each pass that occurs at a time when no flood warning has been issued. Thus, during a flood event, the system uses the newest possible images acquired before the flood event as reference images.
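The three reference-image options mentioned above (latest flood-free pass, temporal average, and linear combination of the previous N passes) can be compared side by side in a short NumPy sketch. The function names and the synthetic 2 × 2 images are illustrative assumptions, and whether averaging should be done in dB or in linear units is left open here.

```python
import numpy as np

def latest_reference(passes):
    """Most recent flood-free image (the option used in this work)."""
    return passes[-1]

def mean_reference(passes):
    """Pixel-wise average over all stored flood-free passes."""
    return np.mean(np.stack(passes), axis=0)

def weighted_reference(passes, weights):
    """Linear combination of the previous N flood-free passes (weights are normalised)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(passes), axes=1)

# Three synthetic flood-free passes, oldest first, with constant backscatter values.
passes = [np.full((2, 2), v) for v in (-12.0, -10.0, -11.0)]
print(latest_reference(passes)[0, 0])
print(mean_reference(passes)[0, 0])
print(weighted_reference(passes, [1, 2, 3])[0, 0])
```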

FLOOD DETECTION
The flood detection algorithm consists of two steps: a change-detection step and a flood-processing step.

Change Detection
The input to the change-detection step is two SAR images (output from the pre-processing): the event image, I_event, and the reference image, I_ref. From these two images, the difference image, I_diff = I_event - I_ref, is created.
To detect flood objects in the SAR data, two features are applied: the difference image and the reference image. A random forest (RF) classifier [Breiman, 2001] (Python/scikit-learn) was trained using the annotated training data (Table 2), with the following parameters: n_estimators equal to 100 and min_samples_leaf equal to 63. The other parameters were left at their default values. The cut-off parameter was set to the threshold value at which the F2-score (a weighted harmonic mean of precision and recall) was highest. For the given training data, the cut-off parameter was selected equal to 0.876. The cut-off value is the threshold we apply to the RF score values, i.e. we set the prediction to flood if the score value is greater than the cut-off. The decision boundaries for the two input features show that for reference values less than -15 dB and greater than 3 dB the background class is selected (Figure 2). For reference values around -15 dB, the RF classifier predicts flood if the difference value is less than -4 dB, whereas for reference values around 3 dB, the difference values need to be smaller than -10 dB in order to cause a flood prediction (Figure 2). In test mode, we apply the random forest classifier to all pixels in the SAR images, and pixels with output values greater than the cut-off value are labelled as flood. The detected flood objects are then smoothed using a morphological closing operator with a 3 × 3 structuring element.
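As an illustration of the change-detection step, the following sketch trains a scikit-learn random forest on two features (difference and reference backscatter), thresholds the score image at the reported cut-off, and applies a 3 × 3 morphological closing. All data here are synthetic and the class distributions are invented for the example; only the two RF parameters and the cut-off value of 0.876 come from the text.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training pixels; columns are [difference (dB), reference (dB)].
n_flood, n_background = 500, 5000
flood = np.column_stack([rng.normal(-8, 2, n_flood), rng.normal(-8, 3, n_flood)])
background = np.column_stack([rng.normal(0, 2, n_background),
                              rng.normal(-10, 4, n_background)])
X = np.vstack([flood, background])
y = np.concatenate([np.ones(n_flood, dtype=int), np.zeros(n_background, dtype=int)])

clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=63, random_state=0)
clf.fit(X, y)

# Synthetic scene with a simulated flooded patch (backscatter drop of 9 dB).
reference = rng.normal(-10, 1, size=(50, 50))
event = reference + rng.normal(0, 1, size=(50, 50))
event[20:30, 20:30] -= 9.0
difference = event - reference

# Per-pixel flood score = predicted probability of the flood class.
features = np.column_stack([difference.ravel(), reference.ravel()])
score = clf.predict_proba(features)[:, 1].reshape(reference.shape)

cut_off = 0.876
flood_mask = score > cut_off
# Morphological closing with a 3 x 3 structuring element smooths the objects.
flood_mask = ndimage.binary_closing(flood_mask, structure=np.ones((3, 3), dtype=bool))
print(flood_mask.sum())
```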

Flood Processing
Flood detection using the difference image produces many false detections. In order to reduce the number of false positives, we apply the following filtering to the output image of the change-detection module:
• Remove flood detections that overlap with the water mask.
• Remove detected flood objects smaller than four pixels (1600 m²) or larger than 2000 pixels (0.8 km²). The filtering of large objects is effective in removing wet snow areas, which are often detected by the change-detection algorithm (note that the largest flood object was 0.3 km², see Section 5.2).
• Remove flood objects where the corresponding average terrain slope is larger than 15 degrees.
• Remove flood objects where the corresponding average HAND value is more than 20 m.
• Remove flood objects where the corresponding average HAND is more than 5 m and the distance from the nearest water body is more than 2000 m.
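The filtering rules above can be expressed compactly with connected-component labelling. The sketch below uses `scipy.ndimage` and takes the thresholds from the list; the function name, the use of 4-connectivity, and the use of the object-average distance to water are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy import ndimage

def filter_flood_objects(mask, water, slope_deg, hand_m, dist_water_m,
                         min_px=4, max_px=2000, max_slope=15.0,
                         max_hand=20.0, hand_far=5.0, max_dist=2000.0):
    """Return a boolean mask containing only the flood objects that pass all rules."""
    candidate = mask & ~water                 # drop detections on the water mask
    labels, n = ndimage.label(candidate)      # default structure: 4-connectivity
    if n == 0:
        return np.zeros(mask.shape, dtype=bool)
    idx = np.arange(1, n + 1)
    size = np.bincount(labels.ravel(), minlength=n + 1)[1:]   # pixels per object
    slope = ndimage.mean(slope_deg, labels, idx)              # per-object averages
    hand = ndimage.mean(hand_m, labels, idx)
    dist = ndimage.mean(dist_water_m, labels, idx)
    keep = ((size >= min_px) & (size <= max_px)
            & (slope <= max_slope)
            & (hand <= max_hand)
            & ~((hand > hand_far) & (dist > max_dist)))
    lookup = np.concatenate([[False], keep])  # label 0 is background
    return lookup[labels]

mask = np.zeros((20, 20), dtype=bool)
mask[1:4, 1:4] = True      # plausible flood object (9 pixels)
mask[10, 10] = True        # too small
zeros = np.zeros((20, 20))
kept = filter_flood_objects(mask, np.zeros((20, 20), dtype=bool), zeros, zeros, zeros)
print(kept.sum())
```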

HAND feature image
The HAND model calculates the height of each cell in a DEM raster in relation to its nearest drainage point [Rennó et al., 2008, Nobre et al., 2011]. The model uses the drainage network and the local drain directions to create the distance to the nearest drainage channel, which is the normalised topology of the HAND model [Rennó et al., 2008, Nobre et al., 2011]. The HAND model is basically a terrain descriptor and therefore it cannot estimate the flood wave as in the case of hydrodynamic models. The HAND is used for hydrological and more general purpose applications, such as hazard mapping, landform classification, and remote sensing.
The HAND feature image serves as a flood hazard map and is created from the corresponding digital elevation model using GRASS GIS and the stream.extraction and stream.distance modules. The stream.extraction module identifies stream networks from digital elevation models, without calculating hydrological parameters, whereas the stream.distance module calculates the elevation above the nearest stream network. A central part of extracting the stream networks is the calculation of the flow accumulation, which is the flow accumulated into each pixel. By thresholding the flow accumulation raster, i.e. keeping cells where the accumulation is larger than a given threshold, we obtain the stream network. For our DEM we used a threshold value of 100,000.
The distance to the nearest water body was obtained by applying a distance transform to the water mask raster. The result was a raster where each pixel reports the distance to the nearest water body.
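Two of the raster operations in this section, thresholding the flow accumulation and deriving a distance-to-water raster from the water mask, are simple enough to sketch directly. The example below is not the GRASS GIS workflow used by the authors; it assumes a Euclidean distance transform from `scipy.ndimage` and a 20 m pixel size, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def stream_network(flow_accumulation, threshold=100_000):
    """Cells whose accumulated flow exceeds the threshold form the stream network."""
    return flow_accumulation > threshold

def distance_to_water(water_mask, pixel_size_m=20.0):
    """Euclidean distance in metres from each pixel to the nearest water pixel."""
    # distance_transform_edt measures the distance to the nearest zero-valued
    # element, so the water mask is inverted first.
    return ndimage.distance_transform_edt(~water_mask, sampling=pixel_size_m)

water = np.zeros((5, 5), dtype=bool)
water[2, 0] = True                       # a single water pixel at the left edge
dist = distance_to_water(water)
print(dist[2, 3])                        # three pixels away from the water pixel
```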

VALIDATION
The algorithms are validated by processing SAR data for a number of flood events (positive events) and SAR data without floods (negative events) and comparing the results with "ground truth" derived from manual expert interpretation of SAR data combined with in situ observations.

The Validation Datasets
To validate the algorithms, we apply the leave-one-out principle to the training data described in Section 3, where we train the classifier on two of the flood events and validate the results on the third (Table 3).
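The leave-one-event-out scheme can be sketched with scikit-learn's `LeaveOneGroupOut`, using the flood event of each pixel as the group label. The data and the event names `event_a`, `event_b` and `event_c` below are synthetic placeholders; only the RF parameters are taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 2))                     # synthetic two-feature pixels
y = (X[:, 0] < -0.5).astype(int)                # synthetic flood labels
events = np.repeat(["event_a", "event_b", "event_c"], n // 3)  # hypothetical names

held_out = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=events):
    # Train on two events, evaluate on the remaining one.
    clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=63, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    accuracy = clf.score(X[test_idx], y[test_idx])
    held_out.append((str(events[test_idx][0]), accuracy))

for name, accuracy in held_out:
    print(name, round(accuracy, 3))
```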

Results
The size distribution of the manually annotated flood objects shows that the majority of the flood objects are small, typically less than 20,000 m² (Figure 3). For all flood events, we observe that small floods are most common. The largest flood object had an area of 755 pixels, i.e. 0.3 km², and occurred in the "Inland" flood event.
The cut-off value used to threshold the RF classifier score values was selected as the value that maximized the F2 score, where the definition of the general Fβ score is:
$$F_\beta = (1 + \beta^2)\,\frac{\mathrm{precision}\cdot\mathrm{recall}}{\beta^2\,\mathrm{precision} + \mathrm{recall}}$$
For the three different flood events, the F2 metric versus cut-off value was similar and resulted in a nearly identical cut-off value (Table 4).
The pixel-wise performance metrics of the RF classifier show that the average accuracy of identifying non-flooded pixels is very high, whereas the average accuracy of identifying flooded pixels is substantially lower, in particular for the Sørlandet flood event (Table 4). When we evaluate the results with respect to the flood objects, we observe that we are able to detect 168 of 179 flood objects (Table 5). However, the flood detection algorithm also detects a significant number of flood objects that are not labelled. As described in Section 3.1, this is to be expected due to the imperfections of the manual annotation. A visual inspection of the results also confirms that a significant part of the "false" detections are in fact most likely flooded and should be considered correctly classified.
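The cut-off selection can be sketched as a simple sweep over candidate thresholds, keeping the one with the highest F2 score. The sketch assumes the standard Fβ definition (β = 2 weights recall more heavily than precision); the function names, the candidate grid, and the toy scores are illustrative assumptions.

```python
import numpy as np

def f_beta(precision, recall, beta=2.0):
    """Standard F-beta: (1 + b^2) * P * R / (b^2 * P + R)."""
    denom = beta**2 * precision + recall
    return 0.0 if denom == 0 else (1 + beta**2) * precision * recall / denom

def best_cut_off(scores, labels, beta=2.0, candidates=np.linspace(0.0, 1.0, 101)):
    """Return (cut_off, f_score) for the first candidate that maximises F-beta."""
    best = (0.0, -1.0)
    for t in candidates:
        pred = scores > t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(precision, recall, beta)
        if f > best[1]:
            best = (t, f)
    return best

scores = np.array([0.123, 0.274, 0.731, 0.876])   # toy RF scores
labels = np.array([0, 0, 1, 1])                   # toy ground truth
cut_off, f2 = best_cut_off(scores, labels)
print(cut_off, f2)
```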
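An object-wise evaluation of this kind can be sketched with connected-component labelling. The matching criterion is not stated in the text, so the sketch assumes that an annotated object counts as detected if at least one predicted flood pixel overlaps it; the function name and the returned keys are hypothetical.

```python
import numpy as np
from scipy import ndimage

def object_detection_counts(annotated, predicted):
    """Count annotated objects hit by a prediction and predictions without annotation."""
    ann_labels, n_ann = ndimage.label(annotated)
    pred_labels, n_pred = ndimage.label(predicted)
    # Annotated objects that contain at least one predicted pixel.
    hit_ann = np.unique(ann_labels[predicted & (ann_labels > 0)])
    # Predicted objects that contain at least one annotated pixel.
    hit_pred = np.unique(pred_labels[annotated & (pred_labels > 0)])
    return {"annotated": n_ann,
            "detected": len(hit_ann),
            "unlabelled_detections": n_pred - len(hit_pred)}

annotated = np.zeros((10, 10), dtype=bool)
annotated[1:3, 1:3] = True
annotated[6:8, 6:8] = True
predicted = np.zeros((10, 10), dtype=bool)
predicted[2:4, 2:4] = True    # overlaps the first annotated object
predicted[0, 8] = True        # detection with no annotation
counts = object_detection_counts(annotated, predicted)
print(counts)
```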
Several examples of the automatic detections and delineations of the algorithm are shown in Figure 4, where the annotations and predictions are represented as orange and green polygons respectively. The pseudocolor images are constructed by putting the event image in the red channel and the reference image in the green and blue channels. Positive changes in the backscatter thus appear as red areas in the images. Finally, the images are modulated with the hill-shade image in order to visualise the topography.
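The construction of the pseudocolor image can be written in a few lines of NumPy. The dB display range of -25 to 0 dB and the function names are assumptions made for the sketch; the channel assignment and the hill-shade modulation follow the description above.

```python
import numpy as np

def scale(db, lo=-25.0, hi=0.0):
    """Map backscatter in dB to the display range [0, 1] (range is an assumption)."""
    return np.clip((db - lo) / (hi - lo), 0.0, 1.0)

def pseudocolor(event_db, reference_db, hillshade):
    """Event image in red, reference image in green and blue, modulated by hill-shade."""
    rgb = np.dstack([scale(event_db), scale(reference_db), scale(reference_db)])
    return rgb * hillshade[..., np.newaxis]

event_db = np.full((2, 2), -5.0)
reference_db = np.full((2, 2), -15.0)
hillshade = np.ones((2, 2))           # flat terrain: no modulation
rgb = pseudocolor(event_db, reference_db, hillshade)
print(rgb[0, 0])
```

With this channel assignment, a pixel whose backscatter has increased is redder than grey, while a decrease (as expected for open flood water) shifts the pixel towards cyan.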

DISCUSSION
The performance of the algorithms for detecting and delineating flooded areas in Norway was in general very good. From the flood events, we were able to detect 168 of 179 labelled flood objects. However, the total number of detected objects is substantially higher than the number of labelled flood objects. Many of these additional detections are expected to correspond to actually flooded areas, owing to the deficiencies in the manual annotations noted in Section 3.1. The number of false detections, both pixel-wise and object-wise, should therefore not be given too much emphasis.
During training of the validation RF classifiers (each classifier was trained with data from the two other events, see Section 5.1), we observed that the decision boundaries of the RF classifiers did not change much between the three different validation cases. This stable behaviour of the RF classifier is promising with regard to the generalisability of the complete flood algorithm with an RF classifier trained on all the flood objects in the database.
Figure 4 displays several important properties of the detection algorithm. Figure 4a and Figure 4b show examples where the algorithm performs well. In these figures, all of the manually labelled flooded areas (orange polygons) are detected by the system. There are some mismatches between the predicted flood objects (green polygons) and the manual delineations, especially for the case of "Gudbrandsdalslågen"; however, these often arise because many pixels in water bodies are marked as flood by the manual annotations (these are removed by the detection algorithm using a water mask). The detected flood objects in Figure 4c are good examples of detected areas that are most likely flooded but not included in the manual annotations. Both the location and the visual signature of the objects are consistent with actual flooded regions, and thus it is reasonable to assume that these detections are correct. In Figure 4d, the opposite seems to be the most reasonable explanation: although the status of the detected objects is unknown, it seems likely that most of them are not flooded areas and that the detections are false. The wet snow area shown in Figure 4e is an example where the change-detection algorithm is easily confused. However, with the use of HAND properties in addition to object size filtering, the resilience to false detections in such cases has improved significantly, and thus this area is not detected as flood even though the signature is very strong.
Wet snow is particularly present during spring, when the snow is melting. This often coincides with the time when floods occur, and thus removing such false detections is important to the system performance (the system is triggered by flood warnings). Another factor particularly present during spring is agricultural land that is being prepared for the season by the farmers. Examples of such areas are shown in Figure 4f. Removing false detections caused by agricultural areas proved to be harder; however, including the HAND and removing detections that were more than 20 m above the nearest drainage point reduced the number of false detections substantially.
During the development of the flood detection algorithm, we investigated the use of Sentinel-1's cross-polarization band and the inclusion of mean and standard deviation reference backscatter images, neither of which had any measurable effect.

Future Work
The algorithm should be updated when data from new flood events are acquired. The flood detection algorithm is based on only three flood events, and for these events, not all the flood objects are labelled. Another improvement is to extend the HAND. Currently, the stream networks are detected by identifying areas that have a flow accumulation larger than a given threshold. A natural extension would be to estimate several stream networks, using several different thresholds. A method for fusing the different HAND processing results would then be needed. Other extensions include the fusion of VHR optical data, either to simulate an enhanced SAR image product, or to improve the detections.

CONCLUSION
In this paper, a fully automated flood detection system based on Sentinel-1 SAR data has been presented. The system runs daily, and downloads and processes SAR data covering Norway every day. During periods without flood warnings, the system updates the reference images, whereas for days with active flood warnings, the system is run in detection mode. Based on the evaluation presented, the overall performance of the system is very good. Nearly all of the labelled flood objects in the evaluation database were detected, and the delineation of the flood objects seems to be reasonable. There is still potential for improvement regarding the number of false positives; however, the presented efforts to combat these issues have proved very effective and reduced the number to a satisfactory level.