APPLICATION OF U-NET CONVOLUTIONAL NEURAL NETWORK TO BUSHFIRE MONITORING IN AUSTRALIA WITH SENTINEL-1/-2 DATA

This paper aims to define a pipeline architecture for near real-time identification of bushfire impact areas using the Australian Geoscience Data Cube (AGDC). A series of catastrophic bushfires from late 2019 to early 2020 captured international attention because of the scale of devastation across four of Australia's most populous states: New South Wales, Queensland, Victoria and South Australia. Extracting burned areas from multispectral Sentinel-2 observations is straightforward when no cloud or haze obstruction is present. Without clear-sky observations, however, bushfire-affected regions are difficult to locate precisely. Sentinel-1 C-band dual-polarised (VH/VV) Synthetic Aperture Radar (SAR) data are therefore introduced to extract and analyse useful information from backscattering coefficients, which are unaffected by adverse weather and lack of sunlight. Several effects provide discriminative features for identifying burnt areas. Burned vegetation significantly changes volume scattering. The co- and cross-polarised responses decrease over leafless trees. Coherence changes over fire-disturbed areas. Finally, the two sensors acquire images of the same affected areas within a shortened revisit time. Moreover, training a U-Net deep learning model on recent and historical satellite data yields an effective pre-trained segmentation model of burnt and non-burnt areas. This enables more timely emergency response, more efficient hazard reduction activities and better evacuation planning during severe bushfire events. Built on a scalable big data processing framework, this approach could provide a more robust, timely and accurate method of bushfire detection, supporting prediction of the bushfire footprint and the development of fire spread models.


INTRODUCTION
Bushfires, also known as wildfires, are a common seasonal event worldwide and have been considered a major indicator of climate change over the past decades. The intensity of fire events can be exacerbated by lengthy droughts, which create ample fuel in the form of dry vegetation, and by high temperatures with minimal to no rainfall. Intense bushfires not only reduce vegetation coverage but also cause property damage, affect agriculture and livestock, and lead to loss of human life. Accurate and timely identification of burned areas plays a key role in burnt area mapping and monitoring. It thus supports the time-critical demand for situational awareness, informed decision making and tactical planning by emergency services teams.
The Australian bushfires from November 2019 to April 2020 caused an estimated AUD $2.26 billion (approximately US$900 million) in property loss and destruction, as well as a high loss of human life, over an estimated affected area of 187,360 km². At the time there was an urgent need to integrate accurate, timely and relevant data to support emergency services and government organisations. The Australian Geoscience Data Cube (AGDC) (Lewis et al. 2017) provides an open source platform that meets the growing demand for analysing earth observation data, including Landsat, Sentinel and MODIS. The AGDC is highly scalable and can be deployed on high performance computing environments such as those of the National Computational Infrastructure (NCI). Its core is a suite of Python libraries and PostgreSQL databases that provide users with an intuitive spatial analysis environment.

The aim of this study was to design and implement a fast data processing framework based on deep learning algorithms, and to assess the suitability of Sentinel-1 and Sentinel-2 imagery for bushfire hotspot extraction and segmentation. The paper is organised as follows. Section 2 presents basic SAR physical mechanisms and phenomenology. Section 3 gives an overview of the study area and the experimental data collected during the aforementioned bushfire events. Section 4 illustrates the role of SAR-derived coherent change detection for the VV and VH polarisations. Section 5 discusses the U-net model and the segmentation results, and Section 6 gives concluding remarks.

BACKGROUND
In contrast to passive remote sensing systems such as optical multispectral imagers, SAR systems (Curlander and McDonough 1991, Moreira 2013) are active microwave sensors capable of imaging the surfaces of both the Earth and other planets. SAR is commonly used to map the characteristics and dimensions of terrain and ground features for a variety of applications in forestry, hydrology, oceanography and agriculture.
In this research, SAR imagery from the Sentinel-1 satellites (Malenovský et al. 2012) is used because observations are collected over land worldwide, with priority given to coastal areas. Sentinel-1 is a dual-polarised SAR system. Polarisation refers to the locus of the electric field vector in the plane perpendicular to the direction of propagation of a plane electromagnetic (EM) wave (Oliver and Quegan 2004). Different ground targets have distinctive polarisation signatures that return different intensities. As a result, they can be distinguished more reliably using the backscattering coefficient, a physical quantity obtained by calibrating pixel intensity values. Polarimetric SAR (PolSAR) sensors can transmit microwave radiation with either linear horizontal (H) or linear vertical (V) polarisation, and receive the returned signal in H and V polarisations (Oliver and Quegan 2004).
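The conversion from pixel values to the backscattering coefficient follows the radiometric calibration described in ESA's Sentinel-1 product documentation. Under that calibration, σ⁰ is the squared pixel magnitude divided by the square of a calibration constant taken from the product's sigmaNought look-up table. The minimal sketch below assumes the look-up table has already been read and interpolated to the pixel grid. The function names are hypothetical, and tools such as SNAP normally perform this step internally.

```python
import numpy as np

def calibrate_sigma0(dn: np.ndarray, sigma_nought_lut: np.ndarray) -> np.ndarray:
    """Convert Sentinel-1 pixel values to linear sigma0.

    dn may be real (detected amplitude) or complex (SLC, I + jQ); |dn|**2 is the
    intensity. sigma_nought_lut holds the calibration constants A, assumed to be
    already interpolated to the same grid as dn.
    """
    intensity = np.abs(dn) ** 2
    return intensity / sigma_nought_lut ** 2

def to_db(sigma0_linear: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Express linear backscatter in decibels, clipping at `floor` to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))
```

For example, a complex SLC sample of 300 + 400j with a calibration constant of 500 gives σ⁰ = 1, which is 0 dB.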
Combining transmit-receive polarisation products, for example by taking the ratio of backscatter strength between channels, can improve the detectability of targets. Dedicated dual-PolSAR classification techniques (Hänsch and Hellwich 2010) enable feature classification and other advanced analyses in applications such as fire scar mapping (Imperatore et al. 2017), biomass estimation, and crop type and condition identification.
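As an illustration of combining polarisation channels, the cross- to co-polarised ratio VH/VV can be computed from calibrated linear σ⁰ values. In decibels, this ratio equals the difference between the two channels. The minimal sketch below assumes both inputs are calibrated, co-registered linear backscatter arrays. The function name is hypothetical, and the paper does not state that this ratio product was used in the study.

```python
import numpy as np

def cross_co_ratio_db(sigma0_vh: np.ndarray, sigma0_vv: np.ndarray,
                      eps: float = 1e-10) -> np.ndarray:
    """VH/VV backscatter ratio in dB from linear sigma0 inputs.

    Dividing in linear units is equivalent to subtracting the two channels in dB.
    eps guards against division by zero and log(0) in no-data pixels.
    """
    ratio = np.maximum(sigma0_vh, eps) / np.maximum(sigma0_vv, eps)
    return 10.0 * np.log10(ratio)
```

Because cross-polarised returns from vegetation are dominated by volume scattering in the canopy, this ratio is often used as a simple vegetation-structure indicator.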
Deep learning algorithms (Schmidhuber 2015, Zhang, Zhang and Du 2016) have attracted unprecedented research interest because they can overcome drawbacks of traditional machine learning algorithms such as Random Forest (Ramo and Chuvieco 2017), and they have shown notable performance improvements over those methods. U-net (Ronneberger, Fischer and Brox 2015, Flood, Watson and Collett 2019) was originally developed for biomedical image segmentation. It is a fully convolutional network whose key feature is a contracting path supplemented by an expansive path, in which pooling operators are replaced by upsampling operators. The upsampling layers have a large number of feature channels, which allow the network to propagate context information to higher-resolution layers. The expansive path is more or less symmetric to the contracting path, yielding a U-shaped architecture. To predict pixels in the border regions of the image, the missing context is extrapolated by mirroring the input image. U-net has achieved high pixel-level accuracy even when only small training datasets are available. Its training can also use a pixel-wise weight map that assigns larger weights to the area of interest or to labelled pixels, which helps the network learn edge and feature information. In this work, U-net is used for its segmentation capability.
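The mirroring strategy for border pixels can be reproduced with NumPy's `np.pad` in `reflect` mode. The minimal sketch below uses hypothetical helper names and a hypothetical `margin`, intended to equal the context the network loses at each border. After prediction, the mirrored border is cropped away again.

```python
import numpy as np

def mirror_pad(image: np.ndarray, margin: int) -> np.ndarray:
    """Extend an (H, W) or (H, W, C) image by `margin` pixels per side by mirroring.

    'reflect' mirrors about the edge pixel without repeating it, so the
    extrapolated context continues the image texture across the border.
    """
    pad = [(margin, margin), (margin, margin)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad, mode="reflect")

def crop(prediction: np.ndarray, margin: int) -> np.ndarray:
    """Remove the mirrored border again after the network has made its prediction."""
    h, w = prediction.shape[:2]
    return prediction[margin:h - margin, margin:w - margin]
```

Channels are never padded, so a multi-band tile keeps its band count.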

Study area and relevant bushfire event
From early October to late November 2019, over 900 km² of land at Myrtle Creek (see Figure 1) and surrounding areas, including Rappville, Wyan and The Island in New South Wales, Australia, was affected by bushfires, with an out-of-control fire front of more than 20 km. This event and locale were chosen as the study area because multiple nearby blazes merged over a period of a few days. This provides a good opportunity to examine the effectiveness of a deep learning algorithm on time series data. The study area also contains a rich mix of ground features that produce different types of scattering, such as residential areas, vegetation and hilly forests.

Sentinel-2 optical image processing over burned areas
Sentinel-2 (S2) MultiSpectral Instrument (MSI) datasets include a wide range of spectral bands. In this study, Level-2A products were chosen because they are already atmospherically corrected to bottom-of-atmosphere (BOA) reflectance. They also include scene classification outputs from which cloud and water masks can be computed.
In this section, spectral index equations were applied to S2 images acquired on 18 November 2019 to map the fire-impacted areas. The result serves as the ground truth labelled image, which forms part of the training samples for the U-net model in Section 4.
Various fire-related spectral indices have been widely analysed for detecting burn scars. Masking clouds and water bodies reduces interference in later classification steps based on indices such as the normalized burn ratio (NBR) and the relativized burn ratio (RBR) (Parks, Dillon and Miller 2014). The S2 scene classification (SCL) image, produced during atmospheric correction, provides cloud probabilities (Gascon et al. 2017). The SCL probabilities of thin cirrus, medium and high cloud are summed. If the resulting sum is lower than 255, the resulting probabilities are used as a cloud mask layer. Similarly, the normalized difference water index (NDWI) (Gao 1996) is computed and, together with the cloud mask layer, subtracted from the initial NBR layer to remove water and cloud pixels.
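Masking of this kind can be sketched as follows. Note that the sketch substitutes for, rather than reproduces, the probability-summing rule described above. It flags cloud using the Sen2Cor SCL class codes 8 (cloud medium probability), 9 (cloud high probability) and 10 (thin cirrus), and flags water where NDWI exceeds a threshold. The function name and the `water_threshold` default are hypothetical values for illustration, not values taken from this study.

```python
import numpy as np

# Sen2Cor scene classification (SCL) class codes treated as cloud in this sketch:
# 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus.
SCL_CLOUD_CLASSES = (8, 9, 10)

def mask_nbr(nbr, scl, ndwi, water_threshold=0.3):
    """Return a float copy of `nbr` with cloud and water pixels set to NaN.

    water_threshold is a hypothetical NDWI cut-off, not a value from the study.
    """
    cloud = np.isin(scl, SCL_CLOUD_CLASSES)
    water = np.asarray(ndwi) > water_threshold
    masked = np.array(nbr, dtype=float)  # np.array copies, so the input is untouched
    masked[cloud | water] = np.nan
    return masked
```

Setting masked pixels to NaN, rather than zero, keeps them out of later statistics and thresholds.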
The formulas used in this study and applied to the S2 multispectral images are the NDWI, Equation (1), and the NBR and RBR, Equation (2). Because a full S2 image has a large scale and coverage, this paper only shows a patch of bushfire burn scar, with a close-up of the area of interest (AOI). The original true colour image, with very thick smoke and cloud cover, is shown in Figure 5(a). The AOI burn scars can be clearly observed in brown in Figure 6(b), a composite of near infrared and short wave infrared bands. The binary thresholded RBR image is shown in Figure 6(c), with burned areas in white on a black background.
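Equations (1) and (2) are not reproduced in this text. The widely used definitions, assumed here to match them, are:

- NDWI = (NIR - SWIR)/(NIR + SWIR) (Gao 1996)
- NBR = (NIR - SWIR2)/(NIR + SWIR2)
- dNBR = NBR_pre - NBR_post
- RBR = dNBR/(NBR_pre + 1.001) (Parks, Dillon and Miller 2014)

The minimal sketch below implements these definitions on reflectance arrays. Which S2 bands feed each term is an assumption (for example B8A for NIR, B11 for SWIR and B12 for SWIR2), as is leaving dNBR unscaled. The function names are hypothetical, and the burn threshold is left as a required parameter because its value is not given here.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-10):
    """(a - b) / (a + b); eps avoids division by zero on empty pixels."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)

def ndwi_gao(nir, swir):
    """Gao (1996) NDWI from near infrared and short wave infrared reflectance."""
    return normalized_difference(nir, swir)

def nbr(nir, swir2):
    """Normalized burn ratio."""
    return normalized_difference(nir, swir2)

def rbr(nbr_pre, nbr_post):
    """Relativized burn ratio: dNBR divided by (pre-fire NBR + 1.001)."""
    dnbr = nbr_pre - nbr_post
    return dnbr / (nbr_pre + 1.001)

def burn_mask(rbr_image, threshold):
    """Binary burn map; the threshold must be chosen by the analyst."""
    return rbr_image > threshold
```

For a pixel with NIR 0.4 and SWIR2 0.1 before the fire and 0.2 and 0.3 afterwards, NBR drops from 0.6 to -0.2. This gives dNBR = 0.8 and RBR ≈ 0.8/1.601.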

Sentinel-1 polarimetric coherence estimation
This section outlines how interferometric coherence differences were computed and analysed for adjacent acquisition dates, based on the AOI ignition timeline in Figure 4. The aim is to select the most significant coherence variation as input to the U-net model in Section 4.
For the Sentinel-1 (S1) polarised SAR data, pixels are resampled to regular 10 m square pixels instead of the 5 m (range) by 20 m (azimuth) pixel spacing of the original products. The purpose is to preserve the maximum polarimetric coherence while matching the 10 m resolution of the S2-derived RBR result for fusion. According to Bureau of Meteorology (BOM) weather records, there was no rainfall during the S1 acquisition period. This minimises radar signal disturbance from soil moisture and wet vegetation over the study area. Single look complex (SLC) PolSAR images acquired in interferometric wide swath (IW) mode were selected because they retain phase information. This allows further investigation of interferometric SAR (InSAR) coherence changes and their characteristics.
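Pixel-size regularisation of this kind is typically achieved by multi-looking. Multi-looking averages intensity over non-overlapping blocks of azimuth and range samples, trading resolution for reduced speckle and more nearly square pixels. The minimal NumPy sketch below uses a hypothetical function name and look counts supplied by the caller. In this study the processing chain was run in SNAP.

```python
import numpy as np

def multilook(intensity: np.ndarray, az_looks: int, rg_looks: int) -> np.ndarray:
    """Average non-overlapping az_looks x rg_looks blocks of an (azimuth, range) image.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    rows = intensity.shape[0] // az_looks * az_looks
    cols = intensity.shape[1] // rg_looks * rg_looks
    trimmed = intensity[:rows, :cols]
    # Split each axis into (block index, position within block), then average within blocks.
    blocks = trimmed.reshape(rows // az_looks, az_looks, cols // rg_looks, rg_looks)
    return blocks.mean(axis=(1, 3))
```

Averaging intensity, rather than complex values, avoids cancelling speckle phases against each other.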
Time series SAR images were used to generate coherence magnitude maps from complex magnitude and phase information (Touzi et al. 1999). Coherence is commonly used to detect small amounts of surface change and is computed from two complex radar signals $s_1$ and $s_2$ with Equation (5):

$$\gamma = \frac{\left|\langle s_1 s_2^{*} \rangle\right|}{\sqrt{\langle |s_1|^2 \rangle \, \langle |s_2|^2 \rangle}} \qquad (5)$$

In this equation:

- $\gamma$ denotes the interferometric coherence, i.e. the amplitude of the complex correlation coefficient between the two SAR images.
- $^{*}$ denotes the complex conjugate.
- $\langle \cdot \rangle$ denotes the statistical expectation (Lu et al. 2018).

The step-by-step procedure for producing SLC-based coherence maps is summarised in the flowchart in Figure 2. To derive pairs of processed PolSAR coherence images, the relevant sub-swath is first extracted from all original SLC images acquired on the same orbit (satellite line of sight). Precise orbit files are then applied, and the paired images are co-registered by back-geocoding. To minimise phase discontinuities across bursts, the enhanced spectral diversity (ESD) method was used to estimate the azimuth shift between the two SAR images (Wang, Xu and Fialko 2017). Phase information was computed for both the VH and VV images. The strips between bursts were then removed with the deburst operation. Next, multi-looking was applied to regularise the pixel size, and speckle noise was reduced with a Refined Lee speckle filter using a 5×5 window. Finally, terrain correction was applied, projecting the output images to the Geocentric Datum of Australia 2020 (GDA2020).
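Equation (5) is usually estimated by replacing the expectations with sums over a small moving window. The minimal sketch below does this with NumPy for two co-registered complex SLC arrays. The function names and the 5×5 window are hypothetical choices. The sketch also omits any flat-earth or topographic phase correction that a full InSAR processor may apply before estimating coherence.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_sum(x: np.ndarray, win: int) -> np.ndarray:
    """Sum over every win x win window (valid region only, no padding)."""
    return sliding_window_view(x, (win, win)).sum(axis=(-2, -1))

def coherence(s1: np.ndarray, s2: np.ndarray, win: int = 5) -> np.ndarray:
    """Windowed sample estimate of Equation (5) for two complex images.

    Output shape is (H - win + 1, W - win + 1); values lie in [0, 1].
    """
    cross = box_sum(s1 * np.conj(s2), win)  # <s1 s2*>
    p1 = box_sum(np.abs(s1) ** 2, win)      # <|s1|^2>
    p2 = box_sum(np.abs(s2) ** 2, win)      # <|s2|^2>
    return np.abs(cross) / np.sqrt(p1 * p2 + 1e-20)
```

A constant phase offset between the two images leaves γ at 1, whereas unrelated signals give low values. Note that small windows bias the estimate upwards for incoherent areas.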

Figure 2. Workflow of InSAR coherence processing
The SAR coherence calculation was performed using the SNAP Sentinel-1/-2 Toolbox software provided by the European Space Agency (ESA).
Figure 3. S1 paired coherence change maps, with a red reference polygon outline, between each two sequential acquisitions. First row: 1025_1106, 1106_1118, 1118_1130 and 1130_1212; second row: 1025_1106, 1106_1118, 1118_1130 and 1130_1212.

InSAR coherence differences analysis
Once the coherence images for all adjacent date pairs had been estimated, the results were overlaid on the FESM fire extent map, mentioned in Section 1, to check for significant change beyond the AOI. According to P. Zhang et al. (Zhang et al. 2019), extending the temporal baseline from one sequential interval to two and subtracting the two resulting coherence maps provides a more continuous change signal across the whole scene.
Zhang's approach was therefore used to compute the coherence maps γ_1025_1118 and γ_1118_1212 over 24-day intervals. γ_1025_1118 was then subtracted from γ_1118_1212 to derive the coherence difference Δγ_1025_1212 for both the VH and VV polarisations, which forms the final input from S1. The resulting Δγ_1025_1212 images for the two polarisations are shown in Figure 6(e) and (f).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition)
Figure 4. AOI-targeted ignition timeline of SAR imagery acquisitions
In Figure 5, the VV co-polarised coherence γ_1025_1118 is overall lower than γ_1118_1212, highlighting the ground changes caused by the fire. Because the fire over the AOI kept burning for a couple of weeks, the γ_1118_1212 image shows a relatively higher coherence distribution.
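The sign convention of the coherence difference can be made explicit in code. The minimal sketch below uses a hypothetical function name and assumes both coherence maps are co-registered arrays for the same polarisation. With Δγ = γ_1118_1212 - γ_1025_1118, pixels with higher coherence in the later pair take positive values.

```python
import numpy as np

def coherence_difference(gamma_early, gamma_late) -> np.ndarray:
    """Return gamma_late - gamma_early, e.g. gamma_1118_1212 - gamma_1025_1118.

    Both inputs must be co-registered coherence maps for the same polarisation.
    """
    early = np.asarray(gamma_early, dtype=float)
    late = np.asarray(gamma_late, dtype=float)
    if early.shape != late.shape:
        raise ValueError("coherence maps are not on the same grid")
    return late - early
```

Running the same function once for VH and once for VV yields the two Δγ_1025_1212 layers.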

Sentinel-1 and Sentinel-2 data fusion
Since the processed S1 and S2 images are resampled to the same spatial resolution, the two products can be collocated pixel by pixel based on their geocoding and then stacked. SAR coherence maps and optically derived RBR maps carry complementary information. Fusing the two products can therefore increase image classification accuracy (Clerici, Valbuena Calderón and Posada 2017, He and Yokoya 2018), and it also provides a standardised pixel matrix as input to the deep U-net model.
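Collocation, stacking and tiling can be sketched in NumPy as below. The sketch assumes the three layers are already on the same 10 m grid. The function names, the channel order (RBR, VH, VV) and the choice to drop incomplete edge tiles are assumptions, not details taken from this study.

```python
import numpy as np

def stack_channels(rbr, dgamma_vh, dgamma_vv) -> np.ndarray:
    """Stack co-registered layers into an (H, W, 3) float32 array (assumed order: RBR, VH, VV)."""
    layers = [np.asarray(a, dtype=np.float32) for a in (rbr, dgamma_vh, dgamma_vv)]
    if len({a.shape for a in layers}) != 1:
        raise ValueError("layers are not collocated on the same grid")
    return np.stack(layers, axis=-1)

def tile(image: np.ndarray, size: int = 256) -> np.ndarray:
    """Cut an (H, W, C) array into non-overlapping size x size tiles.

    Partial tiles at the right and bottom edges are dropped.
    """
    rows, cols = image.shape[0] // size, image.shape[1] // size
    tiles = [image[r * size:(r + 1) * size, c * size:(c + 1) * size]
             for r in range(rows) for c in range(cols)]
    if not tiles:
        return np.empty((0, size, size, image.shape[2]), dtype=image.dtype)
    return np.stack(tiles)
```

The shape check fails loudly if collocation went wrong, which is cheaper than discovering misaligned channels after training.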

Deep U-Net architecture
To train the U-net model, we used the S1-derived coherence differences Δγ_1025_1212 (VH and VV) and the S2 RBR image as input. The model was implemented with the Python library TensorFlow. Because only limited training samples were available, the first step was to start from a pre-defined network and then tune its hyperparameters. This helps retain high-level semantic information in subsequent processing steps. In the second step, data augmentation was applied to enlarge the training dataset. The U-net's skip connections create channels for feature propagation. These make it much easier to pass signals between low-level and high-level features, and they also facilitate backpropagation during training (Ronneberger et al. 2015, Flood et al. 2019). In our case study, the inputs are 256×256-pixel tiles with three channels: RBR and the two polarisations of Δγ_1025_1212. There are 2058 tiles in total, of which 1234 were used for training and 412 for validation.
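The augmentation used in this study is not specified. A common choice for overhead imagery, shown here as an assumption with a hypothetical function name, is to generate the eight rotations and flips of each tile. The same transform is applied to the image and its label mask so that the two stay aligned.

```python
import numpy as np

def augment(image: np.ndarray, mask: np.ndarray):
    """Return the 8 rotation/flip variants of an (image, mask) pair.

    image: (H, W, C) tile; mask: (H, W) label. Each variant applies the same
    transform to both arrays so labels stay aligned with pixels.
    """
    pairs = []
    for k in range(4):  # 0, 90, 180 and 270 degree rotations
        img_r = np.rot90(image, k, axes=(0, 1))
        msk_r = np.rot90(mask, k, axes=(0, 1))
        pairs.append((img_r, msk_r))
        pairs.append((img_r[:, ::-1], msk_r[:, ::-1]))  # plus a horizontal flip
    return pairs
```

Because the tiles are square, every variant keeps the 256×256 shape expected by the network.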

Morphological operation as post-processing on the U-net result
In mathematical morphology, closing is a dilation followed by an erosion. Applied with a disk-shaped structuring element, closing preserves the main object shape while filling small holes or gaps in the object (Haralick, Sternberg and Zhuang 1987, Lee et al. 2016, Ajadi, Meyer and Webley 2016). This operation is employed as a post-processing step on the U-net output.
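Closing can be written directly in NumPy as a dilation followed by an erosion with a disk-shaped structuring element. The function names and the radius are hypothetical. In practice, library routines such as `scipy.ndimage.binary_closing` with a structuring element from `skimage.morphology.disk` do the same job.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def _neighbourhoods(mask: np.ndarray, radius: int, fill: bool) -> np.ndarray:
    """(H, W, k, k) view of the k x k neighbourhood of every pixel, k = 2 * radius + 1."""
    padded = np.pad(mask, radius, mode="constant", constant_values=fill)
    k = 2 * radius + 1
    return sliding_window_view(padded, (k, k))

def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    # A pixel becomes foreground if any foreground pixel lies under the disk.
    return (_neighbourhoods(mask, radius, False) & disk(radius)).any(axis=(-2, -1))

def erode(mask: np.ndarray, radius: int) -> np.ndarray:
    # A pixel stays foreground only if every pixel under the disk is foreground;
    # padding with True avoids eroding objects that touch the image border.
    return (_neighbourhoods(mask, radius, True) | ~disk(radius)).all(axis=(-2, -1))

def binary_close(mask, radius: int = 2) -> np.ndarray:
    """Morphological closing: dilation followed by erosion with the same disk."""
    mask = np.asarray(mask, dtype=bool)
    return erode(dilate(mask, radius), radius)
```

A single-pixel hole inside a burned region is filled, while the outer outline of the region is left unchanged.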

EXPERIMENTAL RESULTS AND DISCUSSION
The accuracy of segmentation results is usually evaluated with metrics such as the Sørensen-Dice coefficient (SDC) (Ivanovsky et al. 2019, Ulku et al. 2020, Chhor, Aramburu and Bougdal-Lambert 2017). The SDC, also known as the F1 score, measures the overlap between a predicted mask A and a reference mask B. It is twice their intersection divided by the total number of pixels in both masks, as given in Equation (6):

$$\mathrm{SDC} = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (6)$$
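Equation (6) translates directly into a few lines of NumPy for binary masks. The function name is hypothetical. The small `eps` term is also an assumption, added so that two empty masks score 1 instead of dividing by zero. A mean Dice score is then the average of this value over tiles.

```python
import numpy as np

def dice_coefficient(pred, truth, eps: float = 1e-7) -> float:
    """Sørensen-Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))
```

For a prediction of two burned pixels that overlaps a one-pixel reference in one place, the score is 2/3.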
The RBR derived from clear-sky S2 spectral data is used to validate the U-net training outcomes. The findings show that the cross-polarised (VH) coherence difference contributed more than the co-polarised (VV) one to delineating burned areas during U-net training. In this case, VH and VV intensity images showed limited effectiveness in contributing to the training model, whereas the RBR index proved significant for segmenting the target burned areas. The model achieved mean Dice coefficients of 0.90 for training and 0.89 for validation on fire-burned extents.
Figure 9. First row: the fused and tiled S1 and S2 images used as model input. Second row: the mask (labelled) tiles derived from the processed RBR image, used as ground truth. Last row: the U-net semantic segmentation maps.
To maximise training accuracy, the loss was minimised with the Adam optimizer, which iteratively updates the weights and biases towards an optimal solution before overfitting sets in. Figure 10 shows the training and validation accuracy for each epoch of U-net training.
Figure 10. U-net model training and validation accuracy of each epoch.
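The Adam update keeps exponential moving averages of the gradient and its square, corrects their initial bias towards zero, and scales each step by their ratio. In TensorFlow, the optimizer is provided by `tf.keras.optimizers.Adam`. The toy sketch below, with a hypothetical function name, only illustrates the update rule on a one-dimensional quadratic. Its learning rate and step count are hypothetical and are not the settings used in this study.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a differentiable function with the Adam update rule, given its gradient."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment (mean) estimate of the gradient
    v = np.zeros_like(x)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x
```

Minimising (x - 3)² from x = 0, whose gradient is 2(x - 3), drives x towards 3.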
Figure 10 shows the training and validation curves over 40 epochs. Early stopping was applied to terminate training and avoid overfitting. Model performance on the holdout validation dataset is slightly lower than on the training dataset.
Figure 11. Morphologically post-processed U-net model prediction (light red) overlaid on the proposed RBR map.
Figure 11 shows that the morphologically post-processed U-net segmentation closely delineates the bushfire-burned areas. It smooths the burned front edge while excluding small segments that might have been misclassified due to coarse pixel resolution.
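Early stopping can be expressed as a patience rule on the validation loss. The sketch below is a plain-Python illustration with a hypothetical function name and a hypothetical `patience` value. In TensorFlow, the equivalent is the `tf.keras.callbacks.EarlyStopping` callback, for example with `monitor="val_loss"` and `restore_best_weights=True`. The settings used in this study are not reported.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 0-based; if the rule
    never triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

The weights from `best_epoch`, not those from the stopping epoch, are the ones usually kept.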

CONCLUSION
In this paper, a deep learning U-net model has been trained on both multispectral optical data and dual-polarised (VH, VV) InSAR data to identify and segment fire-affected areas. A satisfactory result was achieved, with a mean Dice coefficient of 0.89 on the validation data. For the InSAR coherence difference, VH accounted for most of the lower coherence values over newly burnt areas compared with VV. The processed SAR images show great potential when heavy smoke or cloud shadow covers fire hotspots. In the absence of usable optical data, SAR signals, which penetrate such visual obstacles, can be relied upon to locate fire-affected areas accurately.