RGB-BASED DEEP SURFACE WATER CONTOUR DETECTION

The application of remote monitoring of surface water has focused primarily on the detection of water bodies using expensive multi-spectral IR sensors. However, critical information about surface water bodies, particularly their dynamic behavior, is better derived from water contours. We show that water body detection is inadequate for accurately capturing the contours. Furthermore, we argue that RGB-based detection should be sufficient for accurate water detection. To assist in this effort, we present a new global dataset of contour-labeled remote sensing images obtained from the Sentinel-2 and Landsat-8 missions. We propose a unique UNet-style contour detection system that utilizes multiscale filters to detect contours accurately. Comparisons between our proposed system, existing water detection systems, and other segmentation and contour detection systems show the system's effectiveness in detecting water.


INTRODUCTION
Water is an important ecosystem and a critical resource for life, agriculture, commerce, and industry. Additionally, rapid changes in water levels caused by major weather events threaten life and economic development in their proximity. Remote sensing can be used to provide accurate, continuous, automatic, and real-time monitoring of surface water bodies. Remote sensing technologies are typically deployed via satellites, aircraft, and drones and provide data at variable temporal resolution (hours/days), spatial resolution (<1 m/pixel to 1 km/pixel), and spectral channels (see Huang et al. (2018) for a review).
Early remote sensing solutions utilized multi-spectral satellite data to formulate rule-based metrics such as mNDWI (Xu, 2006), WI (Fisher et al. 2016), AEWI (Feyisa et al. 2014), and MBWI (Wang et al. 2018), each designed to account for variations in sensor accuracy, water temperature, turbidity, and cloud coverage. Improved water detection was further demonstrated with machine learning techniques involving decision trees (Friedl et al. 1997; Mueller et al. 2016), SVM (Aung et al. 2018), and clustering (Cordeiro et al. 2021) applied to the same multi-spectral satellite data.
Deep learning based fully convolutional networks (FCN) (Krizhevsky et al., 2012; Long et al., 2015) have proven to be extremely successful in applications of object detection and segmentation (see Hoeser et al. (2020) for a review of techniques and applications to remote sensing). Deep water map (Isikdogan et al. 2017; Isikdogan et al. 2019) is a UNet-based (Ronneberger et al. 2015) FCN water body detection system that is used for accurate detection of water bodies from multi-spectral satellite data.
In this paper we focus on the detection of water contours using data in the visible range only (RGB). Water contours are more dynamic and allow for better analysis of changes that are happening at the surface and in the sediment. Our analysis shows that more accurate detection of water bodies does not necessarily lead to accurate detection of the contour. Contours are more difficult to detect due to the spectral blending that occurs at the boundary. Additionally, the large imbalance between contour (target) and non-contour (non-target) pixels must be addressed to improve contour detection.
* Corresponding author (mbsyed@uno.edu)
(1) www.usgs.gov/landsat-missions
(2) sentinel.esa.int/web/sentinel/missions
Humans are very skilled at detecting water in the visible spectrum, so we propose that machines can be trained to do the same. IR-based detection of water from sensing data is complicated by the water body type and the surrounding environment. The movement, depth, and temperature of water, as well as shadows from structures and clouds, affect the absorption of IR bands. IR sensing technologies are more expensive, require careful calibration, and vary in sensitivity, number of spectral bands, and spatial resolution even within the same sensor system. For example, the Sentinel-2 mission provides data at 10 m spatial resolution only for the visible range and the near-infrared (NIR) range; the rest of the data is at a lower spatial resolution.
An extensive literature review revealed several datasets (Carroll et al. 2009; Homer et al. 2004; Lehner et al. 2004; Prigent et al. 2012; Verpoorter et al. 2014; Yamazaki et al. 2015) with labeled water bodies. However, these were not useful for our purpose, mainly because of inaccurate contour labeling caused by the use of polygons or aggregate/cumulative water labeling, the low resolution of the satellite images (>60 m/pixel), and the difficulty of identifying the satellite source image used for labeling.
As such, we developed our own dataset based on the Landsat-8 (1) and Sentinel-2 (2) missions. The dataset contains over 1M images for Landsat and over 400K for Sentinel, of which 14K have been hand-selected for the purpose of fully supervised training. We also present a unique UNet-style Fully Convolutional Network (FCN) for automatic and accurate detection of water contours from aerial RGB images. Our network outperforms other architectures in contour detection, even though it has fewer parameters. We also show that for existing waterbody detection systems that use RGB+IR channels, such as Deep water map (Isikdogan et al. 2017; Isikdogan et al. 2019) and Waterdetect (Cordeiro et al. 2021), even an F-score > 0.98 for waterbody detection does not translate into accurate contour detection.
The use of visible bands (RGB) makes it a cost-effective detection system that can be deployed for all remote sensing solutions (fixed camera, drones, etc.). The rest of the paper is organized as follows. Section 2 details the data collection and labeling process. Section 3 describes the UNet-style architecture used for contour detection. Section 4 contains the results of the proposed system and comparative results to other systems in the literature, and Section 5 has the concluding remarks.

DATA COLLECTION AND LABELLING
Data for both Landsat and Sentinel satellites were acquired from locations around the globe. Both Landsat and Sentinel satellite locations were obtained from (Pekel et al. 2016) and downloaded via Google Earth Engine (GEE) (Gorelick et al., 2017). Additional Sentinel satellite locations were obtained from bluedotwater (3); data was acquired using Sentinel's API (4). The dataset was created using the following process:
1. Machine-label the water using a successful waterbody labeling technique.
2. Apply image processing to extract the contour.
3. Split the images into smaller sets.
4. Hand-select the tiles with the best accuracy.
5. Generate meta-data about each tile, including satellite image source and water content percentage.
For step 1, we used the normalized difference water index (NDWI) (Gao, 1996) as it was most successful in producing accurate and continuous contours when applied to various water bodies from around the globe. To extract the contour (step 2), each image was then binarized to show only water, and subtracted from its morphological dilation to yield a contour. While this method generated good contours generally, it also created bad contours within the same image. To improve the yield of good contours, the images were split into 128 × 128 tiles and visually inspected (steps 3 and 4). Tiles with heavy cloud coverage as well as bad contours were removed.
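Steps 1 to 3 can be sketched with a few array operations. The following is a minimal illustration rather than the pipeline used to build the dataset: the green/NIR form of the index, the threshold of 0, and the 3 × 3 structuring element are assumptions, and the function names are hypothetical.

```python
import numpy as np

def ndwi(green, nir, eps=1e-6):
    """Normalized difference of the green and NIR bands (assumed index form)."""
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)

def dilate3x3(mask):
    """Binary dilation with a 3x3 square structuring element (assumed size)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def water_contour(green, nir, threshold=0.0):
    """Binarize the index, then subtract the water mask from its dilation."""
    water = ndwi(green, nir) > threshold  # threshold value is an assumption
    return dilate3x3(water) & ~water

def split_tiles(image, size=128):
    """Split an (H, W, ...) array into non-overlapping size x size tiles."""
    h, w = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

# Toy scene: a square "lake" that is bright in green and dark in NIR.
green = np.full((256, 256), 500, dtype=np.uint16)
nir = np.full((256, 256), 3000, dtype=np.uint16)
green[64:192, 64:192] = 1500
nir[64:192, 64:192] = 200

contour = water_contour(green, nir)  # one-pixel ring just outside the lake
tiles = split_tiles(contour)         # four 128 x 128 tiles
```

Because the contour is the dilation minus the mask, it lies one pixel outside the detected water; any error in the water mask therefore shifts the contour, which is why visual inspection (step 4) is still needed.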
Our current repository (5) contains over 1M tiles for Landsat-8 and 400K+ tiles for Sentinel-2 of visually unconfirmed data. The hand-selection process is extremely time-consuming and has a very low yield. After visually inspecting 100K tiles, we created two datasets, one for Landsat and one for Sentinel, each with 7K+ accurately labeled tiles. The datasets were "balanced" to prevent an overabundance of tiles with little to no contours. The data also contains JRC (Pekel et al. 2016) water labels for each Landsat tile for reference. Each tile in the dataset is stored with 6 channels (16-bit raw satellite data), organized in the following order: blue (b1), green (b2), red (b3), NIR (b4), SWIR1 (b5), SWIR2 (b6). Though our work emphasizes RGB detection, we included the IR bands in the dataset for others to use freely. Additionally, metadata containing the image source file/satellite, cloud coverage, and approximate water content in each tile is also provided.

Preprocessing
The datasets are 16-bit data and require sensor-specific preprocessing to convert them into regular true-color RGB data.
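As an illustration of this conversion, the sketch below reorders the first three channels of a tile into RGB and rescales them to [0, 1]. The per-channel percentile stretch is a generic stand-in, not the sensor-specific preprocessing used in this work; the function name and the percentile values are assumptions.

```python
import numpy as np

def to_true_color(tile, low=2.0, high=98.0):
    """Convert an (H, W, 6) 16-bit tile to float RGB in [0, 1].

    Channel order follows the dataset description:
    blue, green, red, NIR, SWIR1, SWIR2.
    The percentile stretch is an assumed, generic scaling.
    """
    rgb = tile[..., [2, 1, 0]].astype(np.float64)  # reorder B, G, R -> R, G, B
    out = np.empty_like(rgb)
    for c in range(3):
        lo, hi = np.percentile(rgb[..., c], [low, high])
        scale = max(hi - lo, 1e-6)  # guard against constant channels
        out[..., c] = np.clip((rgb[..., c] - lo) / scale, 0.0, 1.0)
    return out

# Synthetic 6-channel tile standing in for raw satellite data.
rng = np.random.default_rng(0)
tile = rng.integers(0, 10000, size=(128, 128, 6), dtype=np.uint16)
rgb = to_true_color(tile)
```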

Architecture
Our proposed UNet-based water contour detector model is shown in Figure 1. Only RGB channels are used as input. The encoder/decoder layers rely on multiscale convolution layers with filter sizes of 1x1, 3x3, 5x5, and 7x7. The 1x1 filters are useful in controlling the number of parameters and help provide weights for each channel. The filters of other sizes are successful in capturing edges at various scales. Batch normalization (BN) with each convolution was also found to improve contour detection. Also unique to our design is the use of strided convolution for down-sampling in the encoder network and transpose convolution for up-sampling in the decoder network. While this adds parameters to the model, it barely affects processing time and improves contour detection compared to (un)pooling. "Skip" connections between corresponding encoder and decoder blocks are a general attribute of UNet systems that have been shown to improve training and provide better localization in the output. A sigmoid output is used to classify each output pixel as contour or non-contour (i.e., 1 or 0).
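The cost of the multiscale filters and the effect of strided and transpose convolutions on tile size can be checked with simple arithmetic. The sketch below assumes four parallel branches of equal width, a 3x3 kernel with stride 2 for resampling, and the standard convolution size formulas; the branch width, the kernel and padding values, and the function names are illustrative assumptions, not the configuration of Figure 1.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Number of weights in a k x k convolution layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

def multiscale_block_params(in_ch, branch_ch, kernel_sizes=(1, 3, 5, 7)):
    """Total weights of parallel branches, one per filter size (assumed layout)."""
    return sum(conv_params(in_ch, branch_ch, k) for k in kernel_sizes)

def strided_conv_out(n, k=3, stride=2, pad=1):
    """Output side length of a strided convolution (down-sampling)."""
    return (n + 2 * pad - k) // stride + 1

def transpose_conv_out(n, k=3, stride=2, pad=1, output_pad=1):
    """Output side length of a transpose convolution (up-sampling)."""
    return (n - 1) * stride - 2 * pad + k + output_pad

# Hypothetical first block: 3 RGB input channels, 16 filters per branch.
block_params = multiscale_block_params(in_ch=3, branch_ch=16)

# A 128 x 128 tile passed through three down-sampling stages and back.
encoder_sizes = [128]
for _ in range(3):
    encoder_sizes.append(strided_conv_out(encoder_sizes[-1]))

decoder_sizes = [encoder_sizes[-1]]
for _ in range(3):
    decoder_sizes.append(transpose_conv_out(decoder_sizes[-1]))
```

Under these assumptions the 7x7 branch accounts for more than half of the block's weights, which illustrates why the inexpensive 1x1 filters are useful for keeping the parameter count under control.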

Loss Function
Binary cross entropy (BCE) was used to capture the pixel-level loss in the image. Intersection over union (IoU) and Dice losses were used to capture the object-related loss.
Due to the significant imbalance between contour and non-contour pixels, the pixels are first weighted by the ratio of non-water pixels to the number of pixels in an n × n border around the labeled contour. The weighting makes errors closest to the contour count more heavily than those outside the n × n border. We found a border of 9 × 9 to be optimum.
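One possible reading of this weighted loss is sketched below. The border is obtained by dilating the labeled contour with an n × n square, pixels inside the border receive the ratio of outside to inside pixel counts as their weight, and the three loss terms are summed with equal weight. These choices, and all function names, are assumptions made for illustration; the exact weighting and combination used in this work may differ.

```python
import numpy as np

def dilate(mask, n):
    """Binary dilation of a boolean mask with an n x n square (n odd)."""
    r = n // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(n):
        for dx in range(n):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def border_weights(target, n=9):
    """Weight pixels inside the n x n border by the outside/inside pixel ratio."""
    border = dilate(target.astype(bool), n)
    n_border = max(int(border.sum()), 1)
    n_outside = border.size - int(border.sum())
    weights = np.ones(target.shape, dtype=np.float64)
    weights[border] = max(n_outside / n_border, 1.0)
    return weights

def contour_loss(pred, target, n=9, eps=1e-7):
    """Weighted BCE plus IoU and Dice losses (equal-weight sum is assumed)."""
    pred = np.clip(pred.astype(np.float64), eps, 1.0 - eps)
    target = target.astype(np.float64)
    w = border_weights(target, n)
    bce = -(w * (target * np.log(pred)
                 + (1.0 - target) * np.log(1.0 - pred))).sum() / w.sum()
    inter = (pred * target).sum()
    total = pred.sum() + target.sum()
    iou_loss = 1.0 - (inter + eps) / (total - inter + eps)
    dice_loss = 1.0 - (2.0 * inter + eps) / (total + eps)
    return float(bce + iou_loss + dice_loss)

# Toy label: a vertical one-pixel contour in a 128 x 128 tile.
target = np.zeros((128, 128))
target[:, 64] = 1.0
weights = border_weights(target)
loss_perfect = contour_loss(target.copy(), target)
loss_empty = contour_loss(np.zeros_like(target), target)
```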

Training
Training was done separately for the Landsat and Sentinel datasets. The Adam optimizer was used with an initial learning rate of 0.003 and the default beta1 and beta2 values. Each iteration used a mini-batch of 32 images. All the weights in the network were initialized using Kaiming He's initialization method. No augmentation was applied to the training images.
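The ingredients of this setup can be written out in a few lines. The sketch below is framework-free and illustrative only: it assumes the commonly used Adam defaults (beta1 = 0.9, beta2 = 0.999), the normal-distribution variant of He initialization, and random shuffling of tiles into mini-batches; the names are hypothetical.

```python
import numpy as np

def he_normal(shape, rng):
    """He initialization for a conv kernel of shape (out_ch, in_ch, kh, kw)."""
    fan_in = shape[1] * shape[2] * shape[3]
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

class Adam:
    """Single-tensor Adam update with assumed default betas."""

    def __init__(self, lr=0.003, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # step counter

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad ** 2
        m_hat = self.m / (1.0 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def minibatches(n_items, batch_size, rng):
    """Shuffle indices and cut them into mini-batches."""
    order = rng.permutation(n_items)
    return [order[i:i + batch_size] for i in range(0, n_items, batch_size)]

rng = np.random.default_rng(0)
kernel = he_normal((64, 3, 7, 7), rng)         # hypothetical 7x7 RGB kernel
optimizer = Adam()
updated = optimizer.step(kernel, np.ones_like(kernel))
batches = minibatches(5000, 32, rng)           # 5K training tiles per dataset
```

With bias correction, the first Adam step moves every weight by approximately the learning rate regardless of the gradient magnitude, so the initial learning rate of 0.003 directly bounds the size of early updates.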

RESULTS
Each of the 7K+ Landsat and 7K+ Sentinel datasets was randomly split into 5K training tiles and 2K testing tiles and used for experimentation.

Evaluation Metrics
We use two evaluation metrics: the F-score and the average precision (AP).
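Both metrics can be computed directly from pixel-level predictions. The sketch below uses the usual definition of the F-score as the harmonic mean of precision and recall, and one common definition of AP, the mean of the precision values at the rank of each positive pixel. The exact AP variant is not specified here, so this definition and the function names are assumptions.

```python
import numpy as np

def f_score(pred, target):
    """Harmonic mean of precision and recall for binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    target = np.asarray(target, dtype=bool).ravel()
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def average_precision(scores, target):
    """Mean precision at the rank of each positive pixel (assumed AP variant)."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = target[order]
    n_pos = int(hits.sum())
    if n_pos == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].sum() / n_pos)

# Four pixels: sigmoid outputs and their ground-truth labels.
scores = np.array([0.9, 0.8, 0.7, 0.1])
labels = np.array([1, 0, 1, 0])
f = f_score(scores > 0.5, labels)        # thresholded prediction
ap = average_precision(scores, labels)   # threshold-free ranking quality
```

The F-score depends on the threshold applied to the sigmoid output, whereas AP summarizes performance over all thresholds.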

Numerical Analysis
Waterdetect (Cordeiro et al., 2021) is another waterbody detection tool that applies hierarchical clustering on RGB+IR data. Even with extremely high F-scores (>0.97) for waterbody detection, both Deep water map and Waterdetect fail to capture the contour accurately, even with IR channels available as input.
In comparison, our model achieves superior contour detection performance using only RGB channels. This further emphasizes the need for a dedicated contour detection system, which will be necessary in water resource management. The table includes performance measures for waterbody detection and contour detection.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022, XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
Performance metrics for our proposed contour detector are compared to those of popular image segmentation and waterbody detection systems in the literature; the results are summarized in Table 2. DeeplabV3+, UNet-Resnet (Ronneberger et al. 2015), UNet++ (Zhou et al. 2018), and PAN (Li et al. 2018) are popular DL-based segmentation techniques. DWM is a DL waterbody detection model that was retrained on our data specifically for contour detection. Waterdetect (Cordeiro et al. 2021) is also a water body detector but relies on hierarchical clustering of rule-based metrics. As such, it does not have a training phase. The results show that our model outperforms several popular segmentation architectures while using the fewest parameters. This can be attributed to the multiscale nature of the convolution block and the weighting of each channel, which keeps only the most relevant information. Our system uses fewer parameters, has a faster training time, and is more accurate at detecting water contours than other systems.

Qualitative Analysis
We present some output results of our proposed system in Figure 2 and Figure 3 to show that a low F-score does not necessarily indicate bad output. Visually, the predicted contours on the RGB images in Figures 2 and 3 are mostly accurate, but the F-scores are low. This can be attributed to a minute misalignment between the prediction and the ground truth. Because the ground truth is an extremely thin contour, even a minor misalignment can cause a low F-score.
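This sensitivity is easy to reproduce on a toy example. The sketch below, which is illustrative and not taken from the experiments, shifts a one-pixel-wide contour sideways and measures the pixel-level F-score against the unshifted label; the helper name is hypothetical.

```python
import numpy as np

def overlap_f_score(pred, target):
    """Pixel-level F-score between two boolean contour masks."""
    tp = int(np.sum(pred & target))
    if tp == 0:
        return 0.0
    precision = tp / int(pred.sum())
    recall = tp / int(target.sum())
    return 2.0 * precision * recall / (precision + recall)

# Ground truth: a vertical contour exactly one pixel wide.
truth = np.zeros((128, 128), dtype=bool)
truth[:, 64] = True

# Predictions that are perfect in shape but shifted by 0, 1, or 2 pixels.
scores_by_shift = {
    shift: overlap_f_score(np.roll(truth, shift, axis=1), truth)
    for shift in (0, 1, 2)
}
```

A shift of a single pixel removes all overlap with a one-pixel-wide label, so the F-score drops from its maximum to zero even though the predicted contour is visually almost identical.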

CONCLUSION
Existing state-of-the-art remote sensing systems rely on water body detection, which does not guarantee accurate contour detection. Additionally, there is a need for inexpensive RGB-based detection. Due to the lack of accurate water contour data, we presented a technique for collecting and labeling data from the Sentinel and Landsat missions. We also proposed a unique UNet-style detector that is very effective in water contour detection. We demonstrated a system that can train faster using fewer parameters by focusing on contours.