COLORIZING SENTINEL-1 SAR IMAGES USING A VARIATIONAL AUTOENCODER CONDITIONED ON SENTINEL-2 IMAGERY

INTRODUCTION
Synthetic aperture radar (SAR) images are completely different from optical images in terms of both geometric and radiometric appearance: While SAR is a range-based imaging modality and measures physical properties of the observed scene, optical imagery basically represents an angular measurement system and collects information about the chemical characteristics of the environment. Thus, the interpretation of SAR imagery is still a challenging task for remote sensing scientists. However, SAR image interpretation can be alleviated when optical colors are used to support the interpretation process. For decades, this has been a special case of remote sensing image fusion (Pohl and van Genderen, 1998; Schmitt and Zhu, 2016). Still, SAR-optical image fusion by definition needs both a SAR and an optical image acquired at approximately the same time, which means standard image fusion techniques do not particularly help the interpretability of SAR images as an independent data source. To overcome the need for accompanying optical imagery, this paper proposes to learn feasible colorizations of Sentinel-1 SAR images from coregistered Sentinel-2 training examples using deep learning techniques. This is meant to provide a significant step in SAR-optical data fusion (Schmitt et al., 2017) with application to improved SAR image understanding, and will enable SAR data providers to attach colorized versions of their imagery to their products.
In order to achieve this goal, we find inspiration in recent computer vision approaches dealing with gray-scale image colorization. In this paper, we use the network architecture proposed by Deshpande et al. (2017), which utilizes both a variational autoencoder (VAE) and a mixture density network (MDN) (Bishop, 1994) to create multi-modal colorization hypotheses. Since our problem is different from the computer vision task in that there are no color SAR images that can be used as target samples during training, we first create artificial color SAR images by SAR-optical image fusion. In the remainder of this paper, we will first describe how these artificial color SAR images can be created (Section 2). Then, we introduce our dataset based on Sentinel-1 and Sentinel-2 imagery in Section 3, before we describe the employed deep generative model in Section 4. Finally, exemplary results are shown in Section 5.

SAR-OPTICAL IMAGE FUSION BY COLOR SPACE TRANSFORM
As Pohl and van Genderen (1998) summarized two decades ago, image fusion by exploitation of color space transformations has long been an established approach in remote sensing. In particular, the so-called intensity-hue-saturation (IHS) transform has become a standard procedure for SAR-optical image fusion (Harris and Murray, 1990). Compared to the conventional RGB color space, it has the advantage that intensity or brightness is disentangled from the image colors (Tu et al., 2001). In this paper, we follow a similar fusion strategy that employs a transformation of the optical imagery to Lab color space. This has the advantage that all perceivable colors are described in a three-dimensional Cartesian coordinate system, which reserves two dimensions for the actual color components and an additional axis for luminosity (see Fig. 1). It has to be noted that the term Lab is not clearly defined and is commonly used for different color space variants. In the context of this paper, we use it to refer to the CIE 1976 L*a*b* color space (Pauli, 1976).
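For our case of SAR-optical image fusion, the Lab color transform can be exploited in the following way: First, the optical image $I_{opt}$, comprised of RGB pixels $p_{RGB} = (R, G, B)$, is converted to Lab color space. Following the standard CIE 1976 definition, this is done by determining the luminosity

$$L = 116\, f\!\left(\frac{Y}{Y_n}\right) - 16, \tag{1}$$

the value on the green-red color axis

$$a = 500 \left( f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right) \right), \tag{2}$$

and the value on the yellow-blue color axis

$$b = 200 \left( f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right) \right), \tag{3}$$

with

$$f(t) = \begin{cases} t^{1/3} & \text{if } t > \left(\frac{6}{29}\right)^3, \\ \frac{1}{3}\left(\frac{29}{6}\right)^2 t + \frac{4}{29} & \text{otherwise,} \end{cases} \tag{4}$$

where $X$, $Y$, and $Z$ are the CIE XYZ tristimulus values of the pixel and $(X_n, Y_n, Z_n)$ is the reference white point.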
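The Lab-based fusion of this section can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the optical image is assumed to be sRGB-encoded with values in [0, 1], the linear RGB-to-XYZ mapping uses the sRGB primaries, and the SAR backscatter is assumed to be already scaled to [0, 1]. The function names `rgb_to_lab`, `lab_to_rgb`, and `fuse_sar_optical` are hypothetical.

```python
import numpy as np

# CIE standard illuminant D65 reference white (Xn, Yn, Zn).
WHITE_D65 = np.array([95.047, 100.0, 108.883])

# Linear sRGB -> XYZ matrix for D65. Assumption: the text only states that
# XYZ is obtained "by linearly mapping from RGB space".
RGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                    [0.2126729, 0.7151522, 0.0721750],
                    [0.0193339, 0.1191920, 0.9503041]])
XYZ2RGB = np.linalg.inv(RGB2XYZ)

DELTA = 6.0 / 29.0


def _f(t):
    """Piecewise CIE Lab compression function."""
    return np.where(t > DELTA ** 3, np.cbrt(t), t / (3 * DELTA ** 2) + 4.0 / 29.0)


def _f_inv(u):
    """Inverse of _f."""
    return np.where(u > DELTA, u ** 3, 3 * DELTA ** 2 * (u - 4.0 / 29.0))


def rgb_to_lab(rgb):
    """Convert an sRGB array of shape (..., 3) with values in [0, 1] to Lab."""
    # Undo the sRGB gamma encoding to obtain linear RGB.
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    xyz = (100.0 * lin) @ RGB2XYZ.T
    fx, fy, fz = np.moveaxis(_f(xyz / WHITE_D65), -1, 0)
    return np.stack([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)], axis=-1)


def lab_to_rgb(lab):
    """Convert a Lab array of shape (..., 3) back to sRGB in [0, 1]."""
    lum, a, b = np.moveaxis(lab, -1, 0)
    fy = (lum + 16) / 116
    fxyz = np.stack([fy + a / 500, fy, fy - b / 200], axis=-1)
    xyz = _f_inv(fxyz) * WHITE_D65
    # Colors outside the RGB gamut are clipped.
    lin = np.clip((xyz / 100.0) @ XYZ2RGB.T, 0.0, 1.0)
    return np.where(lin > 0.0031308, 1.055 * lin ** (1 / 2.4) - 0.055, 12.92 * lin)


def fuse_sar_optical(optical_rgb, sigma0):
    """Replace the optical luminosity by a SAR-derived pseudo-luminosity.

    optical_rgb: (H, W, 3) sRGB image in [0, 1].
    sigma0:      (H, W) backscatter, assumed to be pre-scaled to [0, 1].
    """
    lab = rgb_to_lab(optical_rgb)
    lab[..., 0] = 100.0 * sigma0  # L_SAR = 100 * sigma0
    return lab_to_rgb(lab)
```

Because the color components a and b are taken from the optical image while the luminosity comes from the SAR image, the fused result keeps the SAR image structure. Fused colors that fall outside the RGB gamut are clipped in this sketch.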

THE DATASET
The experiments presented in this paper make use of a dataset consisting of co-registered SAR and optical image patches sized 256 × 256 px with a pixel spacing of 10 m. The patches were cut from hundreds of georeferenced Sentinel-1 and Sentinel-2 images downloaded from uniformly distributed regions of interest spread across the land masses of the Earth. For the Sentinel-1 SAR images, GRD products were used, which represent the $\sigma^0$ backscatter coefficient in dB scale. Restituted orbit information was combined with the 30 m SRTM DEM (or the ASTER DEM for high-latitude regions where SRTM is not available) to produce precisely ortho-rectified images. While the original data is quantized to 16 bit, the images used in this study were reduced to 8 bit. For Sentinel-2, the red, green, and blue channels (i.e., bands 4, 3, and 2) corresponding to the same areas of interest were downloaded and reduced to 8 bit as well. Since Sentinel-2 data are not provided in the form of satellite images, but as precisely georeferenced granules, no further pre-processing was required. Instead, for Sentinel-2, the data were selected based on the amount of cloud coverage. To avoid discarding too many images, a cloud coverage of at most 1% was used as the search criterion in the database query.
Since full coverage of a given, randomly selected region of interest by single Sentinel-1/Sentinel-2 acquisitions cannot be ensured, mosaicking was used, whereby only images from within one meteorological season (i.e., spring, summer, fall, or winter) were combined. However, automatic mosaicking of Sentinel-1 imagery can lead to artifacts such as seam lines. Furthermore, some clouds can still be present in the Sentinel-2 imagery. Therefore, all produced image patches were manually inspected in order to remove affected patches. This way, a total of 282,384 quality-controlled patch pairs covering all continents and all seasons were produced.
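As an illustration of how co-registered patch pairs can be cut from aligned rasters, consider the following minimal sketch. It assumes that the SAR and optical mosaics have already been resampled to the same pixel grid and that patches are cut without overlap; neither the tiling stride nor the function name `cut_patch_pairs` is taken from the dataset description above.

```python
import numpy as np


def cut_patch_pairs(sar, optical, size=256):
    """Yield co-registered (sar_patch, optical_patch) tiles of size x size px.

    sar:     (H, W) array, e.g. 8-bit backscatter.
    optical: (H, W, 3) array on the same pixel grid as `sar`.
    Incomplete tiles at the right and bottom borders are skipped.
    """
    if sar.shape[:2] != optical.shape[:2]:
        raise ValueError("SAR and optical rasters must share the same pixel grid")
    height, width = sar.shape[:2]
    # Non-overlapping tiling (assumption; a different stride could be used).
    for row in range(0, height - size + 1, size):
        for col in range(0, width - size + 1, size):
            yield (sar[row:row + size, col:col + size],
                   optical[row:row + size, col:col + size])
```

In such a setup, the manual quality inspection would then be applied to the resulting pairs, so that a patch affected by seam lines or clouds is removed together with its counterpart.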

Implementation Details
For the VAE, which comprises the upper right part of Subfigure 3a, we chose to rely on the encoder-decoder architecture with skip connections that provided the best results in Deshpande et al. (2017). As for the losses, we also follow the optimal case, i.e., we use the full decoder loss as a sum of the color rebalancing loss $L_{hist}$, the specificity loss $L_{mah}$, and the regularizing loss $L_{grad}$.
For the MDN training, as suggested by Deshpande et al. (2017), we do not feed the SAR image $I_{SAR}$ directly into the MDN, but rely on the features extracted up to the conv7 layer of the colorization network proposed by Zhang et al. (2016), which was pre-trained on the 1.3 million images from the ImageNet training set (Russakovsky et al., 2015).

Training
We trained the network depicted in Subfigure 3a in a two-stage manner. Firstly, the VAE network was trained on 252,384 Lab-based fusion images for 15 epochs using a standard implementation of Adam optimization (Kingma and Ba, 2015). The optimization hyper-parameters were fixed to $\beta_1 = 0.9$ and $\beta_2 = 0.999$, with a learning rate of $\alpha_t = 5 \cdot 10^{-5}$.
The second stage was to train the MDN network. Training was performed using the corresponding 252,384 conv7 gray-level features, as well as the latent codes produced by the trained VAE for each of the Lab-based fusion training images. The MDN network was trained for 7 epochs, with the same β parameters as the VAE network and a learning rate of $\alpha_t = 10^{-3}$.
The β parameters are the default parameters recommended for the Adam optimization algorithm, while the learning rates were based on the details provided by Deshpande et al. (2017).
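To make the role of these hyper-parameters concrete, a single Adam update step (Kingma and Ba, 2015) can be written out as below. This is a generic sketch of the optimizer, not the training code used here; the default learning rate corresponds to the VAE stage, and the stabilizing constant `eps = 1e-8` is the value suggested in the Adam paper, which is an assumption since it is not reported above.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update and return the new (theta, m, v).

    theta: parameter array; grad: gradient of the loss w.r.t. theta.
    m, v:  running first and second moment estimates (zeros at the start).
    t:     1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Example: first step from zero-initialized moments.
theta0 = np.array([1.0, -2.0])
grad0 = np.array([0.5, -3.0])
theta1, m1, v1 = adam_step(theta0, grad0, np.zeros(2), np.zeros(2), t=1)
```

In the first step, the bias-corrected moments reduce to the gradient and its square, so each parameter moves by approximately the learning rate in the direction opposite to the sign of its gradient. For the MDN stage, the same function would be called with `lr=1e-3`.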

COLORIZATION RESULTS
To evaluate the colorization capabilities of the architecture described in Section 4, we sort the means $\mu_i$ of the Gaussian mixture model predicted by the MDN in descending order of their mixture weights $\pi_i$ and then display the results of the top-8 means for some example images of our test set comprising 1024 images.
Some colorization examples drawn from the test set are shown in Fig. 4. Subfigures 4a to 4d can be considered successful colorizations: the target Lab appearance was well met, and the color adds valuable information to the SAR image, which supports general interpretability. For example, in result 4a, the rectangular field-like structures could also have been caused by a fish farm or desalination ponds, while the optical color information rather suggests that we are looking at something like rice fields. Another interesting phenomenon to be noted is the correct prediction of blue roofs in example 4d. While blue roofs are not uncommon in parts of China, they are rarely seen in the rest of the world. The colorization network, however, has successfully learned to predict them where applicable.
In contrast to these promising results, Subfigures 4e and 4f clearly belong to the class of failed examples. In these cases, the network not only fails to match the target Lab color distribution, but also does not add any valuable information that would help image interpretation. This is probably caused by the fact that many natural surfaces show a green or brown color appearance, so that the colors are predicted close to the corresponding expectation value.
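The selection of the top-ranked mixture components can be sketched as follows. The array layout (one weight per component, one mean vector per component) and the function name `top_k_means` are assumptions made for illustration only.

```python
import numpy as np


def top_k_means(weights, means, k=8):
    """Return the k mixture means with the largest mixture weights.

    weights: (M,) mixture weights pi_i.
    means:   (M, D) mixture means mu_i (one latent code per component).
    The result is sorted by descending weight.
    """
    order = np.argsort(weights)[::-1][:k]
    return means[order], weights[order]


# Small example with four components and two-dimensional means.
pi = np.array([0.10, 0.50, 0.15, 0.25])
mu = np.arange(8.0).reshape(4, 2)
best_mu, best_pi = top_k_means(pi, mu, k=2)
```

In this small example, the components with weights 0.50 and 0.25 are selected, in that order. Each selected mean would then be passed through the VAE decoder to obtain one colorization hypothesis.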
The results indicate that colorizing SAR images based on a combination of Lab-based SAR-optical image fusion and a deep generative neural network architecture is able to produce artificial color SAR images, which can greatly support human operators in the task of image interpretation.

SUMMARY AND CONCLUSION
In this paper, we have shown an approach for the automatic colorization of SAR backscatter images, which are usually provided in the form of single-channel gray-scale imagery. Using a deep generative model proposed for the purpose of photograph colorization and a Lab-space-based SAR-optical image fusion formulation, we are able to predict artificial color SAR images, which disclose much more information to the human interpreter than the original SAR data. Future work will aim at further adaptation of the employed procedure to our special case of multi-sensor remote sensing imagery. Furthermore, we will investigate whether the low-level representations learned intrinsically by the deep network can be used for SAR image interpretation in an end-to-end manner.

Figure 1. The visible gamut within the Lab color space. © Wikimedia Commons / CC-BY-SA-3.0

The tristimulus values $X$, $Y$, and $Z$ are obtained by linearly mapping from RGB space, and $X_n = 95.047$, $Y_n = 100$, $Z_n = 108.883$ are defined by the CIE standard illuminant D65 (Noboru and Robertson, 2005). Afterwards, the optical luminosity $L$ is replaced by a SAR-derived pseudo-luminosity

$$L_{SAR} = 100 \cdot \sigma^0, \tag{5}$$

where $\sigma^0$ is the terrain-corrected SAR backscatter coefficient in decibel scale and the factor 100 is needed to bring $\sigma^0 \in [0; 1]$ to the range of luminosity $L \in [0; 100]$. Then the new Lab color triplet $p_{Lab} = (L_{SAR}, a, b)$ is transformed back to RGB color space to obtain the fused image $I_{Lab}$. Depending on the preparation of the SAR-derived luminosity, $I_{Lab}$ is similar to the example shown in Fig. 2. We use such Lab-fusion-based artificial color SAR images as targets in the training of the colorization model described in Section 4.

Figure 4. Some test-time examples for colorized Sentinel-1 SAR images. Top row in every subfigure: SAR image (left), target Lab image (center), corresponding optical image (right). The remaining 8 images in each subfigure show the top-8 colorization samples drawn from the learned color distribution.