CHANGE DETECTION IN REMOTE SENSING IMAGES USING CONDITIONAL ADVERSARIAL NETWORKS

We present a method for change detection in images using a Conditional Adversarial Network approach. An original network architecture based on pix2pix is proposed and evaluated for difference map creation. The paper addresses three types of experiments: change detection in synthetic images without relative shifts of objects, change detection in synthetic images with small relative shifts of objects, and change detection in real season-varying remote sensing images.


INTRODUCTION
Change detection in time-varying sequences of remote sensing images acquired over the same geographical area is an important part of many practical applications, e.g. urban development analysis, environmental inspection, and agricultural monitoring. In most cases, solving the change detection task manually is highly time-consuming, which makes automating this process an important and practically demanded field of research.
At present, the best results in the overwhelming majority of image analysis and processing tasks are delivered by methods based on deep convolutional neural networks (CNN). In this paper, we propose a new method for automatic change detection in season-varying remote sensing images, which employs a modern type of CNN, namely Conditional Adversarial Networks.
The simplest direct comparison techniques are image differencing (Lu et al., 2005) and image ratioing (Howarth and Wickware, 1981). Image regression represents the second image as a linear function of the first (Lunetta, 1999).
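As a minimal sketch of the two direct comparison techniques named above, the following code computes thresholded difference and ratio change maps for two co-registered single-band images stored as nested lists. The function names, thresholds, and toy pixel values are illustrative assumptions and are not taken from the cited works.

```python
def difference_map(img1, img2, threshold):
    """Mark a pixel as changed (1) when |img2 - img1| exceeds the threshold."""
    return [[1 if abs(b - a) > threshold else 0 for a, b in zip(row1, row2)]
            for row1, row2 in zip(img1, img2)]


def ratio_map(img1, img2, low, high, eps=1e-6):
    """Mark a pixel as changed (1) when img2 / img1 falls outside [low, high].

    eps guards against division by zero in dark pixels.
    """
    return [[0 if low <= (b + eps) / (a + eps) <= high else 1
             for a, b in zip(row1, row2)]
            for row1, row2 in zip(img1, img2)]


# Two tiny 2x3 single-band "images": only the top-right pixel changes noticeably.
t1 = [[100, 100, 50], [80, 80, 80]]
t2 = [[102, 98, 150], [80, 81, 79]]
print(difference_map(t1, t2, threshold=20))   # [[0, 0, 1], [0, 0, 0]]
print(ratio_map(t1, t2, low=0.8, high=1.25))  # [[0, 0, 1], [0, 0, 0]]
```

Ratioing is less sensitive than differencing to a uniform multiplicative brightness change, which is why both variants appear in the literature.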
Change vector analysis (CVA) was developed for change detection in multiple image bands (Im and Jensen, 2005; Bayarjargal, 2006). Change vectors are calculated by subtracting the pixel vectors of co-registered images acquired at different dates. Principal component analysis (PCA) is applied to change detection in two main ways: applying PCA to the images separately and then comparing them by differencing or ratioing (Richards, 1984), or merging the compared images into one set and then applying the PCA transform (Deng et al., 2008). The tasseled cap transformation (Kauth and Thomas, 1976) produces stable spectral components for long-term studies of forest and vegetation (Rogan et al., 2002; Jin and Sader, 2005). Other texture-based transforms are developed in (Erener and Düzgün, 2009; Tomowski et al., 2011).
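The change-vector step of CVA can be sketched as follows, assuming co-registered multiband images stored as `[row][col][band]` nested lists. Only the magnitude of the change vector is thresholded here; full CVA also analyzes its direction, which this sketch omits, and the threshold value is an arbitrary assumption.

```python
import math


def change_vector_analysis(img1, img2, threshold):
    """Return per-pixel change magnitudes and a binary change mask."""
    magnitude, mask = [], []
    for row1, row2 in zip(img1, img2):
        mag_row = []
        for p1, p2 in zip(row1, row2):
            # Change vector = difference of the two pixel vectors (band by band).
            delta = [b - a for a, b in zip(p1, p2)]
            mag_row.append(math.sqrt(sum(d * d for d in delta)))
        magnitude.append(mag_row)
        mask.append([1 if m > threshold else 0 for m in mag_row])
    return magnitude, mask


# One row of two pixels with three bands each; only the first pixel changes.
t1 = [[[10, 20, 30], [50, 50, 50]]]
t2 = [[[13, 24, 30], [50, 50, 50]]]
mag, mask = change_vector_analysis(t1, t2, threshold=4.0)
print(mag)   # [[5.0, 0.0]]
print(mask)  # [[1, 0]]
```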
Machine learning algorithms are extensively utilized in change detection. Artificial Neural Networks (ANN) are usually trained to model the complex non-linear regression between an input image pair and the output change map (Liu and Lathrop, 2002; Pijanowski et al., 2005). The Support Vector Machine (SVM) approach based on (Vapnik, 2000) treats finding change and no-change regions as a binary classification problem in a space of spectral features (Huang et al., 2008; Bovolo et al., 2008). Other machine learning techniques applied to change detection include decision trees (Im and Jensen, 2005), genetic programming (Makkeasorn et al., 2009), random forests (Smith, 2010), cellular automata (Yang et al., 2008), and deep neural networks (Chu et al., 2016).
Object-based techniques operate on extracted objects. The Direct Object change detection (DOCD) approach is based on comparing object geometrical properties (Lefebvre et al., 2008; Zhou et al., 2008), spectral information (Miller et al., 2005; Hall and Hay, 2003), or texture features (Lefebvre et al., 2008; Tomowski et al., 2011). In the Classified Objects change detection (COCD) approach, the extracted objects are compared based on their geometry and class labels (Chant and Kelly, 2009; Jiang and Narayanan, 2003). The framework based on post-classification (Blaschke, 2005) presumes extracting objects and classifying them independently (Im and Jensen, 2005; Hansen and Loveland, 2012). Multitemporal object change detection performs a single joint segmentation of the stacked (composite) images (Conchedda et al., 2008; Stow et al., 2008).
In contrast to these approaches, our technique is based on machine learning and CNNs, but it does not presume any object classification and performs change detection directly at the image level via a GAN.

METHODOLOGY
In our change detection tasks, we consider only image differences that correspond to the appearance of new objects or the disappearance of existing objects in a scene, rather than differences due to season-specific object changes (see Figure 1), brightness variations, and other factors. Such problems of comparing feature domains from different images are addressed by domain adaptation and transfer learning approaches, with the best results delivered by Generative Adversarial Networks (GAN). Accordingly, a CNN of this type, namely "pix2pix" (Isola et al., 2017), was selected as the basic CNN model for our change detection method.

In turn, the discriminator D learns to detect "fake" images synthesized by the generator G:

$$D(x, y): \{x, y\} \to [0, 1]. \quad (2)$$

The discriminator maps objects from the data space to the [0, 1] interval, and its output is interpreted as the probability that the example is "real".
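As a result, D and G play the following two-player minimax game:

$$\min_G \max_D \mathcal{L}_{cGAN}(G, D), \quad (3)$$

where

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]. \quad (4)$$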

Network architectures
Like pix2pix, our model contains two main parts: a generator and a discriminator. A distinctive feature of our implementation is that the generator processes a pair of input images simultaneously and extracts features from both of them. To do this, the two input images are concatenated before being fed to the generator. The generator is based on the "U-Net" network (Ronneberger et al., 2015), an encoder-decoder with skip connections between mirrored layers in the encoder and decoder stacks.
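To make the concatenated input and the mirrored skip connections concrete, the following sketch only traces tensor shapes `(name, channels, height, width)` through a pix2pix-style 256x256 U-Net; it trains nothing. The 6 input channels assume two concatenated RGB images, the single output channel assumes a one-channel difference map, and the filter counts are the pix2pix defaults; none of these values are stated in the text above.

```python
def unet_shapes(in_channels=6, out_channels=1, size=256,
                encoder_filters=(64, 128, 256, 512, 512, 512, 512, 512)):
    """Trace (name, channels, height, width) through a pix2pix-style U-Net.

    Each encoder layer halves the spatial size; each decoder layer doubles it
    and concatenates the output of the mirrored encoder layer (skip connection).
    """
    shapes = [("input", in_channels, size, size)]
    skips = []
    s = size
    for i, f in enumerate(encoder_filters):
        s //= 2
        shapes.append((f"enc{i + 1}", f, s, s))
        skips.append((f, s))
    # The innermost (bottleneck) encoder layer has no mirrored decoder partner.
    skips.pop()
    decoder_filters = list(reversed(encoder_filters[:-1]))
    for i, f in enumerate(decoder_filters):
        s *= 2
        skip_c, skip_s = skips.pop()
        assert skip_s == s, "mirrored layers must share the spatial size"
        # Concatenation along the channel axis adds the skip channels.
        shapes.append((f"dec{i + 1}", f + skip_c, s, s))
    # The final transposed convolution maps back to full resolution.
    shapes.append(("output", out_channels, s * 2, s * 2))
    return shapes


for name, c, h, w in unet_shapes():
    print(f"{name:>6}: {c:4d} x {h} x {w}")
```

The trace shows the 1x1 bottleneck (`enc8`) and decoder layers whose channel counts double because each one concatenates the encoder features of the same resolution.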
The discriminator is based on the "PatchGAN" architecture (Isola et al., 2017). In our implementation, the discriminator takes three input images: the two images being compared and one difference map, which can be either the output of the generator or the ground truth labels. The discriminator learns to distinguish between a difference map synthesized by the generator and the ground truth labels. The discriminator structure is quite similar to the encoding part of the generator, but its output is a single value from 0 to 1. This value measures how realistic the difference map is with respect to the corresponding input images.
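A minimal sketch of the discriminator geometry, assuming the pix2pix 70x70 PatchGAN layout (4x4 convolutions with strides 2, 2, 2, 1, 1 and padding 1) and assuming that the single output value is obtained by averaging the per-patch scores; the text above specifies neither. The 7 input channels assume RGB images A and B plus a one-channel difference map.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def patchgan_geometry(size=256, strides=(2, 2, 2, 1, 1), kernel=4):
    """Return the output grid size and the receptive field of one output cell."""
    out = size
    for s in strides:
        out = conv_out(out, kernel, s, padding=1)
    # Walk backwards from one output cell to the input to get its receptive field.
    rf = 1
    for s in reversed(strides):
        rf = (rf - 1) * s + kernel
    return out, rf


def discriminator_score(patch_probs):
    """Hypothetical reduction: average per-patch 'real' probabilities to one value."""
    flat = [p for row in patch_probs for p in row]
    return sum(flat) / len(flat)


in_channels = 3 + 3 + 1  # image A + image B + difference map (assumed layout)
grid, rf = patchgan_geometry()
print(in_channels, grid, rf)  # 7 30 70
```

Each of the 30x30 output cells judges a 70x70 patch of the stacked input, so the discriminator penalizes local structure in the difference map rather than the whole image at once.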
To train the discriminator, the generator first synthesizes a difference map; the discriminator then evaluates whether this difference map is fake or real for the two input images. The discriminator parameters are adjusted based on the classification error. The training pipeline of the discriminator is shown in Figure 2.
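The classification error used to update the discriminator can be sketched as a binary cross-entropy over one real and one fake triplet. Here `d_real` and `d_fake` are hypothetical discriminator outputs, and per sample the loss equals the negative of the bracketed terms in L_cGAN from equation (4), so minimizing it maximizes L_cGAN with respect to D.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Classification error of D on one real and one fake triplet.

    d_real: D's output for (A, B, ground-truth map); the target is 1.
    d_fake: D's output for (A, B, generated map); the target is 0.
    """
    return bce(d_real, 1) + bce(d_fake, 0)


print(round(discriminator_loss(0.9, 0.1), 4))  # 0.2107 (confident, correct D)
print(round(discriminator_loss(0.5, 0.5), 4))  # 1.3863 (D cannot tell them apart)
```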

EXPERIMENTS
In our study, we followed the rule "from simple to complex" in order to better understand how the proposed network copes with increasingly difficult conditions. Therefore, we performed three types of experiments: change detection in synthetic images without relative shifts of objects, change detection in synthetic images with small relative shifts of objects, and change detection in real season-varying remote sensing images.

Experiments on the synthetic image dataset without object shifts
In the first experiment group, we tested the performance of our CNN architecture on a generated dataset of 12000 synthetic image triples, each image being 256x256 pixels.

In the experiments with small relative object shifts, the precision and recall values during the tests of the CNN were 0.92 and 0.93, respectively. As noted above, Gaussian blur affects the quality of change detection results more than additive Gaussian noise, including the case of shifted geometric objects. Figure 6 shows an example of change detection on synthetic images with a small relative shift of objects.
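For the synthetic experiments, precision and recall are computed pixel by pixel, as the generated difference mask is expected to match the ground truth one to one. A minimal sketch on binary masks stored as nested lists (the function name and toy masks are illustrative):

```python
def pixel_precision_recall(pred, truth):
    """Pixel-wise precision and recall of a binary change mask (1 = changed)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # changed pixel correctly detected
            elif p:
                fp += 1  # false alarm
            elif t:
                fn += 1  # missed change
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(pixel_precision_recall(pred, truth))  # (0.6666666666666666, 0.6666666666666666)
```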

Experiments on the real image dataset
In the third type of experiments, the proposed network architecture was evaluated on real images. To generate the dataset, we used season-varying remote sensing images of the same region obtained from Google Earth (DigitalGlobe). We obtained 7 pairs of season-varying images with a resolution of 4725x2700 pixels for manual ground truth creation, and 4 season-varying image pairs with minimal changes and a resolution of 1900x1000 pixels, to which additional objects were added manually. The spatial resolution of the obtained images ranged from 3 to 100 cm/px. This allowed us to take into account objects of different sizes (from cars to large construction structures) as well as seasonal changes of natural objects (from single trees to wide forest areas). The dataset was generated by cropping randomly rotated (0 to 2π) 256x256 fragments that contain at least a part of a target object. Object center coordinates were unique, and the distance between object centers along each axis was 32 pixels. Finally, the dataset contained 16000 image sets of 256x256 pixels: 10000 training sets, 3000 test sets, and 3000 validation sets.
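The fragment sampling can be sketched as follows. This assumes that the 32-pixel spacing refers to a regular grid of crop centers, that each crop may be rotated by any angle in [0, 2π], and that a hypothetical predicate `contains_target(row, col)` reports whether the crop around a center contains part of a target object; none of these details beyond the 256x256 size, the rotation range, and the 32-pixel spacing come from the text.

```python
import math
import random


def crop_centers(height, width, crop=256, stride=32):
    """Candidate crop centers on a grid with `stride` pixels along each axis.

    A margin of half the crop diagonal keeps a crop rotated by any angle
    inside the image.
    """
    margin = math.ceil(crop * math.sqrt(2) / 2)
    return [(r, c)
            for r in range(margin, height - margin + 1, stride)
            for c in range(margin, width - margin + 1, stride)]


def sample_fragments(height, width, contains_target, seed=0):
    """Return (row, col, angle) for every grid center whose crop holds a target."""
    rng = random.Random(seed)
    return [(r, c, rng.uniform(0.0, 2 * math.pi))
            for r, c in crop_centers(height, width)
            if contains_target(r, c)]


# Hypothetical predicate: targets only in the left part of a 1900x1000 image.
frags = sample_fragments(1000, 1900, lambda r, c: c < 400)
print(len(crop_centers(1000, 1900)), len(frags))  # 980 140
```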
Because manual ground truth labeling may be inaccurate, we used the Intersection over Union (IoU) metric to assess change detection quality. To compute IoU, we first extract connected regions from the ground truth labels and from the difference map synthesized by the generator. A region is considered detected if its IoU exceeds a given threshold. Then, from the resulting detections, the average Precision and Recall values were calculated over the entire test dataset. For an IoU threshold of 0.5, the average Precision and Recall values were 0.26 and 0.32, respectively. Such low values are associated with poor detection of small objects (see Figure 7), which our network was not originally trained to detect.
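One way to implement the region-level evaluation is sketched below. It assumes 4-connected regions, counts a predicted region as correct (and a ground-truth region as detected) when it overlaps some region of the other mask with IoU above the threshold, and applies an optional minimum-area filter to both masks. The exact matching rule used in the experiments is not specified, so this is an assumption. Per-image values would then be averaged over the test set.

```python
from collections import deque


def connected_regions(mask, min_area=0):
    """Return 4-connected regions of 1-pixels as sets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), set()
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    region.add((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) >= min_area:
                    regions.append(region)
    return regions


def iou(a, b):
    """Intersection over Union of two pixel sets."""
    return len(a & b) / len(a | b)


def region_precision_recall(pred_mask, truth_mask, threshold=0.5, min_area=0):
    """Region-level precision and recall for one image pair."""
    pred = connected_regions(pred_mask, min_area)
    truth = connected_regions(truth_mask, min_area)
    tp_pred = sum(any(iou(p, t) > threshold for t in truth) for p in pred)
    tp_truth = sum(any(iou(t, p) > threshold for p in pred) for t in truth)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    return precision, recall


truth = [[1, 1, 0, 0, 0],
         [1, 1, 0, 0, 1],
         [0, 0, 0, 0, 1]]
pred = [[1, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0]]
print(region_precision_recall(pred, truth))              # (0.5, 0.5)
print(region_precision_recall(pred, truth, min_area=3))  # (1.0, 1.0)
```

The `min_area` parameter mirrors the option of ignoring very small objects in the assessment; the large region matches with IoU 0.75, while the small regions do not overlap at all.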

Figure 1. Example of season-specific object changes

Problem statement
Conditional GANs learn a mapping from an observed image $x$ and a random noise vector $z$ to $y$. The main components of a Conditional GAN are two competing neural networks: the generator G and the discriminator D. The generator G, based on a latent space of object features extracted from the input data $x$ and a given prior distribution $p_z(z)$, synthesizes the output data $y$:

$$y = G(x, z): \{x, z\} \to y. \quad (1)$$

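Figure 2. The training pipeline of the discriminator

At the next training step, the generator parameters are updated using the classification error computed from the discriminator output and the L1 distance between the synthesized difference map and the ground truth, in accordance with the objective (5).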
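In pix2pix, the generator update combines an adversarial term with a λ-weighted L1 term, matching the form of objective (5). The sketch below evaluates that loss for a single sample. It assumes λ = 100 (the pix2pix default, not stated in this text) and, as in pix2pix, uses the non-saturating adversarial term -log D(x, G(x, z)) instead of minimizing log(1 - D(x, G(x, z))). `d_fake` is a hypothetical discriminator output.

```python
import math


def l1_distance(generated, truth):
    """Mean absolute difference between two equally sized maps."""
    diffs = [abs(g - t) for g_row, t_row in zip(generated, truth)
             for g, t in zip(g_row, t_row)]
    return sum(diffs) / len(diffs)


def generator_loss(d_fake, generated, truth, lam=100.0, eps=1e-12):
    """Adversarial term plus lambda-weighted L1 term for one sample.

    d_fake: D's output for (A, B, generated map). The generator wants it near 1.
    """
    adversarial = -math.log(max(d_fake, eps))
    return adversarial + lam * l1_distance(generated, truth)


generated = [[0.9, 0.1], [0.2, 0.8]]
truth = [[1, 0], [0, 1]]
print(round(generator_loss(0.5, generated, truth), 4))  # 15.6931
```

With λ = 100, the L1 term dominates early training and pulls the difference map toward the ground truth, while the adversarial term sharpens it.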

Figure 3. The training pipeline of the generator

In both cases, we use Adam as the optimization algorithm (Kingma et al., 2015). Our objective is (Isola et al., 2017):

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G), \quad (5)$$

where $\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]$ and $\lambda$ weights the $L_1$ term.

The first and second images are an RGB image pair (A and B) with a random homogeneous background and random non-intersecting geometric primitives (square, circle, rectangle, triangle) of random size and color. The third image is a binary symmetric change detection mask between images A and B. This dataset was split into 8000 training sets, 2000 validation sets, and 2000 test sets. The number of objects per image is limited to 10. Some images were smoothed by a Gaussian filter with standard deviation 10 < σ < 25. In addition, some images were corrupted by additive Gaussian noise with standard deviation 10 < σ < 35. Images A and B were either smoothed or noised in 20% of all cases, and were first smoothed and then noised in 10% of all cases. To evaluate detection results on synthetic images, we used pixel-level Precision and Recall, since the difference mask should correspond one-to-one to the ground truth. The precision and recall values during the tests of the CNN were 0.95 and 0.96, respectively. Gaussian blur affects the quality of change detection results more than additive Gaussian noise.
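A simplified generator for synthetic triples of this kind is sketched below. It assumes that A and B share one random background color, uses only squares and circles (rectangles, triangles, blur, and noise are omitted), and places each non-intersecting object in both images, only in A, or only in B. The mask marks pixels of objects present in exactly one image, which is the symmetric difference of the two object sets. All names and size limits are hypothetical.

```python
import random


def rasterize(shape):
    """Return the set of (row, col) pixels covered by one primitive."""
    kind, r0, c0, extent = shape
    center = (extent - 1) / 2
    radius_sq = (extent / 2) ** 2
    pixels = set()
    for dr in range(extent):
        for dc in range(extent):
            if kind == "square" or (dr - center) ** 2 + (dc - center) ** 2 <= radius_sq:
                pixels.add((r0 + dr, c0 + dc))
    return pixels


def random_shapes(rng, size, count, min_extent=6, max_extent=16, max_tries=1000):
    """Place up to `count` primitives whose bounding boxes do not intersect."""
    shapes = []
    for _ in range(max_tries):
        if len(shapes) == count:
            break
        e = rng.randint(min_extent, max_extent)
        r, c = rng.randint(0, size - e), rng.randint(0, size - e)
        if all(r + e <= r2 or r2 + e2 <= r or c + e <= c2 or c2 + e2 <= c
               for _, r2, c2, e2 in shapes):
            shapes.append((rng.choice(["square", "circle"]), r, c, e))
    return shapes


def make_triple(seed=0, size=64, max_objects=10):
    """Build one (A, B, mask) triple: two RGB images and a 0/1 change mask."""
    rng = random.Random(seed)
    background = tuple(rng.randint(0, 255) for _ in range(3))
    img_a = [[background] * size for _ in range(size)]
    img_b = [[background] * size for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    for shape in random_shapes(rng, size, rng.randint(1, max_objects)):
        color = tuple(rng.randint(0, 255) for _ in range(3))
        where = rng.choice(["both", "a_only", "b_only"])
        for r, c in rasterize(shape):
            if where != "b_only":
                img_a[r][c] = color
            if where != "a_only":
                img_b[r][c] = color
            if where != "both":
                mask[r][c] = 1  # object present in exactly one image
    return img_a, img_b, mask


a, b, m = make_triple(seed=1)
print(sum(map(sum, m)), "changed pixels")
```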

Figure 4. Change detection on synthetic images without object shifts: (left) input image A, (middle) input image B, (right) synthesized difference map

Experiments on the synthetic image dataset with object shifts
Since real images of the Earth's surface, obtained at different times by different vision sensors, may have local discrepancies, in the second type of experiments we tested the proposed network architecture for the case of small object shifts. Figure 5 shows an example of change detection with a 5-pixel object shift using the network trained on the dataset without object shifts. To demonstrate that the proposed network architecture can effectively detect changes in the presence of object shifts, we fine-tuned the network. For the new training cycle, we created an additional dataset of 12000 synthetic image triples containing random shifts of the objects present in both images A and B. These shifts were in the range [-5, 5] pixels in the horizontal and vertical directions, and the shifted objects intersected neither other objects nor the image boundaries.
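The shift constraint can be sketched with rejection sampling on object bounding boxes, given as hypothetical `(row, col, height, width)` tuples. A random offset in [-5, 5] along each axis is accepted only if the box stays inside the image and does not intersect any other box; after `max_tries` failed attempts the box is left in place. If the input boxes do not intersect, the output boxes do not intersect either, because each candidate is checked against the current positions of all other boxes.

```python
import random


def shift_boxes(boxes, size, max_shift=5, seed=0, max_tries=100):
    """Shift each box by a random (dy, dx) in [-max_shift, max_shift]."""
    rng = random.Random(seed)
    shifted = list(boxes)
    for i, (r, c, h, w) in enumerate(boxes):
        for _ in range(max_tries):
            nr = r + rng.randint(-max_shift, max_shift)
            nc = c + rng.randint(-max_shift, max_shift)
            inside = 0 <= nr and 0 <= nc and nr + h <= size and nc + w <= size
            clear = all(nr + h <= r2 or r2 + h2 <= nr or nc + w <= c2 or c2 + w2 <= nc
                        for j, (r2, c2, h2, w2) in enumerate(shifted) if j != i)
            if inside and clear:
                shifted[i] = (nr, nc, h, w)
                break
    return shifted


boxes = [(10, 10, 8, 8), (30, 30, 10, 6), (50, 5, 6, 6)]
print(shift_boxes(boxes, size=64, seed=3))
```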

Figure 5. An example of change detection in the case of an object shift using the network trained on the dataset without object shifts: (left) input image A, (middle) input image B, (right) synthesized difference map.

Figure 6. Change detection on synthetic images with a small relative shift of objects: (left) input image A, (middle) input image B, (right) synthesized difference map.

Figure 7. An example of poor detection of small objects: (top left) input image A, (top right) input image B, (bottom left) synthesized difference map, (bottom right) ground truth map

Therefore, in the assessment, we did not take into account objects with an area of less than 500 pixels. The average Precision and Recall values for this case are shown in Table 1. The detection of small objects remains a subject of further research. Examples of change detection are shown in Figure 8.

Figure 8. Examples of change detection in remote sensing images: (left) input image A, (middle) input image B, (right) synthesized difference map

CONCLUSION
The paper presents a specially modified Generative Adversarial Network (GAN) of the "pix2pix" architecture for automatic change detection in season-varying remote sensing images. An extensive database of synthetic and real images was created, and it will be made publicly available. The database contains 12,000 triples of synthetic images without object shifts, 12,000 triples of model images with object shifts, and 16,000 triples of fragments of real remote sensing images. The performed tests have shown that the proposed CNN is promising and sufficiently efficient for change detection in synthetic and real images.

Table 1. The average Precision and Recall values on the test dataset.