Unsupervised Change Detection in Optical Satellite Imagery Using SIFT Flow

The identification of change in remote sensing images has been a focal point of research for decades. Many classical algorithms exist, and new ones are still being developed. These algorithms can be divided into supervised and unsupervised methods. In this work, an unsupervised method is presented that relies on the scene alignment algorithm SIFT flow. It is shown that, building upon simple principles, an accurate change map can be obtained from the SIFT descriptor flow between the two input images. Furthermore, it is shown that this method, despite its simplicity, outperforms other unsupervised methods and comes close to supervised ones, even exceeding them in some metrics. Lastly, the advantages of SIFT flow over supervised methods are highlighted alongside its own downsides.


INTRODUCTION
Change detection in optical satellite imagery has been studied extensively over the years. It is central to many applications in the remote sensing field. For example, change detection can be used to assess urban growth, support urban planning, and monitor the effects of forest fires and other natural disasters. Methods used for change detection can be split into two categories: supervised methods and unsupervised methods. Supervised methods use labeled data in their training process and later make predictions based on it. Unsupervised methods, on the other hand, make predictions directly, without the need for any training (Ban and Yousif, 2016).
Due to the lack of data until recently, most of the methods developed for change detection belong to the unsupervised category. In (Celik, 2010), the expectation-maximization algorithm is carried out on wavelets extracted from the difference image by the dual-tree complex wavelet transform, while in (Hao et al., 2014) an expectation-maximization-based level set method is used instead, with the claim that it yields better results. In (Celik, 2009a), the proposed method segments the difference image into non-overlapping blocks, applies principal component analysis, and then uses k-means to obtain a reference vector for each cluster. Finally, features are extracted for each pixel, compared to the reference vectors, and each pixel is assigned to one of the two clusters. In (Li et al., 2015a), Gabor wavelets followed by two-level clustering were implemented. (Bruzzonel and Femluldez, n.d.) modeled the likelihood function of the data using Parzen windows to make the final decision. An error-minimizing algorithm for defining a threshold over a difference image was suggested in 1986 (Latifovic and Pouliot, 2014); this algorithm was later developed, improved and widely used. For example, (Melgani et al., 2002) applied the algorithm to a difference image of multispectral images, while (Bazi et al., 2005) and (Moser and Serpico, n.d.) expanded the algorithm to deal with the intensity and amplitude images of SAR, which do not have a Gaussian nature. In (Bazi et al., 2006) the same algorithm was extended to the multi-threshold case, at the cost of making it more computationally expensive. The algorithm was further improved in (Ban and Yousif, 2012) by modifying it to work on the log-ratio image of SAR images.
In general, more recent supervised or semi-supervised methods manage to produce better results than their unsupervised counterparts. For example, (Zhang et al., 2018) created a semi-supervised method that utilizes a coarse-to-fine detection scheme to combine the benefits of supervised learning and mitigate the lack of data. In recent years, however, new datasets have been introduced, such as the one by (Chen and Shi, 2020), who also suggested a new supervised model based on a pyramid spatial-temporal attention module (PAM). This dataset opened the floodgates for deep supervised models such as the two models presented by (Diakogiannis et al., 2021). Both models are based on attention mechanisms, but the more advanced CEECNet showed the most promising results. Nevertheless, supervised methods suffer from two downsides. First, they may produce better results when the test data comes from the same dataset as the training data or resembles it, but they tend to underperform whenever the test data does not resemble the training dataset. Second, and more generally, the amount of data needed to train the more advanced supervised methods such as deep networks is large, and the data that would be needed to train a supervised model to generalize over large, diverse areas is almost nonexistent.
Unsupervised methods, however, do not need any labeled data. They depend on the information within the pair of images to make assessments and locate the area of change, for example in (Celik, 2009b; Li et al., 2015b). Many unsupervised change detection methods use either a difference image or a ratio image as the basis for producing their final result. These methods are quite successful in determining the difference between two images. However, for these methods to work, the pair of temporal images must go through a preprocessing step to ensure that their histograms match. Furthermore, any radical change in lighting conditions or any sensor inconsistency may affect the results massively. Moreover, unsupervised methods lack the specialization that supervised methods have: they cannot be trained to pay attention to one type of change and ignore another.
In this work, an unsupervised method for change detection is presented. This method is based on the Scale Invariant Feature Transform (SIFT) (Lowe, 1999) and its flow between two images: SIFT flow (Liu et al., 2011). The first section of this article gives a brief introduction to SIFT features and the SIFT flow algorithm. The second section shows the simple yet effective approach used to transform SIFT flow into a change detection method. In the third section, visual results of SIFT flow are shown and discussed, and numerical results are examined to compare the performance of SIFT flow to other unsupervised and supervised methods. The final section offers a summary of this work and suggestions for future work based on it.

SIFT FLOW
SIFT flow is an alignment algorithm that was inspired by optical flow. However, instead of matching pixel intensity values, it matches SIFT feature descriptors (or SIFT descriptors). For this reason, the following subsections give a short explanation of SIFT features, followed by the matching scheme used by SIFT flow.

Scale Invariant Feature Transform
The SIFT algorithm consists of two parts: key point detection and feature extraction, the latter being the only part relevant to SIFT flow and this work. SIFT features are simple and easy to compute. Given a key point, a window of neighbouring pixels is selected, and the gradient orientation at each pixel is calculated using (Lowe, 2004):

$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$

where $L(x, y)$ is the smoothed image value at the pixel located at $(x, y)$, so that the two differences approximate the gradient around that pixel. This block of pixel orientations is then divided into an array of cells (typically 4 × 4), and the orientation histogram of each cell is calculated using 8 bins. Finally, the resulting vector is normalized, values larger than 0.2 are clipped to 0.2, and the vector is normalized again. For a 16 × 16 window divided into 4 × 4 cells with a quantization of 8 bins, the resulting feature vector has a size of 4 × 4 × 8 = 128 (Lowe, 1999).
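The descriptor computation described above can be illustrated with a short, simplified sketch. It is not the reference SIFT implementation: Gaussian weighting, interpolation between bins and rotation normalization are omitted, gradients are taken by central differences clamped at the patch border, and the function names `sift_like_descriptor` and `normalize_and_clip` are hypothetical.

```python
import math


def normalize_and_clip(vec, clip):
    """L2-normalize, clip large entries to `clip`, then normalize again."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return list(vec)  # flat patch: no gradient information
    clipped = [min(v / norm, clip) for v in vec]
    norm = math.sqrt(sum(v * v for v in clipped))
    return [v / norm for v in clipped]


def sift_like_descriptor(patch, cells=4, bins=8, clip=0.2):
    """Build a simplified SIFT-style descriptor from a square grayscale patch.

    `patch` is a list of rows (e.g. 16 x 16). The result has
    cells * cells * bins entries (4 * 4 * 8 = 128 by default).
    """
    n = len(patch)
    cell = n // cells  # pixels per cell side, e.g. 16 // 4 = 4
    hist = [0.0] * (cells * cells * bins)
    for y in range(n):
        for x in range(n):
            # Central differences; indices are clamped at the border (assumption).
            dx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            # Quantize the orientation into one of `bins` bins.
            b = int(theta / (2 * math.pi) * bins) % bins
            # Index of the cell this pixel falls into (row-major order).
            c = (y // cell) * cells + (x // cell)
            hist[c * bins + b] += magnitude
    return normalize_and_clip(hist, clip)
```

Calling `sift_like_descriptor` on a 16 × 16 patch yields a 128-element vector of unit length, matching the 4 × 4 × 8 layout described in the text.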

SIFT Flow
The SIFT flow algorithm takes advantage of the scale-invariant nature of SIFT features. In SIFT flow, SIFT descriptors are calculated for every pixel in the image. In other words, for every pixel location $(x, y)$ a 128-dimensional SIFT descriptor is generated. This gives rise to what is called a SIFT image, a SIFT representation of the original image that has the size $M \times N \times 128$, where $M \times N$ is the size of the original image. Since any SIFT descriptor in the first image may match with any SIFT descriptor in the second image (regardless of their locations), SIFT flow utilizes a top-down matching scheme over a Gaussian pyramid, as seen in Figure 1. This scheme reduces the number of possible matches and increases the quality of the matching (Liu et al., 2011).

The energy function seen in equation (1), which is used to determine the best matching SIFT descriptors, is inspired by the energy function of optical flow (Liu et al., 2011):

$$E(\mathbf{w}) = \sum_{\mathbf{p}} \min\left(\lVert s_1(\mathbf{p}) - s_2(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert_1, t\right) + \sum_{\mathbf{p}} \eta \left(|u(\mathbf{p})| + |v(\mathbf{p})|\right) + \sum_{(\mathbf{p}, \mathbf{q}) \in \varepsilon} \left[\min\left(\alpha |u(\mathbf{p}) - u(\mathbf{q})|, d\right) + \min\left(\alpha |v(\mathbf{p}) - v(\mathbf{q})|, d\right)\right] \quad (1)$$

It has three terms to ensure smooth and consistent matching. The first term is the data term, which measures the distance between the matched descriptors. The second term is referred to as the small displacement term; it ensures that the flow vectors remain as small as possible when no other information is available. The third term is the smoothness term, which ensures that neighbouring points have a similar flow.
In this equation, $\mathbf{p} = (x, y)$ represents the grid coordinates of any given image, $\mathbf{w}(\mathbf{p}) = (u(\mathbf{p}), v(\mathbf{p}))$ represents the flow vector at $\mathbf{p}$, and $u(\mathbf{p})$ and $v(\mathbf{p})$ represent its components in the $x$ and $y$ directions respectively. $s_1$ and $s_2$ are the two SIFT images to be matched, and $\varepsilon$ is the set of neighbouring pixel pairs. $t$ and $d$ are thresholds for the L1 norm used in the first and last terms; they are used to increase robustness against matching outliers and to account for discontinuities in the flow field respectively. $\eta$ and $\alpha$ are hyperparameters used to control the second and third terms of the cost function respectively. Figure 2 shows an example of SIFT flow being used for image registration and scene alignment.
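To make the three terms of the energy function concrete, the following sketch evaluates the energy of a given candidate flow field, following the formulation of Liu et al. (2011). It only scores a flow field and does not perform the coarse-to-fine optimization. The function name `sift_flow_energy`, the nested-list data layout, the 4-neighbourhood and the clamping of out-of-image targets to the border are assumptions made for illustration.

```python
def sift_flow_energy(s1, s2, u, v, t, d, eta, alpha):
    """Evaluate the SIFT flow energy of an integer flow field (u, v).

    s1, s2 : SIFT images as nested lists indexed [y][x][channel].
    u, v   : integer flow components in x and y, indexed [y][x].
    t, d   : truncation thresholds of the data and smoothness terms.
    eta    : weight of the small displacement term.
    alpha  : weight of the smoothness term.
    """
    h, w = len(s1), len(s1[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            # Data term: truncated L1 distance between matched descriptors.
            # Targets outside the image are clamped to the border (assumption).
            ty = min(max(y + v[y][x], 0), h - 1)
            tx = min(max(x + u[y][x], 0), w - 1)
            l1 = sum(abs(a - b) for a, b in zip(s1[y][x], s2[ty][tx]))
            energy += min(l1, t)
            # Small displacement term: penalize large flow vectors.
            energy += eta * (abs(u[y][x]) + abs(v[y][x]))
            # Smoothness term: each 4-neighbour pair is counted once
            # by visiting only the right and lower neighbours.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    energy += min(alpha * abs(u[y][x] - u[ny][nx]), d)
                    energy += min(alpha * abs(v[y][x] - v[ny][nx]), d)
    return energy
```

For two identical SIFT images and a zero flow field, all three terms vanish and the energy is zero, which is the property the change detection method relies on.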

PROPOSED METHOD FOR CHANGE DETECTION USING SIFT FLOW
The goal of the SIFT flow algorithm is to find the best matching SIFT descriptors between two images. It has been used in many applications such as image registration, face alignment and motion hallucination. However, change detection is very different from scene alignment and image registration. This section presents a simple yet effective method for utilizing this algorithm for change detection. This novel approach is based on two basic principles and a thresholding process, which are explained in the following subsections.

Basic principles of change detection using SIFT flow
Change detection using SIFT flow is based on two simple yet strong assumptions. The first is that the satellite images are already registered. This is a basic requirement: optical satellite images used for change detection must represent the same geographical area. In other words, every pixel location in the two images represents the same point on the ground at different times. The second assumption is that the SIFT flow between two identical images is zero or close to zero. This stems from the energy function of SIFT flow and the way it was built. If two pixels and their corresponding SIFT descriptors have not undergone significant change, they will match with each other.
Combining these two assumptions yields the core of the method: since a pair of registered multitemporal satellite images represents the same area pixel by pixel, and the SIFT flow is zero or small where the pixels have not undergone significant change, regions where the flow vectors are small or close to zero can be regarded as unchanged, while areas with large SIFT flow can be regarded as changed.

SIFT Flow to Change Map
SIFT flow produces flow vectors in the $x$ and $y$ directions of the image. In other words, for a pixel location $\mathbf{p}$ there exists a flow component in the $x$ direction, $u(\mathbf{p})$, and a flow component in the $y$ direction, $v(\mathbf{p})$. To generate the change map, the intensity of the flow field must first be determined. This is done by taking the norm of the flow vector at each pixel location:

$$I(\mathbf{p}) = \sqrt{u(\mathbf{p})^2 + v(\mathbf{p})^2}$$

The change map (CM) is calculated by thresholding the intensity of the flow vectors for all pixel locations. If $I(\mathbf{p})$ is the flow intensity at pixel location $\mathbf{p}$ and $I$ is the intensity of the flow for all pixels in the image, then the change map CM is calculated as follows:

$$CM(\mathbf{p}) = \begin{cases} 1, & I(\mathbf{p}) > T \\ 0, & \text{otherwise} \end{cases}$$

where $T$ is a threshold value determined empirically. In summary, the proposed method takes the two-dimensional flow vectors obtained from SIFT flow, calculates the intensity of these vectors, and applies a threshold to the resulting intensity image. This process leads to the generation of a change map, provided that the pair of images has been registered. The flow chart of the proposed method can be seen in Figure 3.
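The thresholding step can be sketched in a few lines. The sketch assumes that the flow intensity is the Euclidean norm of the two flow components and that a pixel is marked as changed when its intensity strictly exceeds the threshold; the function name `change_map` and the list-of-rows layout are hypothetical, and the threshold must still be chosen empirically.

```python
import math


def change_map(u, v, threshold):
    """Threshold the per-pixel flow magnitude into a binary change map.

    u, v      : flow components in x and y, as lists of rows.
    threshold : empirically chosen value T; pixels whose flow
                magnitude exceeds it are marked as changed (1).
    """
    return [
        [1 if math.hypot(du, dv) > threshold else 0
         for du, dv in zip(row_u, row_v)]
        for row_u, row_v in zip(u, v)
    ]
```

Because the two images are assumed to be registered, no further alignment is needed before this step: the flow field itself carries the change information.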

RESULTS AND DISCUSSION
In this section, the dataset used is first briefly described. Then, results of SIFT flow-based change detection are presented visually. Lastly, numerical results comparing SIFT flow change detection to other unsupervised and supervised methods are shown and discussed.

Dataset
The dataset used in this work is the LEVIR-CD dataset (Chen and Shi, 2020). It is made up of 637 very high resolution (VHR) image pairs. Each image has a spatial resolution of 0.5 meters. This dataset was compiled specifically to represent the change in residences throughout the region. Since SIFT flow is an unsupervised method, only the test images are used in the calculation of error and the visualization of results. This keeps the comparison fair between SIFT flow and the supervised methods, which need the training data for training. Example images from the dataset can be seen in Figure 4.

Results and Discussion
In this section, both visual and numerical results for SIFT flow-based change detection are presented. These results are compared to three different methods: the PCA-Kmeans change detection algorithm (Celik, 2009b), which is a difference-based unsupervised method; PAM (Chen and Shi, 2020), a deep supervised method that was presented alongside the dataset; and CEECNet (Diakogiannis et al., 2021), another deep supervised network that utilizes attention and a residual network architecture.

SIFT Images
The prerequisite for SIFT flow is to generate SIFT images. These images have the same spatial size as the original image but have 128 channels instead of 3. However, SIFT images can be projected to a 3-dimensional space using principal component analysis (PCA), which allows them to be visualised. Figure 5 shows a few examples of SIFT images generated for pairs of images from the dataset.

SIFT Flow Visual Results:
Upon closer examination of Figure 6, it can be noticed that the resulting change map of SIFT flow is clustered around the change region. Specifically, in part (b) of Figure 6, the SIFT flow change map, although not precise, manages to encapsulate the region of change. In part (a), however, the change map is sparser, yet it still corresponds to the location of change seen in the ground truth. The results seen in Figure 7 put the performance of SIFT flow in perspective. Although the change map produced by it is not as detailed or precise as the change map produced by CEECNet, SIFT flow's result is concentrated around the change area.
Clearly, it is not as good as the detailed result of CEECNet. However, it is important to remember that SIFT flow did not use any examples from the training or validation dataset. This makes it very useful in cases where only general knowledge of the change area is needed rather than minute details, such as forest fire detection. Furthermore, since no pre-processing was conducted, the results of PCA-Kmeans have been affected massively, whereas SIFT flow was robust to the differences between the histograms of the two images and still managed to find the correct area of change.

Numerical Results:
The test dataset consists of 128 images. The supervised models are trained on 445 images and validated on 64 images. PCA-Kmeans and SIFT flow are applied directly to the test dataset. No pre-processing was conducted for any of the models. Common metrics, namely precision, recall and F1-score, are used for evaluation:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. The results can be seen in Table 1.
As seen in Table 1, SIFT flow-based change detection achieves an F1-score of almost 82 percent, which is mainly due to its high recall value. This is consistent with the visual results shown in Figure 6 and Figure 7. The high recall value is due to the fact that SIFT flow identifies the regions of change accurately. However, the precision of the change maps generated by SIFT flow is not on par with that of the supervised models. PAM and CEECNet are modern supervised models and therefore perform well in localizing the change precisely. However, since they are supervised, they are prone to overfitting and to the problem of dataset bias. The results of SIFT flow rival those of PAM and are almost identical to those of its vanilla version, which has an F1-score of 83.9% (Chen and Shi, 2020). Unlike these models, however, SIFT flow has not been trained on this dataset, so similar performance can be expected on different datasets and even in situations where no training data exists, which is much more common in practice.
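The evaluation metrics can be computed from a predicted change map and its ground truth as in the following sketch. The function name `change_detection_scores` and the list-of-rows layout with 0/1 entries are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the evaluation above.

```python
def change_detection_scores(pred, truth):
    """Return (precision, recall, f1) for two binary change maps.

    pred, truth : lists of rows with entries 0 (unchanged) or 1 (changed).
    """
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1   # change predicted and present
            elif p and not t:
                fp += 1   # change predicted but absent
            elif t and not p:
                fn += 1   # change present but missed
    # Guard against empty denominators (convention: score 0.0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

A method that covers the changed regions generously, as described for SIFT flow, tends to score a high recall at the expense of precision, since the extra pixels around each region count as false positives.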

CONCLUSION
In this work, a novel method for change detection in optical satellite images has been introduced. The method utilizes the SIFT flow algorithm, which is typically used for image and scene alignment. By making two simple assumptions, the result of SIFT flow is used to produce an accurate change map. SIFT flow rivals deep supervised models on common accuracy metrics and even exceeds them in some. Furthermore, SIFT flow-based change detection has significant advantages over other models, such as not needing any training data and not requiring any pre-processing. This makes it broadly applicable, including in situations where data is scarce. However, SIFT flow-based change detection has its drawbacks, such as lower precision and a lack of semantic awareness. For these reasons, further research is recommended to improve the precision of the change maps produced by this method. Furthermore, its general nature means that it can be combined with other models to produce good results when training data does not exist or is not sufficient for training.