A Kernel-Based Similarity Measuring for Change Detection in Remote Sensing Images

This paper presents a kernel-based approach to change detection in remote sensing images. Change is detected by comparing the probability densities (PDs), expressed through kernel functions, of the feature vectors extracted from bitemporal images. The PDs are compared through the defined kernel functions without estimating them explicitly. The algorithm is model-free, can process multidimensional data, and is particularly suited to images with rich texture. Experimental results show that its overall accuracy is 98.9%, slightly higher than that of change vector analysis (96.7%) and classification comparison (95.9%).

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B7, 2016, XXIII ISPRS Congress, 12–19 July 2016, Prague, Czech Republic. This contribution has been peer-reviewed. doi:10.5194/isprs-archives-XLI-B7-999-2016

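Assume that $f(x)$ is the decision function. The region above the hyperplane corresponds to the inside of the support region $R = \{x : f(x) \ge 0\}$, the region below it corresponds to the outside of $R$, and the hyperplane itself corresponds to the decision boundary $f(x) = 0$, i.e. the PD estimate (Figure 1).
Figure 1 illustrates how data points in the original input space are mapped nonlinearly into a high-dimensional feature space. PD estimation in the input space is thereby converted into the search for an optimal separating hyperplane in the feature space. Data points mapped above the hyperplane are non-support vectors, while points mapped onto the hyperplane are support vectors, and the hyperplane can be expressed as a combination of these points. Points mapped below the hyperplane are margin errors (outliers). Among all possible hyperplanes, the one that gives the optimal PD estimate satisfies the following optimization criterion:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^{2} + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \quad \text{s.t.}\quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,$$

where $x_1, \dots, x_l$ are the training samples, $\Phi$ is the feature map, $\xi_i$ are slack variables and $\nu \in (0, 1]$ controls the fraction of margin errors. Introducing Lagrange multipliers $\alpha_i, \beta_i \ge 0$ gives the Lagrangian

$$L(w,\xi,\rho,\alpha,\beta) = \frac{1}{2}\|w\|^{2} + \frac{1}{\nu l}\sum_{i}\xi_i - \rho - \sum_{i}\alpha_i\big(\langle w,\Phi(x_i)\rangle - \rho + \xi_i\big) - \sum_{i}\beta_i\xi_i. \qquad (5)$$

Setting the derivatives with respect to the primal variables $w$, $\xi$ and $\rho$ to zero yields

$$w = \sum_{i}\alpha_i\Phi(x_i),\qquad \alpha_i = \frac{1}{\nu l} - \beta_i \le \frac{1}{\nu l},\qquad \sum_{i}\alpha_i = 1. \qquad (6)$$

Substituting equation (6) into equation (5) and using the kernel function $k(x_i, x_j) = \langle\Phi(x_i), \Phi(x_j)\rangle$ gives the dual problem

$$\min_{\alpha}\ \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,k(x_i,x_j)\quad \text{s.t.}\quad 0 \le \alpha_i \le \frac{1}{\nu l},\ \ \sum_{i}\alpha_i = 1.$$

Solving this quadratic programming (QP) problem gives the values of $\alpha_i$. All samples with non-zero $\alpha_i$ are called support vectors (SVs). For any SV $x_j$ with $0 < \alpha_j < 1/(\nu l)$,

$$\rho = \langle w, \Phi(x_j)\rangle = \sum_{i}\alpha_i\,k(x_i, x_j).$$

The decision function, i.e. the PD estimate, is $f(x) = \sum_{i}\alpha_i\,k(x_i, x) - \rho$, and its sign indicates whether $x$ lies inside $R$.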
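The dual problem above is a small QP. The following is a minimal NumPy sketch, not the LIBSVM implementation used in the paper, that solves it with a simple SMO-style pairwise update and then recovers $\rho$ from the KKT conditions. The Gaussian parameterization $k(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$ and the names `rbf_kernel` and `one_class_svm` are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    k(x, x) = 1, so every mapped point lies on the unit hypersphere.
    """
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def one_class_svm(X, nu, sigma, tol=1e-6, max_iter=10000):
    """Solve min 1/2 a^T K a s.t. 0 <= a_i <= 1/(nu*l), sum(a) = 1 (SMO).

    Returns (alpha, rho) for f(x) = sum_i alpha_i k(x_i, x) - rho.
    """
    l = X.shape[0]
    C = 1.0 / (nu * l)                   # upper bound on each alpha_i
    K = rbf_kernel(X, X, sigma)
    alpha = np.full(l, 1.0 / l)          # feasible start, since 1/l <= C
    g = K @ alpha                        # gradient of the dual objective
    for _ in range(max_iter):
        up = alpha < C - 1e-12           # coefficients that may increase
        down = alpha > 1e-12             # coefficients that may decrease
        if not up.any() or not down.any():
            break
        i = np.flatnonzero(up)[np.argmin(g[up])]
        j = np.flatnonzero(down)[np.argmax(g[down])]
        if g[j] - g[i] < tol:            # KKT conditions hold up to tol
            break
        curv = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        delta = min((g[j] - g[i]) / curv, C - alpha[i], alpha[j])
        alpha[i] += delta                # shift weight from j to i,
        alpha[j] -= delta                # which keeps sum(alpha) = 1
        g += delta * (K[:, i] - K[:, j])
    free = (alpha > 1e-8) & (alpha < C - 1e-8)
    if free.any():                       # margin SVs satisfy f(x_j) = 0
        rho = float(np.mean(g[free]))
    else:                                # otherwise pick rho in the KKT interval
        lower = g[alpha > 1e-8].max()    # alpha_i > 0 implies g_i <= rho
        upper = g[alpha < C - 1e-8]      # alpha_i < C implies g_i >= rho
        rho = float(lower if upper.size == 0 else 0.5 * (lower + upper.min()))
    return alpha, rho
```

At the solution, the fraction of training points with $f(x_i) < 0$ is at most $\nu$ and the fraction of SVs is at least $\nu$, which is the property the parameter $\nu$ is chosen for.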

CHANGE DETECTION BASED ON KERNEL SIMILARITY MEASURE
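Assume that the local pixels of the bitemporal images to be compared form the vector sets $X_p = \{x_{p,i}\}$ and $X_q = \{x_{q,j}\}$. Two one-class SVMs are trained independently on $X_p$ and $X_q$ to obtain their PD functions $f_p$ and $f_q$, yielding two regions $R_p$ and $R_q$ or, equivalently, two hyperplanes $W_p$ and $W_q$ in the feature space $H$. These hyperplanes are parameterized by $(w_p, \rho_p)$ and $(w_q, \rho_q)$ respectively. The vectors $w_p$ and $w_q$ define a two-dimensional plane, denoted by $P$, which intersects the unit hypersphere along a circle with center $O$ and radius 1, as depicted in Figure 2. Let $c_p = w_p/\|w_p\|$ and $c_q = w_q/\|w_q\|$ be the points where the normal directions meet this circle, and let $p_p$ and $p_q$ be points where the hyperplanes $W_p$ and $W_q$ meet it. If the local area has not physically changed, $w_p$ and $w_q$ are collinear, i.e. the two hyperplanes coincide; the more change, the larger the angle between them. The arc $\operatorname{arc}(c_p, c_q)$ measures the angle between the two hyperplanes. However, the angle alone does not fully describe the difference between the hyperplanes, because they may also be translated relative to each other. A better dissimilarity measure is

$$D_S(p,q) = \frac{\operatorname{arc}(c_p, c_q)}{\operatorname{arc}(c_p, p_p) + \operatorname{arc}(c_q, p_q)},$$

the ratio of the inter-region distance to the intra-region distances of the bitemporal data, which is similar to the Fisher criterion function. A large inter-region distance combined with small intra-region distances corresponds to a region of significant change. These quantities are defined in the feature space, so a key requirement of the kernel method is that $D_S(p,q)$ be computable in the input space using kernel evaluations only.
Because the kernel is normalized so that the hypersphere in feature space has radius 1, the arc distance is $\operatorname{arc}(c_p, c_q) = \arccos\big(\langle w_p, w_q\rangle / (\|w_p\|\,\|w_q\|)\big)$. The weight vectors are linear combinations of the support vectors, $w_p = \sum_i \alpha_{p,i}\Phi(x_{p,i})$ and $w_q = \sum_j \alpha_{q,j}\Phi(x_{q,j})$, so

$$\langle w_p, w_q\rangle = \sum_{i,j}\alpha_{p,i}\alpha_{q,j}\,k(x_{p,i}, x_{q,j}),\qquad \|w_p\|^2 = \sum_{i,j}\alpha_{p,i}\alpha_{p,j}\,k(x_{p,i}, x_{p,j}).$$

A similar calculation applies to the intra-region arcs: every point on $W_p$ satisfies $\langle w_p, \Phi(x)\rangle = \rho_p$ with $\|\Phi(x)\| = 1$, so $\operatorname{arc}(c_p, p_p) = \arccos(\rho_p/\|w_p\|)$, and likewise $\operatorname{arc}(c_q, p_q) = \arccos(\rho_q/\|w_q\|)$. This gives the similarity measure of the local area, and change in the area is detected by thresholding it.
In the limiting cases: if $D_S(p,q)$ tends to zero, the hyperplanes of the two one-class SVMs are approximately the same and no change has occurred between the bitemporal images; if $\operatorname{arc}(c_p, p_p)$ and $\operatorname{arc}(c_q, p_q)$ both tend to zero, $D_S(p,q)$ grows without bound. Such regions are improper for PD-based comparison. In a smooth region, for example, all vectors are almost identical, so neither meaningful support vectors nor an accurate PD estimate can be obtained. This can be handled by adding a small constant term to $\operatorname{arc}(c_p, p_p)$ or $\operatorname{arc}(c_q, p_q)$ in the denominator.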
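Because $D_S(p,q)$ depends only on kernel values, it can be evaluated directly from the support-vector coefficients. The sketch below assumes the two one-class SVMs have already been trained (any solver returning $\alpha$ and $\rho$ will do) and uses the Gaussian kernel with $k(x,x)=1$; the helper names and the constant `eps`, standing in for the small constant term added to the denominator, are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Normalized Gaussian kernel: k(x, x) = 1 (unit hypersphere)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def arc(cos_value):
    """Arc length on the unit circle for a cosine value (clipped to [-1, 1])."""
    return float(np.arccos(np.clip(cos_value, -1.0, 1.0)))


def kernel_dissimilarity(Xp, alpha_p, rho_p, Xq, alpha_q, rho_q,
                         sigma, eps=1e-3):
    """D_S(p, q) from two trained one-class SVMs, using kernel values only.

    eps is a hypothetical small constant added to the denominator so that
    smooth blocks (both intra-region arcs close to 0) do not blow up.
    """
    # ||w||^2 = alpha^T K alpha, since w = sum_i alpha_i Phi(x_i)
    norm_p = np.sqrt(alpha_p @ rbf_kernel(Xp, Xp, sigma) @ alpha_p)
    norm_q = np.sqrt(alpha_q @ rbf_kernel(Xq, Xq, sigma) @ alpha_q)
    # <w_p, w_q> = alpha_p^T K(Xp, Xq) alpha_q
    inner = alpha_p @ rbf_kernel(Xp, Xq, sigma) @ alpha_q
    inter = arc(inner / (norm_p * norm_q))             # arc(c_p, c_q)
    intra = arc(rho_p / norm_p) + arc(rho_q / norm_q)  # arc(c_p,p_p) + arc(c_q,p_q)
    return inter / (intra + eps)
```

With identical inputs the measure is zero, and it grows as the two sample sets move apart in feature space, matching the limiting cases described above.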
An overall functional diagram of the proposed system is shown in Figure 3.

Change is complex and ubiquitous, and no existing approach is optimal or applicable in all cases. The conventional method of simple differencing and thresholding is linear, whereas the proposed method is model-free and nonlinear and can handle more complex situations. Conventional classification-based methods depend on training samples, which are very difficult to select in multispectral images; the proposed method is unsupervised and needs no training samples. It is similar to Fisher analysis in that both use the ratio of inter-region to intra-region distance of the bitemporal data, based on the Rayleigh principle, as in the Fisher criterion. The way the classification plane is obtained differs, however: in this paper its direction is defined by computing two independent one-class SVMs, whereas kernel Fisher discriminant analysis obtains the projection vector by optimizing the criterion directly. Kernel Fisher analysis is suited to two-class problems, while in practice change conditions are much more complex: there may be several change types, large numbers of samples, high dimensionality, unknown PDs and complex shapes. In such cases kernel Fisher analysis is not applicable, and the proposed method is more feasible.

ALGORITHM FLOW AND EXPERIMENTS
LIBSVM version 2.83, developed by Professor Chih-Jen Lin (Lin Zhiren), is used to train the two one-class SVMs, and the kernel functions are normalized. Overall accuracy, the ratio of the number of correctly classified samples to the total number of test samples, is adopted as the criterion for comparing performance; in addition, accuracy is evaluated with the Kappa coefficient.
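For reference, overall accuracy and the Kappa coefficient can be computed from a binary change/no-change confusion matrix as in this sketch; the function name is hypothetical, and the reference and predicted maps are assumed to be boolean arrays of equal shape.

```python
import numpy as np


def accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's kappa for binary change maps."""
    ref = np.asarray(reference, dtype=bool).ravel()
    pred = np.asarray(predicted, dtype=bool).ravel()
    n = ref.size
    # 2x2 confusion matrix: rows = reference, columns = prediction
    cm = np.array([[np.sum(~ref & ~pred), np.sum(~ref & pred)],
                   [np.sum(ref & ~pred), np.sum(ref & pred)]], dtype=float)
    p_o = np.trace(cm) / n                                 # overall accuracy
    p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2   # chance agreement
    # Kappa is undefined when p_e == 1; report 1.0 by convention here.
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else 1.0
    return p_o, kappa
```

For example, with reference `[1, 1, 0, 0, 0, 0, 0, 0, 0, 0]` and prediction `[1, 0, 0, 0, 0, 0, 0, 0, 0, 1]`, the overall accuracy is 0.8, chance agreement is 0.68, and Kappa is 0.375.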

Overview of the Study Area and Preprocessing
Two bitemporal Landsat TM images, acquired in 1998 and 2000, are used.
Each image consists of the six bands other than band 6 (thermal infrared). The bitemporal images are preprocessed by precise geometric correction (precise small-bin differential rectification, with rectification accuracy within 0.2 pixels), radiometric normalization and image segmentation. The images are 496×496 pixels with a resolution of 30 m/pixel. Color composites of bands 4, 3 and 2 are shown in Figure 4. The test images include typical objects such as water bodies, buildings, roads, mountains and vegetation.
(1) Change vector analysis: the absolute differences of corresponding bands of the bitemporal images are taken as the change vector; the result, displayed as a band 5-4-3 false-color composite, is shown in Figure 5.

From Figure 5 and Figure 6, both change vector analysis and classification comparison clearly detect the changes in buildings and the lake surface, but they differ slightly on minor changes such as those in residential areas. The overall accuracy and Kappa coefficient of the two methods are given in Table 1. Change vector analysis performs better than unsupervised classification comparison. The reason is that in classification comparison the number of classes is difficult to define, and classification and comparison are performed separately, so errors accumulate; change vector analysis uses the data of all bands equally and introduces no error from intermediate steps, so its accuracy is higher.
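A minimal sketch of the change-vector step, assuming co-registered arrays shaped (rows, cols, bands). Thresholding the Euclidean magnitude of the change vector is the usual CVA decision rule and is an assumption here, since the text does not state how the change vector was thresholded; as with the other difference images, the threshold would be chosen empirically.

```python
import numpy as np


def change_vector_analysis(img1, img2, threshold):
    """Change vector analysis on two co-registered (rows, cols, bands) images.

    The change vector is the per-band absolute difference; its Euclidean
    magnitude (an assumed decision statistic) is thresholded into a
    binary change map.
    """
    diff = np.abs(img2.astype(float) - img1.astype(float))  # change vector
    magnitude = np.sqrt(np.sum(diff**2, axis=-1))            # per-pixel length
    return magnitude, magnitude > threshold
```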

Change Detection Results with Different Parameters
RBF kernel functions are used in the experiments. Because the algorithm requires the hypersphere to have radius 1, the kernel functions are normalized. The quantitative results (Table 2) and qualitative results (Figure 7) show that the algorithm is not sensitive to $\nu$: a range of values all lead to good change detection results. In general, for a fixed $\sigma$, a larger $\nu$ gives a better result, because it yields more support vectors and hence a more accurately defined hyperplane. However, a larger $\nu$ also lengthens computation, because more SVs take part in the calculation. Balancing detection time and quality, $\nu$ between 0.3 and 0.7 works comparatively well. Table 2 also shows that $\sigma$ has a wide usable range. In the experiments, $\sigma$ has little influence on the number of SVs, so the corresponding computation times are almost the same. However, a large $\sigma$ mainly reflects overall, typical changes, while a small $\sigma$ detects more minor changes: $\sigma$ is the width of the radial basis function, and the smaller it is, the more precise the PD estimate and the more detailed the detection result. When noise is taken into account, detection accuracy falls slightly. According to the quantitative analysis, the best results are achieved when $\sigma$ is between 0.01 and 0.1.
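Kernel normalization can be illustrated on Gram matrices. The sketch assumes the RBF parameterization $\exp(-\|x-y\|^2/(2\sigma^2))$, whose diagonal is already 1, and shows the standard cosine normalization $k(x,y)/\sqrt{k(x,x)\,k(y,y)}$ that places any kernel's feature vectors on the unit hypersphere. The function names are hypothetical.

```python
import numpy as np


def normalize_gram(K):
    """Cosine-normalize a Gram matrix: K'_ij = K_ij / sqrt(K_ii K_jj).

    After normalization every mapped point Phi(x) has unit norm, i.e. it
    lies on the unit hypersphere assumed by the arc-length measure.
    """
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def rbf_gram(X, sigma):
    """RBF Gram matrix exp(-||x_i - x_j||^2 / (2 sigma^2)); its diagonal is 1."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))
```

Shrinking $\sigma$ drives the off-diagonal kernel values toward zero, so each sample influences only its immediate neighbourhood; this is why a small width gives a more local, detailed PD estimate.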

Change Detection Results with Blocks of Different Sizes
The change detection results are compared using the RBF kernel function with blocks of size 3×3, 5×5 and 7×7. The quantitative results (Table 3) and qualitative results (Figure 8, Figure 9) show that the smaller the block, the more detailed the detected change and the more accurate the result. The accuracies obtained with 3×3 and 5×5 blocks are approximately the same, but the accuracy with 7×7 blocks drops greatly. Moreover, a larger block involves more pixels in the computation, which noticeably increases computation time and does not benefit change detection. Overall, small blocks such as 3×3 or 5×5 are preferable.

CONCLUSION
This paper proposes an algorithm that detects change by comparing the PDs of the feature vectors of bitemporal images.
The PD comparison is expressed through kernel functions, and change is detected with a kernel-based similarity measure, so the PDs are compared without being explicitly estimated. Experimental results and theoretical analysis show that the algorithm is model-free; it can handle complex situations; it compares PDs indirectly, without a real PD estimate; it can process multidimensional and multi-scale data; and it is suited to images with rich texture, producing detection results with good visual quality.

Figure 1 The one-class SVM mapping
Figure 2 Sketch map of the high-dimensional feature space: dissimilarity measure for the feature space

Figure 3 Overall functional diagram of the proposed system
Figure 4 Bitemporal images

The experiment first presents the change detection results of change vector analysis and post-classification comparison and evaluates their accuracy. RBF kernel functions are then used, with different kernel parameters and different block sizes, to test the performance of the proposed algorithm, and its results are compared with those of change vector analysis and classification comparison. Some of the experimental results are change difference images, for which the separation threshold must be selected empirically.

Figure 5 The result of CVA

The parameters $\nu$ and $\sigma$ are set to different values and the resulting change detection results are examined. $\nu$ is an upper bound on the fraction of margin errors and a lower bound on the fraction of SVs; its value lies in $(0, 1]$. $\sigma$ is the width of the radial basis function, and it determines the similarity measure and the shape of the decision surface. Change detection results are compared for different choices of $\nu$ and $\sigma$. A … of 0.05 is applied to all experiments. A 3×3 window is processed at a time, i.e. 9 samples are used to train each one-class SVM. The change difference results for typical kernel parameters are shown in Figure 7, and the overall accuracy and Kappa coefficient for different kernel parameters are given in Table 2.
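The block-wise processing described above can be sketched as a sliding-window driver. This is an illustrative assumption, not the paper's code: `dissimilarity` is a caller-supplied function that, in the full method, would train one one-class SVM per block and return $D_S(p,q)$, and image borders are handled by edge replication, which the text does not specify.

```python
import numpy as np


def change_map(img1, img2, dissimilarity, size=3, threshold=1.0):
    """Score every pixel by comparing its size x size neighbourhoods.

    img1, img2: co-registered arrays of shape (rows, cols, bands).
    dissimilarity: callable(samples1, samples2) -> float, where each
    samples array has shape (size*size, bands), e.g. 9 x bands for 3x3.
    Returns the score map and the thresholded binary change map.
    """
    half = size // 2
    pad = ((half, half), (half, half), (0, 0))
    # Edge replication at the borders is an assumption of this sketch.
    p1 = np.pad(img1.astype(float), pad, mode="edge")
    p2 = np.pad(img2.astype(float), pad, mode="edge")
    rows, cols, bands = img1.shape
    score = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Window centred on (r, c) in original image coordinates.
            s1 = p1[r:r + size, c:c + size].reshape(-1, bands)
            s2 = p2[r:r + size, c:c + size].reshape(-1, bands)
            score[r, c] = dissimilarity(s1, s2)
    return score, score > threshold
```

The cost grows with both image area and block size, since every pixel triggers one comparison on size × size samples per image.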

Figure 7 Some results with different parameters

When $\nu = 0.2$ and $\sigma = 0.1$, the change difference results for different block sizes are shown in Figure 8.

Figure 8 Results with blocks of different sizes

Table 2
Change detection results with different parameters