KERNEL-COMPOSITION FOR CHANGE DETECTION IN MEDIUM RESOLUTION REMOTE SENSING DATA

A framework for multitemporal change detection based on kernel-composition is applied to a multispectral, multitemporal classification scenario, evaluated, and compared to traditional change detection approaches. The framework exploits the fact that images of the same scene acquired at different points in time can serve as input data sources for kernel-composition, a data fusion approach typically used with kernel based classifiers such as the support vector machine (SVM). The framework is used to analyze the growth of a limestone pit in the Upper Rhine Graben (West Germany). Results indicate that the highest accuracy rates are produced by the kernel based framework. The approach produces the fewest false positives and gives the most convincing overall impression.


INTRODUCTION
Although the availability of modern remote sensing datasets increases (e.g. hyperspectral imagery, high resolution optical satellite imagery, interferometric synthetic aperture radar), many change detection applications continue to require traditional datasets. The fact that e.g. Landsat data have been available since 1972 makes them a valuable source of information covering four entire decades. They can be seen as a way to recover information on past environmental conditions which are no longer observable by field campaigns. However, traditional methods like post-classification change detection based on overlaying classification maps raise accuracy issues (Serra et al., 2003). Therefore, more sophisticated change detection methods have been proposed in the literature. Some approaches model the probability of transition from one class to another. Another approach is change detection based on kernel-composition (Camps-Valls et al., 2006b). These issues are exemplified on a change detection application from the Upper Rhine Graben. Changes in the landuse are outlined with special focus on a limestone pit which continuously grows, replacing near-natural ecosystems. Kernel based classifiers, like the well known support vector machine (SVM) (Boser et al., 1992), (Cortes and Vapnik, 1995), work on kernel matrices. These kernel matrices represent the similarity between data points in high dimensional feature spaces (reproducing kernel Hilbert spaces, RKHS). By optimizing a target function on the kernel, the SVM chooses only a few of the most suitable training data points as support vectors (SVs). These points define a separating hyperplane which is usually non-linear in the input space. The conceptual advantage of kernel-composition is that different kernel functions can be combined, e.g. by addition, thus performing data fusion in the RKHS during classification. This circumstance is usually employed for data fusion (Tuia and Camps-Valls, 2009). However, it can also be employed for change detection. In (Camps-Valls et al., 2008), (Camps-Valls et al., 2006a), well known change detection techniques, like image differencing, are adapted for kernel-composition based approaches. An entire framework for change detection and multitemporal classification is presented. This framework is evaluated and compared to traditional change detection approaches in the present contribution.

MATHEMATICAL FOUNDATIONS
Within this section, the mathematical foundations of the main methods used will be given. A number of profound introductions to the SVM problem (Burges, 1998), (Ivanciuc, 2007), (Zhang, 2001), (Schölkopf and Smola, 2002), (Camps-Valls and Bruzzone, 2009) and valuable reviews on the application of SVM in remote sensing (Mountrakis et al., 2010), (Plaza et al., 2009) have been published. For this reason, the foundations of SVM and state-of-the-art application examples will not be given exhaustively, but strictly focused on the kernel-composition problem.

Kernel matrices and the SVM problem
Given a data set X with n data points, kernel matrices are the result of kernel functions applied over all n² tuples of data (Shawe-Taylor and Cristianini, 2004). The outcome of a kernel function K(x_i, x_j) = f_δ(x_i, x_j) is a similarity measure for the two training data points x_i and x_j depending on some distance metric δ. Usually, δ is the Euclidean distance (Mercier and Lennon, 2003). However, kernel functions can be modified, e.g. by introducing different similarity measures (Amari and Wu, 1999). For instance, (Mercier and Lennon, 2003) and (Honeine and Richard, 2010) use the spectral angle as a similarity measure for hyperspectral SVM classification. To model complex distributions of the training data in the feature space, f_δ is usually some non-linear function. The most frequently applied family of non-linear functions are Gaussian radial basis functions (RBF) (Schölkopf et al., 1997). The closer two points are found in the feature space, the higher is their resulting kernel value. Given these facts, the kernel matrix simply represents the similarity between the points of the training data set. To understand how the kernel matrix is used in SVM classification, it is helpful to look not at the primal, but at the dual formulation of the SVM problem (Ivanciuc, 2007).
The dual problem is given by Eq. 1:

max_λ Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j K(x_i, x_j), subject to Σ_i λ_i y_i = 0 and 0 ≤ λ_i ≤ C (1)

The Lagrange multipliers λ_i are greater than zero only for the support vectors. These are usually identified by sequential minimal optimization (Platt, 1998). Hence, only pairs of training data points which are both SVs contribute to the solution of Eq. 1 (in all other cases, λ_i λ_j = 0 sets the second part of Eq. 1 to zero). The class labels y_i are in {−1, +1}. Since the second part of Eq. 1 is subtracted, only points with different class labels can maximize the term (their product y_i y_j = −1 renders the second part positive).
The problem is therefore maximized if points are chosen as SVs which have different class labels but are found close to each other in the feature space (thus yielding a high value in the kernel matrix K(x_i, x_j)). In this way, the similarity values of the kernel matrix are used for finding the best suited training points as SVs. By setting the λ_i of all other points to zero, a sparse solution is found which depends only on the SVs.
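The mechanics above can be sketched in a few lines of NumPy. The following is a minimal illustration (not the implementation used in this study) of how a Gaussian RBF kernel matrix is built from a toy data set; the function name and the choice γ = 1 are assumptions for the example.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma):
    """Gaussian RBF kernel: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    # Pairwise squared Euclidean distances via the binomial expansion.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Toy "training data": two nearby points and one distant point.
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [5.0, 5.0]])
K = rbf_kernel_matrix(X, X, gamma=1.0)
# The nearby pair (rows 0 and 1) yields a kernel value close to 1,
# whereas pairs involving the distant point yield values close to 0.
```

An SVM trained on such a matrix then prefers SVs from opposite classes with high mutual kernel values, as outlined above.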

Kernel-Composition
As can be seen in Eq. 1, the training data x_i do not enter directly into the SVM problem. Instead, the data are represented by kernels K(x_i, x_j). According to Mercer's theorem (Mercer, 1909), valid kernel functions can be combined, e.g. through addition, to form new valid kernels. Hence, different sources of information on the same training data can be fused through simple arithmetical operations (Camps-Valls et al., 2006b). For instance, K_C(x_i, x_j) = K_A(x_i, x_j) + K_B(x_i, x_j) fuses the information domains A and B on the training data x_i, x_j and forms a new kernel K_C. Within the original framework on kernel-composition for data fusion (Camps-Valls et al., 2006b), the following fusion approaches are published.
Eq. 2 is called the direct summation kernel, the most simple form of kernel-composition:

K_C(x_i, x_j) = K_A(x_i^A, x_j^A) + K_B(x_i^B, x_j^B) (2)

Eq. 3 is called the weighted summation kernel:

K_C(x_i, x_j) = µ K_A(x_i^A, x_j^A) + (1 − µ) K_B(x_i^B, x_j^B) (3)

Its main advantage is that the weighting parameter µ ∈ (0, 1) allows regulating the relevance of the two data sources A and B for the classification problem. Eq. 4 is called the cross-information kernel:

K_C(x_i, x_j) = K_A(x_i^A, x_j^A) + K_B(x_i^B, x_j^B) + K_AB(x_i^A, x_j^B) + K_BA(x_i^B, x_j^A) (4)

It consists of four single kernels, where the last two, K_AB and K_BA, allow incorporating the mutual information between the data sources A and B (e.g. differences between the values both data sources yield for a particular data point).
Based on these basic composition approaches, (Camps-Valls et al., 2008), (Camps-Valls et al., 2006a) extend the kernel-composition framework to the field of multitemporal classification and change detection. The key idea is to use images of the same landscape from different points in time as input data for kernel-composition and SVM classification. Given two points in time t0 and t1, two kernels K_t0 and K_t1 are built. These kernels only incorporate the spectral information given at each point in time.
Then, a new kernel can be built using one of the Eqs. 2 to 4. For instance, K_Change(x_i^C, x_j^C) = K_t0(x_i^t0, x_j^t0) + K_t1(x_i^t1, x_j^t1) represents a direct summation kernel which implicitly incorporates the information about the change of the spectral responses of pixels. Although the basic composite kernels can be used for multitemporal classification as well, the authors developed specialized kernels in order to combine traditional change detection techniques with kernel-composition. For instance, the image difference kernel is introduced in Eq. 5:

K_Diff(x_i, x_j) = K_t0(x_i^t0, x_j^t0) + K_t1(x_i^t1, x_j^t1) − K_t0t1(x_i^t0, x_j^t1) − K_t1t0(x_i^t1, x_j^t0) (5)
Note that Eq. 5 is a particular case of the cross-information kernel (Eq. 4) that performs the change detection technique of image differencing in the RKHS.
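To make the four compositions concrete, the following sketch builds each composed kernel from two toy acquisitions of the same pixels; the helper `rbf`, the variable names and the toy data are assumptions for the example, not code from the original framework.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian RBF base kernel between the rows of X and Y."""
    d = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(d, 0.0))

# Spectral vectors of the same five pixels at times t0 and t1 (toy data).
rng = np.random.default_rng(0)
X_t0 = rng.random((5, 4))
X_t1 = X_t0 + 0.1  # a slight, uniform spectral change

mu = 0.5
K_sum   = rbf(X_t0, X_t0) + rbf(X_t1, X_t1)                  # direct summation (Eq. 2)
K_wsum  = mu * rbf(X_t0, X_t0) + (1 - mu) * rbf(X_t1, X_t1)  # weighted summation (Eq. 3)
K_cross = K_sum + rbf(X_t0, X_t1) + rbf(X_t1, X_t0)          # cross-information (Eq. 4)
K_diff  = K_sum - rbf(X_t0, X_t1) - rbf(X_t1, X_t0)          # image differencing (Eq. 5)
```

K_diff equals the inner product of the per-pixel differences φ(x^t0) − φ(x^t1) in the RKHS, which is why Eq. 5 performs image differencing there.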

RELATED WORK
Within this section, an overview of relevant contributions from the fields of change detection and kernel-composition will be given. Since kernel-composition has been introduced only in 2006, far less research has been dedicated to it than to change detection in general.

Change detection and multitemporal classification
Herein, a short outline of important reviews and state-of-the-art papers in change detection is presented. A very comprehensive introduction to multitemporal classification is given by (Gillanders et al., 2008). (Singh, 1989) and (Coppin et al., 2004) present reviews with emphasis on signal processing. (Wang and Xu, 2010) give a comparison of change detection methods, emphasising particular aspects of different applications. (Holmgreen and Thuresson, 1998) and (Wulder et al., 2006) present reviews focused on applications of change detection of satellite images in forestry. A comparable contribution for landscape monitoring is given by (Kennedy et al., 2009). (Almutairi and Warner, 2010) present important considerations on accuracy assessment and the influence of accuracy on the final change detection result. (Van Oort, 2007) presents an insightful contribution on the importance of the error matrix in multitemporal classification. Some state-of-the-art papers are given e.g. by (Bruzzone and Serpico, 1997) and (Bruzzone et al., 2004), who present an iterative approach for change detection. (Coops et al., 2010) apply Landsat time series for assessing forest fragmentation. (Dianat and Kasaei, 2010) use a polynomial regression technique for change detection which considers neighborhoods. Application schemes based on SVMs are presented by (Nemmour and Chibani, 2006), (He and Laptev, 2009) and (Bovolo et al., 2008). (Mota et al., 2007) and (Feitosa et al., 2009) present fuzzy approaches based on modeling the class transition probabilities.

RESULTS
Within this section, results for three change detection approaches will be presented. The main objective is to monitor the growth of a limestone pit close to the village of Mauer (near Heidelberg) in the Upper Rhine Graben, Germany (latitude 49°19'50"N, longitude 8°48'1"E, cf. Fig. 1). Two 122 × 135 pixel (≈14.8 km²) subsets of Landsat ETM+ images are used. The first is from 02-11-2001 (Fig. 2(a)), the second from 17-04-2005 (Fig. 2(b)). Fortunately, the village is located in the very center of both images, so the failure of the Landsat ETM+ scan line corrector does not affect the work at all. Between the two points in time, the limestone pit has grown considerably. Although more landuse classes are assigned for classification in the first place (e.g. meadows, forests, settlements), only two classes of interest will be considered in the final result: LS-Pit present in 2001 (yellow) and LS-Pit new between 2001 and 2005 (red). Complete groundtruth has been made available by digitization in the field and is shown in Fig. 2(c). Apart from the limestone pit, all other landuse classes will not be considered and are set to black. At first, change detection based on a post-classification approach will be employed.
Secondly, features will be stacked to represent the change of pixel intensities between the two points in time. Lastly, kernel-composition approaches based on (Camps-Valls et al., 2008), (Camps-Valls et al., 2006a) and (Camps-Valls et al., 2006b) will be employed. These approaches also incorporate changes in pixel intensities. All change detection approaches are based on image classification. Each classification was done using an SVM with a Gaussian RBF kernel. Kernel parameters were tuned using a 5-fold grid search in the ranges γ ∈ [2^−15, 2^5] and C ∈ [2^−5, 2^15]. The LibSVM 3.11 library was utilized (Chang et al., 2001).
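The parameter search described above can be reproduced, for instance, with scikit-learn, whose SVC class wraps LibSVM. The sketch below runs a coarsened version of the stated grid on synthetic stand-in data; the data, the labels and the grid step are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for labeled training pixels (60 pixels, 8 channels).
rng = np.random.default_rng(1)
X = rng.random((60, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy binary labels

# Coarsened grid over the ranges used in the study:
# gamma in [2^-15, 2^5], C in [2^-5, 2^15], 5-fold cross-validation.
param_grid = {"gamma": [2.0**e for e in range(-15, 6, 5)],
              "C":     [2.0**e for e in range(-5, 16, 5)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
best_gamma, best_C = search.best_params_["gamma"], search.best_params_["C"]
```

In practice the full grid (exponent step 1 instead of 5) would simply take more fits, not different code.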

Post-classification approach
Two classifications are performed using the same landuse classes in each dataset. The limestone pit is represented by a single class. Since classification of the two points in time is performed separately, it is not possible to assign the class LS-Pit new between 2001 and 2005 in the 2005 dataset: no features are available which indicate whether or not a pixel belonged to the limestone pit in 2001 when classifying the 2005 image. Hence, one class LS-Pit has been assigned in both datasets, and the two classes of interest have been determined by overlaying the results afterwards. The overall accuracy on the two classes is 86.7%. A visual result after setting other classes (like meadows, forest, settlements) to black is given in Fig. 2(d). Note the high amount of pixels falsely assigned to the class LS-Pit new between 2001 and 2005. Since the entire image scene is made up of limestone, many places of bare soil have been confused with the limestone pit.

Stacked-features approach
In order to provide implicit information on the changes of pixel intensities, a stacked-features approach was performed. The data matrices of both 8 channel Landsat datasets were concatenated to build a 16 channel data matrix. Within this feature space, the two classes LS-Pit present in 2001 and LS-Pit new between 2001 and 2005 can be distinguished in a single classification step. While the class LS-Pit present in 2001 shows grey color in both images, LS-Pit new between 2001 and 2005 would change from e.g. green to grey, indicating that a change from e.g. meadows to limestone pit has taken place between the two points in time. Hence, only one classification needs to be performed. The overall accuracy on the two classes is 87.5%. A visual result after setting other classes to black is given in Fig. 2(e). Note that the amount of false positive pixels is considerably reduced. False positives are now mostly assigned to the class LS-Pit present in 2001; much fewer pixels fall into LS-Pit new between 2001 and 2005. Since the latter class is characterized by a change in color from green to grey, it is less confused with bare soils which stay grey at both points in time.
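The stacking step itself is a simple concatenation, sketched below with random arrays standing in for the two Landsat scenes (the array names are assumptions for the example):

```python
import numpy as np

# Stand-ins for the two 8-channel scenes, flattened to (n_pixels, n_channels).
rng = np.random.default_rng(2)
X_2001 = rng.random((16470, 8))   # 122 * 135 = 16470 pixels
X_2005 = rng.random((16470, 8))

# Stacked-features representation: one 16-channel vector per pixel.
X_stacked = np.hstack([X_2001, X_2005])
```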

Kernel-composition approach
The last approach performed is similar to the stacked-features approach. In order to incorporate information on e.g. color changes, both datasets are combined into a new dataset. However, the aim is to perform the fusion not in the feature space, but in the RKHS. Therefore, a composed kernel matrix was built. The following kernel-composition approaches were followed: direct summation (Eq. 2), weighted summation (Eq. 3), cross-information (Eq. 4) and image differencing (Eq. 5). The overall accuracy values for each approach can be seen in Tab. 1. The best overall accuracy, yielded by the direct summation approach on the two classes, is 88.8%.
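As a hedged sketch of how a composed kernel enters classification: scikit-learn's SVC accepts precomputed Gram matrices, so a direct summation kernel (Eq. 2) can be passed to the SVM directly. The toy data, labels and helper `rbf` below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(X, Y, gamma=1.0):
    """Gaussian RBF base kernel between the rows of X and Y."""
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d, 0.0))

# Toy scene: 40 pixels; "changed" pixels get a shifted spectrum at t1.
rng = np.random.default_rng(3)
X_t0 = rng.random((40, 8))
y = (rng.random(40) > 0.5).astype(int)  # 1 = changed
X_t1 = X_t0 + 0.8 * y[:, None]          # changed pixels brighten

# Direct summation kernel (Eq. 2): fusion of t0 and t1 in the RKHS.
K_train = rbf(X_t0, X_t0) + rbf(X_t1, X_t1)
clf = SVC(kernel="precomputed").fit(K_train, y)
pred = clf.predict(K_train)

# For unseen pixels, the test kernel is built against the training pixels:
# K_test = rbf(Xt0_new, X_t0) + rbf(Xt1_new, X_t1)
```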
As can be seen, all kernel-composition approaches yield slightly higher accuracy values than the other approaches. Although specifically designed for this task, the image difference kernel does not yield the highest accuracy value. However, the performance difference to the direct summation kernel is only 0.2 percentage points, a value which should not be over-interpreted. It should be noted that simpler kernels yield better results than more complex ones, a result which is in agreement with the findings of the inventors of the framework (Camps-Valls et al., 2006b), (Camps-Valls et al., 2008), (Tuia and Camps-Valls, 2009). A visual result after setting other classes to black is given in Fig. 2(f). There are far fewer false positives than in the other approaches. The only exception is a large barren field where open loess soil mixed with limestone rocks is found (south-east corner of the image).
The spectral characteristics of this field are very similar to those of the limestone pit, making the classifier susceptible to confusing the two.

McNemar's Test
The advantages in overall accuracy of e.g. kernel-composition over post-classification may seem only a slight gain. Therefore, they were tested for significance using McNemar's test (Foody, 2004). McNemar's test is based on the χ² statistic and can be employed to test the significance of differences between two nominal labellings. The advantage is considered significant if the resulting test value |z| ≥ 1.96. Testing the advantage of kernel-composition over post-classification yielded |z| ≈ 13.95, indicating a significant advantage. A test of kernel-composition against the stacked-features approach yielded |z| ≈ 6.50, which is also significant. The stacked-features approach yielded |z| ≈ 10.16 over post-classification. Thus, all advantages described are significant.
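For reference, the test statistic can be computed from the two classifiers' per-pixel agreement with the groundtruth. The sketch below follows the standard discordant-count form of McNemar's z; the labellings are an invented toy example, not the study's data.

```python
import math

def mcnemar_z(y_true, pred_a, pred_b):
    """McNemar's z statistic from the discordant counts of two classifiers."""
    # f_ab: A correct where B is wrong; f_ba: B correct where A is wrong.
    f_ab = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    f_ba = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    return (f_ab - f_ba) / math.sqrt(f_ab + f_ba)

# Toy labellings: A is correct on 30 pixels where B errs; B on 10 where A errs.
y_true = [1] * 100
pred_a = [1] * 90 + [0] * 10
pred_b = [1] * 60 + [0] * 30 + [1] * 10
z = mcnemar_z(y_true, pred_a, pred_b)
# |z| >= 1.96 would indicate a significant difference at the 5% level.
```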

DISCUSSION
We present a comparison of change detection based on kernel-composition with two traditional methods, post-classification change detection and a stacked-features approach. As Fig. 2 shows, approaches based on kernel-composition produce the most accurate results. However, there are slight differences when using kernel-composition that depend on the type of composition.
In agreement with other authors, simpler composition approaches tend to produce better results than more complex ones. In our case, the direct summation and the image difference kernel produced the best results. The highest overall accuracy yielded is 88.8%, by a direct summation kernel. There are almost no false alerts, and the change between 2001 and 2005 becomes most clearly visible. The advantages between the approaches may seem only moderate. However, in the major part of the pit, the separation between the two limestone pit classes seems to be quite simple, and therefore all approaches yield good results over a large part of the scene. Furthermore, it has to be kept in mind that a major source of error comes from confusion between the limestone pit and natural outcrops of limestone or open chalky soils. Confusion between these landcover types narrows the differences between the change detection approaches. It should be noted, though, that false positives from this source of error are much rarer for kernel-composition approaches and more concentrated in single spots. According to McNemar's test, the advantages in overall accuracy of the kernel-composition approach over the other approaches are significant.
The reason for the advantage of the kernel-composition approach and the stacked-features approach over post-classification change detection is straightforward. While post-classification change detection does not include any information on the change in pixel intensities between the two points in time, both kernel-composition and stacked-features incorporate this information implicitly. However, the advantage of kernel-composition over the stacked-features approach is remarkable. Both approaches include information on the change in pixel intensities, yet kernel-composition appears to be better suited to exploit this information. It is assumed that the main advantage lies in the fact that kernel-composition represents this information in the RKHS, while the stacked-features approach represents it in the original feature space. Since SVMs operate in the RKHS when finding their optimal solution, kernel-composition and SVM seem to be a more suitable combination for representing this implicit information.

CONCLUSIONS
Kernel based change detection is a conceptually elegant and useful method for change detection and multitemporal classification. Standard techniques like image differencing can be executed in the RKHS, thus benefiting from the advantages of kernel based SVM classification. Changes in landuse for the given dataset from the Upper Rhine Graben can be visualized and, furthermore, quantified with high precision. In future work, the approach will be tested on more complex change detection problems.

Figure 1: Limestone pit near Heidelberg

Figure 2: Landsat ETM+ image scenes, groundtruth and results

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012, XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia

Table 1: Approaches and overall accuracies