A HYPERSPECTRAL IMAGE CLASSIFICATION METHOD USING ISOMAP AND RVM

ABSTRACT: Classification is one of the most significant applications of hyperspectral image processing and of remote sensing in general. Although various algorithms have been proposed to implement and improve it, traditional classification methods still have drawbacks. Further investigation is therefore needed in several areas, such as dimension reduction, data mining, and the rational use of spatial information. In this paper, we use isometric feature mapping (ISOMAP), a widely used global manifold learning approach, to handle the intrinsic nonlinearity of hyperspectral images during dimension reduction. Because Euclidean distance is poorly suited to measuring spectral similarity, we substitute the spectral angle (SA) for it when constructing the neighbourhood graph. Relevance vector machines (RVM) are then used for classification instead of support vector machines (SVM) because of their simplicity, generalization ability and sparsity. As a result, probabilistic outputs are obtained rather than less informative binary decisions. Moreover, to exploit the spatial information of the hyperspectral image, we employ a spatial vector formed from the proportions of the different classes around each pixel. Finally, we combine the probabilistic outputs and the spatial factors with a decision criterion to obtain the final classification result. To verify the proposed method, we carried out experiments on standard hyperspectral images and compared it with other methods. The results and several evaluation indices demonstrate the effectiveness of our method.


INTRODUCTION
A hyperspectral image (HSI) is a special kind of remote sensing image that differs from infrared and other traditional images. Besides the spatial information available in an ordinary image, an HSI also provides distinct and more important spectral information for each pixel. Moreover, the spectral resolution of an HSI is so high that the reflectance of each pixel can be plotted as a nearly smooth and continuous curve, which benefits research and applications such as target detection, classification and anomaly detection (Wang Yiting, 2017). Because of this distinctive spectral characteristic, the reflectance spectrum has been regarded as the key factor in HSI applications, and many algorithms focus only on spectral information. However, this brings two problems: 1) because an HSI contains so many bands, the dimension of the data can be very large and the computational cost very high, especially when optimization is involved; 2) although identifying different materials by their reflectance spectra is a necessary and effective way to process HSI (C.I. Chang, 2003), spatial information, which can also be important for achieving higher accuracy, is ignored.

For the first issue, researchers have found strong correlation among spectral bands, which means that an HSI contains redundancy. Dimension reduction (DR) can therefore be performed with little information loss. Because of the nonlinearity of HSI, manifold learning based on the graph embedding framework is an appropriate way to handle this problem (Ma Li, 2010). A manifold is a curved space that is locally Euclidean, such as a sphere or another curved surface, and it describes the structure of HSI data better than a global Euclidean space. ISOMAP (Tenenbaum, 2000), a well-known global manifold learning algorithm, uses geodesic distance instead of Euclidean distance to describe the geometric relations among data points in the high-dimensional space, and it preserves these relations well in the low-dimensional space. After DR, the hyperspectral image can be classified. SVM is a well-known and widely used classifier that has proven efficient for HSI (F.Melgani, 2004). Compared with SVM, however, RVM (Christopher M. Bishop, 2006) could potentially perform better: its advantages over the SVM are probabilistic predictions, automatic estimation of parameters, and the possibility of choosing arbitrary kernel functions (M.E. Tipping, 2001). For these reasons, we adopt RVM for hyperspectral classification to obtain probabilistic outputs rather than binary results. To address the second issue, spatial information is then taken into consideration: we use a spatial weight vector to revise the RVM results so that they better fit the real scene.
The rest of this paper is organized as follows. Section 2 describes the basic theory of ISOMAP and RVM and introduces how spatial information is exploited. Experiments and analysis are given in Section 3, followed by the conclusion in Section 4.

Isometric Feature Mapping
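Assuming that the dataset lies on a low-dimensional embedded manifold, ISOMAP seeks a representation of the data in a low-dimensional space that preserves the geodesic distances among the data points. The geodesic distances are computed through a neighbourhood graph. The steps of ISOMAP are as follows.

Step 1: Construct the neighbourhood graph $G$. According to a similarity measurement of the data points, generally Euclidean distance, find the $k$ nearest neighbours of each point $x_i$, denoted $X_i = [x_{i1}, \dots, x_{ik}]$.

Step 2: Calculate the geodesic distances $D_G = \{d_G(x_i, x_j),\ i, j = 1, \dots, N\}$. First initialize $D_G$: in the neighbourhood graph $G$, for $x_j \in X_i$, set $d_G(x_i, x_j) = d_E(x_i, x_j)$, where $d_E(x_i, x_j)$ denotes the Euclidean distance between $x_i$ and $x_j$. For $x_j \notin X_i$, the shortest-path distance computed by Dijkstra's method is used as the geodesic distance $d_G(x_i, x_j)$.

Step 3: Construct the low-dimensional embedding $Y$. Apply multidimensional scaling (MDS) to $D_G$ to minimize the difference between the data points before and after embedding:

$$ E = \left\| \tau(D_G) - \tau(D_{E,Y}) \right\|_{L^2} $$

where $D_{E,Y}$ is the distance matrix of the data points after DR, and

$$ \tau(D) = -\frac{1}{2} H S H, \qquad S_{ij} = D_{ij}^{2}, \qquad H = I - \frac{1}{N} e e^{T} $$

where $e$ is a column vector whose elements are all 1.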
Perform the eigenvalue decomposition of $\tau(D_G)$, and select the $d$ largest eigenvalues and their eigenvectors to form a diagonal eigenvalue matrix $\Lambda_d$ and an eigenvector matrix $V_d$. The final low-dimensional data are then

$$ Y = \Lambda_d^{1/2} V_d^{T} $$

Although ISOMAP uses geodesic distance to describe the relations among HSI data points, the geodesic distance is still built from Euclidean distances between neighbours. For HSI, however, SA is a similarity measure of spectral shape: it is robust to multiplicative interference and weakens the influence of illumination intensity (Freek van der Meer, 2005). Therefore, we substitute SA for Euclidean distance when constructing the neighbourhood graph and calculating the geodesic distances.

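The three ISOMAP steps, with SA in place of Euclidean distance, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, a small number of pixels N (it builds dense N × N matrices), and the usual definition SA(x_i, x_j) = arccos(x_i^T x_j / (‖x_i‖ ‖x_j‖)). The function names are hypothetical, and all-pairs shortest paths are computed with Floyd-Warshall instead of Dijkstra's method; both yield the same shortest-path distances. The embedding is returned with one row per pixel.

```python
import numpy as np


def spectral_angle_matrix(X):
    """Pairwise spectral angles (radians) between the rows (spectra) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    cos = (X @ X.T) / (norms * norms.T)
    D = np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards against rounding past +/-1
    np.fill_diagonal(D, 0.0)
    return D


def euclidean_matrix(X):
    sq = (X ** 2).sum(axis=1)
    return np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))


def isomap(X, k=20, d=20, metric="sa"):
    """ISOMAP sketch: k-NN graph -> geodesic distances -> classical MDS.

    X: N x B array of pixel spectra. Returns an N x d embedding (one row per pixel).
    """
    N = X.shape[0]
    D = spectral_angle_matrix(X) if metric == "sa" else euclidean_matrix(X)

    # Step 1: neighbourhood graph; edges only to each point's k nearest neighbours.
    Dn = D.copy()
    np.fill_diagonal(Dn, np.inf)             # exclude the point itself
    nn = np.argsort(Dn, axis=1)[:, :k]
    G = np.full((N, N), np.inf)
    rows = np.repeat(np.arange(N), k)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)                   # make the graph undirected
    np.fill_diagonal(G, 0.0)

    # Step 2: geodesic distances = all-pairs shortest paths (Floyd-Warshall here).
    for m in range(N):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")

    # Step 3: classical MDS, tau(D) = -H S H / 2 with S = D**2, H = I - e e^T / N.
    H = np.eye(N) - np.ones((N, N)) / N
    tau = -0.5 * H @ (G ** 2) @ H
    vals, vecs = np.linalg.eigh(tau)         # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]         # indices of the d largest
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
```

Because SA ignores the length of a spectrum, rescaling a pixel (for example by a different illumination intensity) leaves its neighbourhood graph, and therefore its embedding, unchanged. This is the property that motivates the substitution.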
Relevance Vector Machines
In a probabilistic framework, the RVM introduces a prior over the model weights governed by a set of hyperparameters. One hyperparameter is associated with each weight, and their most probable values are iteratively estimated from the training data. The most compelling feature of the RVM is that it typically uses significantly fewer kernel functions than the SVM while providing similar performance (Begüm Demir, 2007; Psorakis I, 2010). For an input $\mathbf{x}$, the RVM model output is

$$ y(\mathbf{x};\mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 \qquad (5) $$

where $K(\cdot,\cdot)$ is the kernel function, $\mathbf{x}_i$ ($i = 1, \dots, N$) are the training samples, and $\mathbf{w} = (w_0, w_1, \dots, w_N)^T$ is the weight vector. In addition, the logistic sigmoid link function $\sigma(y) = 1/(1+e^{-y})$ is applied to $y(\mathbf{x};\mathbf{w})$ to obtain probabilistic outputs.
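The model output and the sigmoid link above can be evaluated as in the following minimal NumPy sketch. It assumes the Gaussian RBF kernel used later in the experiments, written as K(a, b) = exp(-γ‖a − b‖²). Reading the reported kernel parameter 0.1 as γ is an assumption, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.1):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2) for every row pair.

    Assumption: the paper's kernel parameter 0.1 is interpreted as gamma.
    """
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0.0, None))


def rvm_output(X, X_train, w, gamma=0.1):
    """y(x; w) = w_0 + sum_i w_i K(x, x_i) for every row x of X."""
    return w[0] + rbf_kernel(X, X_train, gamma) @ w[1:]


def rvm_probability(X, X_train, w, gamma=0.1):
    """Logistic sigmoid link: p(t = 1 | x) = 1 / (1 + exp(-y(x; w)))."""
    return 1.0 / (1.0 + np.exp(-rvm_output(X, X_train, w, gamma)))
```

With all weights set to zero every output is 0.5. Only the weights kept after training (the relevance vectors) contribute to the sum, which is why a sparse model is cheap to evaluate.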
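Because only two target values (0 and 1) are possible, a Bernoulli distribution can be adopted for $p(t|\mathbf{x})$. The likelihood function can then be written as

$$ p(\mathbf{t}|\mathbf{w}) = \prod_{n=1}^{N} \sigma\{y(\mathbf{x}_n;\mathbf{w})\}^{t_n} \left[1-\sigma\{y(\mathbf{x}_n;\mathbf{w})\}\right]^{1-t_n} $$

where $\mathbf{t} = (t_1, \dots, t_N)^T$ are the training targets. The likelihood is complemented by a prior over the weights of the form

$$ p(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\left(w_i \mid 0, \alpha_i^{-1}\right) $$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \dots, \alpha_N)^T$ are the hyperparameters, each controlling the strength of the prior over its associated weight. Hence the prior is Gaussian, but conditioned on $\boldsymbol{\alpha}$. For a given $\boldsymbol{\alpha}$, the posterior weight distribution conditioned on the data is

$$ p(\mathbf{w}|\mathbf{t},\boldsymbol{\alpha}) = \frac{p(\mathbf{t}|\mathbf{w})\, p(\mathbf{w}|\boldsymbol{\alpha})}{p(\mathbf{t}|\boldsymbol{\alpha})} $$

where $p(\mathbf{t}|\mathbf{w})$ is the likelihood, $p(\mathbf{w}|\boldsymbol{\alpha})$ is the prior, and $p(\mathbf{t}|\boldsymbol{\alpha})$ is referred to as the evidence.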
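Up to terms that do not depend on the weights, the log of the numerator of this posterior is the Bernoulli log-likelihood plus the Gaussian log-prior. The sketch below evaluates that quantity. It assumes NumPy and a precomputed design matrix `Phi` whose first column is all ones (for $w_0$) and whose remaining columns are kernel values; the function name is hypothetical.

```python
import numpy as np


def log_posterior_unnormalised(w, Phi, t, alpha):
    """log p(t|w) + log p(w|alpha), dropping terms that are constant in w.

    Phi   : N x (N+1) design matrix, Phi[n] = [1, K(x_n, x_1), ..., K(x_n, x_N)]
    t     : length-N array of 0/1 targets (Bernoulli likelihood)
    alpha : length-(N+1) prior precisions, w_i ~ N(0, 1/alpha_i)
    """
    a = Phi @ w                                     # y(x_n; w) for every sample
    # t*log(sigma(a)) + (1-t)*log(1-sigma(a)) simplifies to t*a - log(1 + e^a),
    # evaluated stably with logaddexp.
    log_lik = np.sum(t * a - np.logaddexp(0.0, a))
    log_prior = -0.5 * np.sum(alpha * w ** 2)       # Gaussian prior, constants dropped
    return log_lik + log_prior
```

The Laplace step maximizes this function over the weights for fixed hyperparameters. Because the function is concave in the weights, Newton iterations are a natural choice.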
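Because this posterior cannot be computed analytically, we use a Laplace approximation. For fixed values of $\boldsymbol{\alpha}$, the most probable weights $\mathbf{w}_{MP}$ are obtained by maximizing $p(\mathbf{w}|\mathbf{t},\boldsymbol{\alpha})$. Since $p(\mathbf{w}|\mathbf{t},\boldsymbol{\alpha})$ is proportional to $p(\mathbf{t}|\mathbf{w})\,p(\mathbf{w}|\boldsymbol{\alpha})$, $\mathbf{w}_{MP}$ can be calculated by

$$ \mathbf{w}_{MP} = \arg\max_{\mathbf{w}} \left\{ \sum_{n=1}^{N} \left[ t_n \log y_n + (1-t_n)\log(1-y_n) \right] - \frac{1}{2}\mathbf{w}^{T}\mathbf{A}\mathbf{w} \right\} $$

where $y_n = \sigma\{y(\mathbf{x}_n;\mathbf{w})\}$ and $\mathbf{A} = \mathrm{diag}(\alpha_0, \alpha_1, \dots, \alpha_N)$. This result is used for a Gaussian approximation to the posterior over the weights centred at $\mathbf{w}_{MP}$:

$$ \boldsymbol{\Sigma} = \left(\boldsymbol{\Phi}^{T}\mathbf{B}\boldsymbol{\Phi} + \mathbf{A}\right)^{-1}, \qquad \mathbf{w}_{MP} = \boldsymbol{\Sigma}\boldsymbol{\Phi}^{T}\mathbf{B}\hat{\mathbf{t}} $$

where $\boldsymbol{\Phi}$ is the $N \times (N+1)$ design matrix whose $n$th row is $[1, K(\mathbf{x}_n,\mathbf{x}_1), \dots, K(\mathbf{x}_n,\mathbf{x}_N)]$, and $\mathbf{B} = \mathrm{diag}(\beta_1, \dots, \beta_N)$ with $\beta_n = y_n(1-y_n)$. In this way, the classification problem is effectively linearized around $\mathbf{w}_{MP}$ with the target

$$ \hat{\mathbf{t}} = \boldsymbol{\Phi}\mathbf{w}_{MP} + \mathbf{B}^{-1}(\mathbf{t} - \mathbf{y}) $$

where $\mathbf{y} = (y_1, \dots, y_N)^T$. These equations are essentially the solution of a generalized least-squares problem. After $\mathbf{w}_{MP}$ is obtained, the hyperparameters are updated by

$$ \alpha_i^{new} = \frac{\lambda_i}{\mu_i^{2}}, \qquad \lambda_i = 1 - \alpha_i \Sigma_{ii} $$

where $\mu_i$ is the $i$th posterior mean weight and $\Sigma_{ii}$ is the $i$th diagonal element of the posterior covariance. During optimization, many $\alpha_i$ become very large, so the corresponding weights are pruned, which produces sparsity. The optimization typically continues until the maximum change in the $\alpha_i$ values falls below a threshold or the maximum number of iterations is reached. The training vectors $\mathbf{x}_i$ corresponding to the remaining $\alpha_i$ are called relevance vectors.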

Spatial Weight
The spatial correlation of data points is an important part of the information contained in an HSI and a valuable basis for classification. However, the procedure above does not exploit this information. We therefore revise the results obtained from RVM. Let $P$ denote the classification probability output of RVM, $N_C$ the number of classes in the HSI, and $W$ the spatial weight matrix. The final classification result is then obtained as follows.
1) For each pixel in $P$, select the class with the maximum probability as its label. This gives a preliminary classification map $P_{map}$.
2) For each pixel $P_{map}(i, j)$, count the number of pixels belonging to each class among its eight neighbours. The ratios of the different classes around the pixel then form the spatial weight matrix $W = \mathrm{diag}(W_1, W_2, \dots, W_{N_C})$, where $W_c$ is the proportion of the neighbours labelled as class $c$. The modified probability is then

$$ P^{m}_{i,j} = W P_{i,j} \qquad (16) $$

where $P_{i,j}$ is the probability vector of pixel $(i, j)$.
3) Repeat step 2 until every pixel of the HSI has been traversed, which gives the final modified probability output $P^m$. Then, applying the maximum-probability rule of step 1 to $P^m$ yields the classification map with class labels.

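Steps 1 to 3 can be sketched as follows. This is a minimal NumPy illustration in which `P` holds the RVM class probabilities as a rows × cols × $N_C$ array. The text does not specify how border pixels are handled. As an assumption, only neighbours inside the image are counted, and a pixel with no neighbours keeps its original probabilities. The function name is hypothetical.

```python
import numpy as np


def spatial_revise(P):
    """Revise RVM class probabilities with 8-neighbour class ratios.

    P: rows x cols x NC array of class probabilities.
    Returns the modified probabilities Pm and the final label map
    (0-based class indices; add 1 to match labels 1..NC).
    """
    rows, cols, nc = P.shape
    pmap = P.argmax(axis=2)                      # step 1: preliminary label map
    Pm = np.empty_like(P)
    for i in range(rows):                        # step 3: traverse every pixel
        for j in range(cols):
            counts = np.zeros(nc)                # step 2: count neighbour labels
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Assumption: neighbours outside the image are skipped.
                    if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                        counts[pmap[ni, nj]] += 1
            total = counts.sum()
            ratios = counts / total if total else np.ones(nc)   # diagonal of W
            Pm[i, j] = ratios * P[i, j]          # W P_ij, with W diagonal
    return Pm, Pm.argmax(axis=2)
```

For example, consider a pixel that RVM assigns to class 1 with probability 0.6 but whose eight neighbours are all class 0. Its class-1 weight becomes zero, so it is relabelled as class 0. Note that, as in equation (16), the revised vectors are not renormalized.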
Data Description
A hyperspectral image taken over the Indian Pines test site in northwestern Indiana in June 1992 is used to test the proposed algorithm. The Indian Pines data consist of 145×145 pixels with 220 bands and 16 classes. The number of spectral bands is reduced to 200 by removing the water absorption bands and noisy bands. The 10th band of the image is shown in figure 1, and the ground truth is given in figure 2. Every pixel in the ground truth is labelled with a number: 1 to 16 are class indices, while 0 means the pixel is unlabelled.

Experiments and Results
The pixels labelled with class indices 1 to 16 form the sample set. From this set, 35 percent of the samples are randomly selected as training samples, and the trained classifiers are applied to the whole image. When the performance of the different algorithms is compared, the remaining 65 percent of the samples are used for evaluation. RVM, SVM and the proposed algorithm are compared on the same selected samples. A Gaussian RBF kernel is adopted as the basis function for both RVM and SVM.
The Gaussian kernel parameter is 0.1, and the SVM cost parameter is 125. Figure 3 shows the classification map of SVM, figure 4 shows the result map of RVM, and figure 5 shows that of the proposed algorithm. In the proposed method, ISOMAP is run with k = 20 nearest neighbours and a reduced dimension of 20.

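The sample selection can be sketched as follows. This minimal NumPy illustration assumes a 2-D ground-truth array in which 0 marks unlabelled pixels. It draws a simple random sample over all labelled pixels, as described. Whether the authors stratified the split by class is not stated, and the seed and function name are hypothetical.

```python
import numpy as np


def split_labelled_pixels(gt, train_frac=0.35, seed=0):
    """Randomly split labelled pixels (gt > 0) into training and test sets.

    gt: 2-D ground-truth map, 0 = unlabelled, 1..16 = class indices.
    Returns flat indices into gt.ravel() for the training and test pixels.
    """
    rng = np.random.default_rng(seed)
    labelled = np.flatnonzero(gt.ravel() > 0)    # the sample set
    shuffled = rng.permutation(labelled)
    n_train = int(round(train_frac * labelled.size))
    return shuffled[:n_train], shuffled[n_train:]
```

The returned indices select both labels (`gt.ravel()[train]`) and spectra (rows of the image reshaped to pixels × bands). Reusing the same indices for every classifier keeps the comparison on identical samples.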
Analysis
The Kappa coefficient and overall classification accuracy are commonly used to evaluate classification results for remote sensing images, so we use them for a quantitative analysis of the experimental results. The indices in Table 1 show that the proposed method outperforms the other two algorithms. Moreover, the highest class accuracies of SVM and RVM are 97.95% (class 6) and 94.63% (class 13), respectively, whereas that of the proposed method is 99.86% (class 13). The lowest class accuracies of SVM, RVM and the proposed algorithm are 63.04% (class 1), 41.30% (class 1) and 83.67% (class 4), respectively. It is also worth noting that SVM uses 3544 support vectors, whereas RVM and the proposed method use 541 and 514 relevance vectors, respectively. Hence, RVM classification is superior to SVM classification in terms of sparsity.
Although spatial information is taken into account, some pixels inside homogeneous block areas are still misclassified, which needs further investigation and improvement. Another problem is that the proposed method increases the computation time, so it is not suitable for real-time classification or other time-critical applications.

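The indices in Table 1, together with the per-class accuracies quoted above, can be computed from a confusion matrix, as in the following sketch. It assumes NumPy and that "class accuracy" means the fraction of each reference class labelled correctly (producer's accuracy), which the text does not state explicitly. The function name is hypothetical.

```python
import numpy as np


def accuracy_report(y_true, y_pred, n_classes):
    """Overall accuracy, Cohen's kappa and per-class accuracy from a confusion matrix.

    y_true, y_pred: integer class indices 0..n_classes-1 of the test pixels.
    """
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)   # rows: reference
    n = cm.sum()
    oa = np.trace(cm) / n                                  # observed agreement p_o
    pe = cm.sum(axis=1) @ cm.sum(axis=0) / n ** 2          # chance agreement p_e
    kappa = (oa - pe) / (1.0 - pe)
    # Assumption: per-class accuracy = correct / reference pixels of that class.
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    return oa, kappa, per_class
```

For reference labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], OA is 0.75 and chance agreement is 0.5, so kappa is 0.5. Kappa discounts the agreement expected by chance and therefore penalizes a classifier that simply favours large classes.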
CONCLUSION
To address the limited classification accuracy obtained on high-dimensional data, this paper presents a method that combines ISOMAP, RVM and a spatial weight to reduce the dimension of hyperspectral data and classify it. The method provides higher classification accuracy than SVM-based classification. ISOMAP reduces the dimension while preserving the information well, and the spatial weight further improves the result. Overall, the method effectively improves the classification accuracy of the relevance vector machine model on hyperspectral data.

Figure 1. The image of the 10th band

Figure 3. Classification results of SVM

The results are shown in Table 1.

Table 1. Kappa coefficient and overall classification accuracy