Research on dimension reduction method for hyperspectral remote sensing image based on global mixture coordination factor analysis

Over the past thirty years, hyperspectral remote sensing technology has attracted more and more attention from researchers. Dimension reduction for hyperspectral remote sensing image data is one of the hotspots in current hyperspectral remote sensing research. In order to solve the problems of nonlinearity, high dimensionality and band redundancy that exist in hyperspectral data, this paper proposes a dimension reduction method for hyperspectral remote sensing image data based on global mixture coordination factor analysis. First, a linear low-dimensional manifold is obtained from the nonlinear, high-dimensional hyperspectral image data by the mixture of factor analyzers method. Second, the parameters of the linear low-dimensional manifold are estimated by an EM algorithm that finds a local maximum of the data log-likelihood. Third, the manifold is aligned to a global parameterization by the globally coordinated factor analysis model, and the low-dimensional image data of the hyperspectral image are finally obtained. A comparison of different dimension reduction methods and different classification methods on the low-dimensional data shows that the proposed method can retain the maximum spectral information in hyperspectral image data and can eliminate the redundancy among bands.

* Corresponding author. E-mail: hpu_wcy@163.com

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W4, 2015. 2015 International Workshop on Image and Data Fusion, 21–23 July 2015, Kona, Hawaii, USA. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-7-W4-159-2015


INTRODUCTION
Hyperspectral remote sensing is a new remote sensing technology which has developed rapidly in recent years and has become one of the driving forces of the remote sensing field (Goetz, 2009). It is a comprehensive technology that integrates detector technology, precision optical machinery, weak-signal detection, computer technology and information processing technology. It uses many narrow bands of electromagnetic waves to obtain data from objects of interest and can detect features that cannot be detected by wide-band remote sensing (Tong et al., 2014). Hyperspectral data have distinctive features and contain abundant spatial, radiometric and spectral information. In the imaging process, an imaging spectrometer with nano-scale spectral resolution obtains images of the earth's surface in dozens or even hundreds of bands at the same time, acquiring continuous spectra of ground objects and obtaining their spectral, radiometric and spatial information simultaneously. It therefore has great application value and broad development prospects in related fields (Yu et al., 2013).
Hyperspectral remote sensing images have the characteristics of many bands, enormous data volume and strong inter-band correlation, and are easily affected by the Hughes phenomenon (Hughes, 1968). To some extent this produces a contradiction between fine spectra and enormous data volume, and between efficiency and accuracy. Therefore, reducing the dimension of hyperspectral data to lower the computational complexity without reducing the classification accuracy is an important issue in the remote sensing field (Zhang et al., 2013, Kozal et al., 2013, Luo et al., 2013).
So far, research on dimension reduction methods for hyperspectral remote sensing images has mainly followed two general orientations (Li et al., 2008): band selection and feature transformation. The principle of band selection is to use m bands of the hyperspectral data that carry the most informative features to replace the original n-dimensional hyperspectral data (m < n). The advantage of these methods is that they retain the characteristics of the original image data well; the disadvantage is that they undermine the integrity of the spectral details of the hyperspectral data (Wang et al., 2012). The principle of feature transformation is to use a transform that represents the original datasets with low-order approximation sets. The advantage of these methods is that the new low-order datasets have higher compressibility; the disadvantage is that they may damage the specific physical meaning of the original datasets (Hsing et al., 2005).
Because feature transformation methods perform dimension reduction well and efficiently, they are the more widely applied of the two in this field. They fall into two major classes: dimension reduction by linear feature transformation and dimension reduction by nonlinear feature transformation. The linear feature transformation methods mainly include hyperspectral dimension reduction based on nonnegative matrix factorization (Bourennane et al., 2014, Robila et al., 2007), principal component analysis (Agarwal et al., 2007), independent component analysis (Song et al., 2007, Lennon et al., 2001), principal curves and manifolds (Gillis et al., 2005), kernel PCA (Khan et al., 2009), and isomap (Dong et al., 2007). The main problems of the current linear feature transformation methods are that they cannot keep local information and that the spectral information loss after dimension reduction is serious; their advantages are fast speed, efficient algorithms and ease of implementation. The existing problem of the nonlinear feature transformation methods is that the nonlinear initial conditions are difficult to estimate; for example, the simplicity of the manifold cannot be defined. In addition, these algorithms are slow and difficult to implement. Their advantage is that the manifold structure of the data is considered explicitly and local information is well preserved.
The factor analysis method is a linear feature transformation dimension reduction method. It constitutes a linear combination of the original features through some latent factors and finally removes the correlation in the original feature set. The mixture of factor analyzers is an extension of the factor analysis method. It allows different regions of the input space to build local factor models of the data, i.e., a dimension-reducing Gaussian mixture model, so that dimension reduction and clustering can be completed at the same time; it is therefore very suitable for learning generative models of high-dimensional data. This paper presents a hyperspectral data dimension reduction method based on global mixture coordination factor analysis (GMCFA). First, it uses the mixture of factor analyzers method to obtain a linear low-dimensional manifold from the nonlinear, high-dimensional original hyperspectral data. Then it uses a maximum log-likelihood (EM) algorithm to estimate the parameters of the linear low-dimensional manifold. Finally, it constructs the low-dimensional hyperspectral image data by using the globally coordinated factor analysis model. The algorithm combines the features of linear and nonlinear feature transformation dimension reduction methods: under the premise of sufficient retention of spectral information, it is easy to implement, fast and efficient.

Model of mixtures of factor analyzers
Suppose the high-dimensional hyperspectral image data are x, and that a low-dimensional manifold is embedded in the high-dimensional space. If the low-dimensional manifold is locally sufficiently smooth, a model can be constructed to approximate it with a linear representation. The factor analysis model is given by formula 2-1:

x = μ + Λf + ε,    (2-1)

where μ is a constant mean vector, Λ is the factor loading matrix, f are the independent and normalized common factors, f ~ N(0, I), and ε ~ N(0, Ψ) are the special factors with diagonal specific-variance matrix Ψ, independent of f and of each other. Assuming the factors are independent Gaussian random variables, the probability density function of the target is obtained by Gaussian mixture modeling, as shown in formula 2-2:

p(x) = Σ_{i=1}^{K} π_i N(x; μ_i, Λ_i Λ_i^T + Ψ),    (2-2)

where π_i is the proportion of the i-th factor analyzer in the model and satisfies Σ_i π_i = 1, μ_i is the mean of each Gaussian distribution in the mixture model, and Λ_i Λ_i^T + Ψ is the covariance of the corresponding Gaussian distribution in the Gaussian mixture model.
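The generative model of formulas 2-1 and 2-2 can be sketched in a few lines of numpy. The sizes and parameter values below are purely illustrative assumptions, not taken from the paper; the sketch only shows how samples are drawn from x = μ + Λf + ε and how the mixture density of formula 2-2 is evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical): observed dim n, latent dim m, K analyzers
n, m, K = 6, 2, 3

# Mixture-of-factor-analyzers parameters (formulas 2-1 / 2-2)
pi = np.array([0.5, 0.3, 0.2])       # mixing proportions pi_i, sum to 1
mu = rng.normal(size=(K, n))         # component means mu_i
Lam = rng.normal(size=(K, n, m))     # factor loading matrices Lambda_i
Psi = np.full(n, 0.1)                # diagonal specific variances Psi

def sample(N):
    """Draw N points from the generative model x = mu_i + Lambda_i f + eps."""
    comps = rng.choice(K, size=N, p=pi)              # pick an analyzer per point
    f = rng.normal(size=(N, m))                      # common factors ~ N(0, I)
    eps = rng.normal(size=(N, n)) * np.sqrt(Psi)     # special factors ~ N(0, Psi)
    return mu[comps] + np.einsum('inm,im->in', Lam[comps], f) + eps

def density(x):
    """p(x) = sum_i pi_i N(x; mu_i, Lambda_i Lambda_i^T + Psi)  (formula 2-2)."""
    p = 0.0
    for i in range(K):
        C = Lam[i] @ Lam[i].T + np.diag(Psi)         # low-rank-plus-diagonal covariance
        d = x - mu[i]
        quad = d @ np.linalg.solve(C, d)
        norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(C))
        p += pi[i] * np.exp(-0.5 * quad) / norm
    return p

X = sample(1000)
print(X.shape, density(X[0]) > 0)
```

Note the covariance of each component is constrained to the low-rank-plus-diagonal form Λ_iΛ_i^T + Ψ, which is what distinguishes the model from an unconstrained Gaussian mixture.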
As can be seen from formula 2-2, the mixture of factor analyzers model is actually a dimension-reducing Gaussian mixture model: each factor analyzer fits a part of the data with one Gaussian distribution.
Because the covariance matrix of each Gaussian distribution is determined by a low-dimensional factor loading matrix, the covariance structure of the model can be determined by about K·n·m parameters (plus the n diagonal entries of Ψ) rather than the K·n(n+1)/2 parameters of a full-covariance Gaussian mixture model, thereby realizing the dimension reduction.
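The saving is easy to quantify. The sizes below are hypothetical (a 120-band image, as used later in the experiments, with an assumed m = 10 factors and K = 5 analyzers), chosen only to make the comparison concrete:

```python
# Hypothetical sizes: n bands, m latent factors, K mixture components
n, m, K = 120, 10, 5

# Covariance parameters in the mixture of factor analyzers:
# K loading matrices (n x m each) plus the n diagonal entries of Psi
mfa_cov_params = K * n * m + n

# Covariance parameters in a full-covariance Gaussian mixture:
# K symmetric n x n matrices with n(n+1)/2 free entries each
gmm_cov_params = K * n * (n + 1) // 2

print(mfa_cov_params, gmm_cov_params)  # the MFA count is several times smaller
```

For these sizes the constrained model needs 6,120 covariance parameters against 36,300 for the unconstrained mixture, roughly a six-fold reduction.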

Mixed factor model parameter estimation
At present, the parameters of the mixture of factor analyzers model are mainly estimated using the EM (Expectation Maximization) algorithm. The EM algorithm is an iterative method for finding the maximum likelihood estimate of the distribution parameters given a set of observed data. Its core idea is, based on the existing data and with the help of hidden variables, to estimate the likelihood function by iterating between computing its expected value and maximizing it.
We assume that the observed data are Y, the missing (latent) data are Z, and the model parameters are θ. Starting from an initial value θ^(0), the algorithm alternates between two steps, where E stands for expectation and M for maximization. The steps can be summarized as follows. E step: compute the expectation of the complete-data log-likelihood under the current posterior distribution of the latent variables. M step: maximize this expectation with respect to the parameters θ. The E and M steps above are repeated until convergence.
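The E/M alternation can be sketched for a single factor analyzer (the building block of the mixture model of formula 2-2). This is a minimal illustration using the standard EM updates for factor analysis, not the paper's implementation; the data sizes and the synthetic data are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 5, 2, 2000                     # hypothetical observed dim, factors, samples

# Synthetic data drawn from a known factor model x = Lambda f + eps
Lam_true = rng.normal(size=(n, m))
Psi_true = np.full(n, 0.2)
X = rng.normal(size=(N, m)) @ Lam_true.T + rng.normal(size=(N, n)) * np.sqrt(Psi_true)

mu = X.mean(0)
Xc = X - mu
S = Xc.T @ Xc / N                        # sample covariance

Lam = rng.normal(size=(n, m))            # random initial parameters theta^(0)
Psi = np.ones(n)

def loglik():
    """Data log-likelihood under the current N(mu, Lam Lam^T + Psi) model."""
    C = Lam @ Lam.T + np.diag(Psi)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * N * (n * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(C, S)))

prev = loglik()
for _ in range(100):
    # E step: posterior moments of the latent factors given the data
    C = Lam @ Lam.T + np.diag(Psi)
    beta = Lam.T @ np.linalg.inv(C)                   # m x n
    Ef = Xc @ beta.T                                  # E[f | x] for every sample
    Eff = N * (np.eye(m) - beta @ Lam) + Ef.T @ Ef    # sum of E[f f^T | x]
    # M step: maximize the expected complete-data log-likelihood
    Lam = (Xc.T @ Ef) @ np.linalg.inv(Eff)
    Psi = np.diag(S - Lam @ (Ef.T @ Xc) / N)
cur = loglik()
print(cur >= prev)  # EM never decreases the data log-likelihood
```

The monotone increase of the log-likelihood is the property the paper relies on: EM converges to a local maximum of the data log-likelihood.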

The improved EM algorithm
The EM algorithm is very popular for parameter estimation, but it has a notable defect: slow convergence. In order to accelerate the convergence rate of the EM algorithm, Verbeek et al. proposed the CFA algorithm (Verbeek, 2006). The main advantage of the coordinated factor analysis (CFA) method is that it allows global adjustment of the parameter estimates in the linear model and adds constraint conditions. In the algorithm steps below, log p(x_i) is the log-likelihood of the MFA model, estimated using the EM algorithm.
D(·‖·) is the Kullback–Leibler divergence, and y is the low-dimensional data. The dominant idea of the CFA algorithm is to amend the M step so as to adjust the covariance and obtain the additional information of the complete data. The steps include: (1) Initialization: for Equation 2-2, the mean values are initialized from the original data and the covariance matrix Ψ is set to the identity matrix.
(2) CFA-E step: perform the E step of the CFA algorithm. (3) CFA-M step: perform the amended M step described above. (4) Convergence condition: keep iterating the E and M steps and repeating the above until the estimated values no longer change.
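The Kullback–Leibler divergence that appears in the CFA objective has a closed form when both distributions are Gaussian, which is the case for the local models being coordinated. The helper below is a generic sketch of that closed form, not code from the paper:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians (closed form)."""
    d = mu0.size
    S1inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0)          # scale mismatch term
                  + diff @ S1inv @ diff          # mean mismatch term
                  - d                            # dimension offset
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Sanity checks: KL is zero for identical Gaussians and non-negative otherwise
mu, S = np.zeros(2), np.eye(2)
kl_same = kl_gauss(mu, S, mu, S)
kl_diff = kl_gauss(np.ones(2), np.eye(2), np.zeros(2), 2 * np.eye(2))
print(abs(kl_same) < 1e-9, kl_diff >= 0)
```

Minimizing a sum of such divergences is what aligns the local factor analyzers to a single global low-dimensional parameterization.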

THE STUDY AREA AND VALIDATION IMAGES
The test area of this study is a scene from the AVIRIS hyperspectral imagery provided by NASA. The imaged area is located at the Kennedy Space Center (KSC); the acquisition time is March 23, 1996. The image data include 224 bands, the spectral range is 400–2500 nm, the spectral resolution is 10 nm and the spatial resolution is 18 meters. After atmospheric correction and geometric correction and removal of the noisier bands, 120 bands were finally chosen for analysis. The ground categories of the experimental area were obtained with reference to Landsat Thematic Mapper images (Francisco et al., 2006), giving a total of 13 major categories: Scrub, Willow, CP Hammock, CP/Oak, Slash Pine, Oak/Broadleaf, Hardwood swamp, Graminoid marsh, Spartina marsh, Cattail marsh, Salt marsh, Mud flats and Water. The true-color composite image of the experimental zone is shown in Figure 1, the distribution of the training samples of each category is shown in Figure 2, the spectral curves of the 13 categories of ground objects are shown in Figure 3, and the numbers of samples of all kinds are shown in Table 1.

Dimension reduction effect comparison for different classifier algorithms
This experiment mainly has two purposes; one is to validate the dimension reduction effect of the method proposed in this chapter. The dimension reduction methods included in the comparison are PCA (principal component analysis), LDA (linear discriminant analysis) and the method proposed in this paper; the classification methods included in the comparison are the minimum distance method, the maximum likelihood method and the support vector machine (SVM) method.
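The shape of such a comparison (reduce the bands, then classify, then score) can be sketched with a stand-in dataset. Everything below is hypothetical: the synthetic "pixels", the class separation, the choice of 10 components and the minimum-distance classifier are illustrative assumptions, not the paper's actual data or results.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for labelled hyperspectral pixels: 3 classes, 120 bands
n_bands, per_class = 120, 100
centers = rng.normal(scale=3.0, size=(3, n_bands))
X = np.vstack([c + rng.normal(size=(per_class, n_bands)) for c in centers])
y = np.repeat(np.arange(3), per_class)

def pca(X, k):
    """Project centred data onto its top-k principal components via SVD."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def min_distance_classify(Z, y):
    """Minimum-distance classifier: assign each sample to the nearest class mean."""
    means = np.stack([Z[y == c].mean(0) for c in np.unique(y)])
    d = ((Z[:, None, :] - means[None]) ** 2).sum(-1)
    return d.argmin(1)

Z = pca(X, 10)                 # reduce 120 bands to 10 components
acc = (min_distance_classify(Z, y) == y).mean()
print(Z.shape, acc)
```

In the paper's experiment the PCA stage would be swapped for LDA or the proposed GMCFA method, and the minimum-distance classifier for maximum likelihood or SVM, with accuracy compared across the combinations.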

Discussion
(1) The GMCFA dimension reduction method applied to AVIRIS hyperspectral data shows good dimension reduction performance.
(2) The effect of the GMCFA dimension reduction method is better than that of the projection-based PCA and LDA algorithms.
(3) The GMCFA dimension reduction method removes a large amount of redundant information between the bands; after dimension reduction, the classification precision varies only slightly under each classification method and the classification results show no significant decline.

CONCLUSION
Hyperspectral data have the characteristics of a large number of bands, a large amount of data and strong correlation between adjacent bands, so the dimension often needs to be reduced before processing. Factor analysis, a linear feature transformation method, can eliminate the correlations in the original feature set by constituting a linear combination of the original features through some latent factors; it is therefore very suitable for dimension reduction of high-dimensional data. This paper improves a factor-analysis-based dimension reduction method, applies it to the dimension reduction of hyperspectral remote sensing images and compares it with a variety of dimension reduction methods. The experimental results show that the GMCFA dimension reduction method has good universality and stability. Moreover, it can remove a large amount of redundancy between bands while keeping the spectral information well. Thus, classification of hyperspectral remote sensing data after band dimension reduction can maintain high classification precision and good classification results.
Z is the missing data and θ denotes the model parameters. The posterior distribution density function of θ based on the observed data Y is g(θ | y), which is called the posterior distribution. f(θ | y, z) indicates the posterior distribution density function of θ obtained after adding the data Z, and is called the augmented posterior distribution. k(z | θ, y) indicates the conditional distribution density function of the latent data Z given θ and the observed data Y. The EM algorithm starts from an initial value θ^(0).

Fig. 4 The comparison of classification result maps using the minimum distance algorithm under different dimension reduction algorithms on the AVIRIS data