ENDMEMBER EXTRACTION OF HIGHLY MIXED DATA USING 1 L SPARSITY-CONSTRAINED MULTILAYER NONNEGATIVE MATRIX FACTORIZATION

Due to the limited spatial resolution of remote hyperspectral sensors, pixels are usually highly mixed in the hyperspectral images. Endmember extraction refers to the process identifying the pure endmember signatures from the mixture, which is an important step towards the utilization of hyperspectral data. Nonnegative matrix factorization (NMF) is a widely used method of endmember extraction due to its effectiveness and convenience. While most NMF-based methods have single-layer structures, which may have difficulties in effectively learning the structures of highly mixed and complex data. On the other hand, multilayer algorithms have shown great advantages in learning data features and been widely studied in many fields. In this paper, we presented a 1 L sparsityconstrained multilayer NMF method for endmember extraction of highly mixed data. Firstly, the multilayer NMF structure was obtained by unfolding NMF into a certain number of layers. In each layer, the abundance matrix was decomposed into the endmember matrix and abundance matrix of the next layer. Besides, to improve the performance of NMF, we incorporated sparsity constraints to the multilayer NMF model by adding a 1 L regularizer of the abundance matrix to each layer. At last, a layer-wise optimization method based on NeNMF was proposed to train the multilayer NMF structure. Experiments were conducted on both synthetic data and real data. The results demonstrate that our proposed algorithm can achieve better results than several state-of-art approaches. * tech_commu@163.com


INTRODUCTION
Hyperspectral remote sensing images generally contain abundant spatial and spectral information of the covered areas, which can be useful in the applications of earth monitoring, land cover classification, mineral exploration, military surveillance, etc. (Tong et al., 2016).These images are usually captured by space-borne sensors or air-borne sensors.Due to the long observation distance and the low spatial resolution of hyperspectral sensors, pixels in the acquired hyperspectral images are usually a mixture of several ground cover spectrum.These pure spectra are known as endmembers and their proportions in the pixels are called the corresponding abundance fractions (Zhu et al., 2014).Before utilizing the hyperspectral images, we need to decompose these mixed pixels into a set of endmembers and their corresponding abundances first.This process is called hyperspectral unmixing, which contains two procedures: endmember extraction and abundance estimation (Miao et al., 2007).
Many algorithms have been proposed for endmember extraction, such as Vertex component analysis (VCA) (Nascimento et al., 2005).With the pure-pixel assumption, VCA projects all the pixels into a random direction and regard the pixel with the largest projection as endmember.Recently NMF has been widely studied.The major advantages of NMF are that it does not need to assume the existence of pure pixels in the hyperspectral images and can decompose the target matrix into two nonnegative matrices simultaneously (Wang et al., 2013).However, it may has a huge solution space since the cost function of traditional NMF is nonconvex.
Many efforts have been made to improve the unmixing performance of NMF.Among them, based on the assumption that most pixels are mixtures of only a few of the endmembers in the images, sparsity constraints based on regularization methods have acquired extensive attention.Though many (0 1) q L q   regularizers have been incorporated to generate sparse results, they may suffer from numerical stability since they are nondifferentiable at 0. The 1 L regularizer is generally the most popular choice for achieving sparse constraints due to its stability and effectiveness (Qian et al., 2011).Besides, for those highly-mixed and complex data, single-layered NMF may have poor performance since it is difficult to effectively decompose the complicated data (Cichocki et al., 2007).Multilayer structure can decompose the data layer by layer and help to extract the tiny features hidden in the data.So multilayer structure are generally more effective than single layer in learning feature representations of highly mixed data.Multilayer algorithms has been also widely studied in many areas such as image classification (Chen et al., 2015), hyperspectral unmixing (Rajabi et al., 2015), attribute representations (Trigeorgis et al., 2017), etc.In this paper, we proposed a 1 L sparsity-constrained multilayer NMF ( 1 L -MLNMF) unmixing algorithm for highly mixed hyperspectral data.The multilayer NMF structure is acquired by iteratively decomposing the target matrix into the endmember matrix and the abundance matrix in each layer.Due to the intrinsic sparsity of the abundance matrix, we added sparse constraint to each layer with the 1 L regularizer of the abundance matrix, which is different from traditional multilayer NMF algorithms.Besides, the major contribution of our paper is that a novel optimization method was proposed to train the structure layer-wisely.To improve the approximation accuracy and accelerate the optimization process, NeNMF (Guan et al., 2012) was adopted to solve the NMF problem in each layer.

METHODOLOGY
The linear mixing model (LMM) has been widely used in hyperspectral unmixing due to its simplicity and satisfactory performance in most scenes.In this paper, the proposed algorithm is built based on LMM.So we first introduce LMM in this section.Then NeNMF is further presented, which is the foundation of our optimization procedure.At last, the formulation of our method is described.

Linear Mixing Model
For the hyperspectral data with m bands and n pixels, suppose the number of endmembers in the image is c , then the mathematical model of LMM can be expressed as where is the abundance matrix E is the noise matrix Both the endmember matrix and the abundance matrix should satisfy some physical constraints.The elements in the endmember matrix should be nonnegative since they denote the observed energy in certain bands.The abundance matrix should subject to the following two constraints: the abundance nonnegative constraint,

NeNMF
NeNMF is an efficient NMF solver based on Nesterov's optimal gradient method (Nesterov, 2005), which has been proven to be superior to the multiplicative update rule (MUR) and the projected gradient method (PG) in terms of efficiency as well as approximation accuracy.
A common cost function of NMF can be written as where are the two decomposed nonnegative matrices Since ( 2) is nonconvex, it is impractical to obtain the optimal solution.The block coordinate descent method is usually used to solve it by the following two sub-problems where t is the iteration counter In the method, it alternatively solve the above two sub-problems until convergence.NeNMF was built based on the block coordinate descent method, which uses Nesterov's optimal gradient algorithm to alternatively minimize them.However, before Nesterov's method can be used, the target function is required to be convex and the gradient of the function should be Lipschitz continuous.Fortunately, both (3) and ( 4) meet these requirements.They can be solved similarly since they are symmetric.Take (3) for example, the method construct two sequences { } k Y and { } k H , and alternatively update them in each iteration as follows: By using the Lagrange multiplier method and the Karush-Kuhn-Tucker (K.K.T.) conditions of (5), it can be solved as 1 ( , ) where ( ) P  can project all the negative elements to zero By alternatively updating the two sequences, k H can be optimized towards the direction of reducing (3).Until convergence, the optimal solution of (3) can be obtained.The above procedure can be used to solve (4) in the same way.Using Nesterov's optimal gradient method to alternatively minimize (3) and (4), it can reach the local optimum of (2).

The Proposed 1 L -MLNMF Method
In 1 L -MLNMF, the proposed multilayer NMF structure is used to decompose the hyperspectral data X .In the first layer, the data is decomposed into the basic endmember matrix and the abundance matrix, namely 1 W and 1 H .For the l layer, the data matrix l X which is directly obtained from -1 l H , is decomposed into l W and l H . Suppose the number of layers is L .So the mathematical representation of the multilayer structure can be expressed as Then sparsity constraint is added to the model with the 1 L regularizer of the abundance matrix in each layer.Then the total cost function can be constructed as where l  is the penalty factor in layer l .
The penalty factor l  is used to estimate the sparsity of the abundance matrix in the l th layer, which is calculated using the estimator (Hoyer, 2004) as where The function in ( 10) is a multi-factor NMF problem.Traditional algorithms usually decompose it into two-factor NMFs, which is easy to accumulate cumulative error.In this paper, we solve it layer-by-layer.In l th layer, the cost function can be expressed as where Here we use NeNMF to optimize (12).The gradients of ( , ) F W H and corresponding Lipschitz constants can be calculated as Several issues should be considered in the optimization process.Firstly, we adopt VCA (Nascimento et al., 2002) and the fully constrained least squares method (FCLS) (Heinz et al., 2001) to estimate endmembers and abundance fractions for all layers, separately.Besides, for the abundance sum-to-one constraint, an effective method is adopted to augment the target matrix and the endmember matrix as where  is a positive constant.
At last, the optimization will be stopped when the number of iterations exceeds the maximum number of iterations max T or the cost function changes less than the threshold, set to 10 -5 in this paper, for more than 20 successive times.Our proposed algorithm can be summarized as in Algorithm 1.
Algorithm 1： 1 L -MLNMF Input: for all layers do initialize l W , l H using VCA and FCLS repeat calculate l  using ( 11) end for , and

EXPERIMENTS AND RESULTS
In this section, several experiments are conducted on both synthetic data and real data to compare our method with several existing state-of-art approaches, namely the vertex component analysis method (VCA-FCLS), the projected gradient NMF method (PGNMF) (Lin, 2007), the minimum volume constrained NMF method (MVC-NMF) (Miao et al., 2007), and the multilayer NMF method (MLNMF) (Rajabi et al., 2015).Spectra from the USGS spectral library are used to generate the synthetic data with the method proposed by (Miao et al., 2007).To get highly-mixed data, the number of extracted spectra is set to 6 and the size of the low pass filter is set to 9×9.The parameter concerning the mixing degree of the data,  , is set to 0.8 to further remove pure pixels.20-dB white Gaussian noise is also added to the data.
In our method, we set 10 L  , 20   , and max 1000 T  .The parameters of the other four approaches are consistent with their original work, except that the layer of MLNMF is kept the same with our method.Four widely used metrics, including the spectral angle distance (SAD), the spectral information divergence (SID), the abundance angle distance (AAD) and the abundance information divergence (AID) are used to evaluate the performance of the methods.Two experiments are conducted on the synthetic data.The metrics in each experiment is acquired by 20 random tests.

Overall Comparison of Different Methods
In this experiment, we set 0.6   to generate higher mixing degree of the synthetic data.Figure 1 shows the averaging experimental results.As can be seen, 1 L -MLNMF can generate the smallest metrics among all the five tested algorithms in terms of AAD, AID, SAD, and SID.In particular, it can also achieve better performance than MLNMF since they both have multilayer structures.This is mainly owing to the carefullydesigned multilayer model and the effective optimization method.In general, our proposed method is superior to the other four approaches.

Performance Analysis of Different Number of Layers
In this experiment, the effect of different number of layers on 1 L -MLNMF is analysed. is remained to be 0.8. Figure 2 shows the calculated results.It can be seen that all the four metrics decrease with the number of layers increases.Especially, when the number of layer is less than 3, all the metrics have a big decline.When the number of layers is close to 10, the metrics decrease slowly, demonstrating that the unmixing effect is reaching its limitation.In general, more layers can help to improve the unmixing performance in a certain range.

Performance Comparison on Real Data
In this experiment, our method is tested on the Cuprite dataset, which has 224 bands and a spectral resolution of 10 nm, covering the wavelength range from 0.4 to 2.5 μm. Figure 3 shows the 30th band of the sub image used in this experiment.After removing the low SNR and water-vapor absorption bands (including bands 1-3, 105-113, 148-167, and 221-224), a total of 188 bands are used in this experiment.The number of endmembers is defined as 12, according to the study in (Nascimento et al., 2002).Table 1 summarizes the comparison results of the five algorithms.Figure 4 shows the estimated abundance maps by our proposed method.It can be seen that our method could achieve smaller SAD that other methods and the estimated abundance maps presented high similarity to the published results.

CONCLUSIONS
For the endmember extraction of highly mixed data, we have proposed a 1 L sparsity-constrained multilayer NMF method in this paper.The proposed multilayer NMF model is formulated by unfolding NMF into a certain number of layers and adding the 1 L regularization terms of the abundance matrices.The cost function is built on all the matrices in each layer.And a novel optimization method based on NeNMF is further presented to solve the multi-factor NMF problem, since it is difficult to simultaneously optimize all the endmember matrices and abundance matrices.The experimental results demonstrates that when the number of layers is less than 10, more layers can help to improve the unmixing performance.Due to the carefullydesigned multilayer structure, our method can also achieve better performance than several other state-of-art unmixing approaches on both synthetic data and real data.

Figure 1 .
Figure 1.Performance comparison of different algorithms Figure 2. Performance comparison of 1 L -MLNMF under different number of layers

Figure
Figure 3. Band 30 of the experimental subimage

TABLE 1 .
Comparison results of different methods in terms of SAD