ADVANCES IN HYPERSPECTRAL AND MULTISPECTRAL IMAGE FUSION AND SPECTRAL UNMIXING

In this work, we jointly process high spectral and high geometric resolution images and exploit their synergies to (a) generate a fused image of high spectral and geometric resolution; and (b) improve (linear) spectral unmixing of hyperspectral endmembers at the subpixel level w.r.t. the pixel size of the hyperspectral image. We assume that the two images are radiometrically corrected and geometrically co-registered. The scientific contributions of this work are (a) a simultaneous approach to image fusion and hyperspectral unmixing, (b) the enforcement of several physically plausible constraints during unmixing that are all well-known, but typically not used in combination, and (c) the use of efficient, state-of-the-art mathematical optimization tools to implement the processing. The results of our joint fusion and unmixing have the potential to enable more accurate and detailed semantic interpretation of objects and their properties in hyperspectral and multispectral images, with applications in environmental mapping, monitoring and change detection. In our experiments, the proposed method always improves the fusion compared to competing methods, reducing the RMSE by 4% to 53%.


INTRODUCTION
The aim of this paper is the integrated processing of (a) high spectral resolution images (HSRI) with low geometric resolution, and (b) high geometric resolution images (HGRI) with low spectral resolution. From a pair of one HSRI and one HGRI, the integrated processing (a) generates a fused image with both high spectral and high geometric resolution (Figure 1), which can be thought of as a sort of hyperspectral pan-sharpening, improved through the use of spectral unmixing constraints; and (b) produces a better spectral unmixing of the HSRI, through the integration of geometric information from the HGRI.
Hyperspectral imaging, often also called imaging spectroscopy, delivers images with many (up to several hundreds) contiguous spectral bands of narrow bandwidth, typically in the order of a few nm. These bands may cover different spectral ranges, e.g. 0.4-2.5µm, 0.4-1.0µm or 0.4-0.7µm. The geometric resolution (or ground sampling distance, GSD) of HSRI is low, on the one hand due to physical constraints (i.e. with a narrow bandwidth the signal is weak, so the GSD has to be increased to yield an acceptable SNR), and on the other hand due to possible constraints of the platform (satellites) for storing and/or transmitting a large amount of data, making it necessary to sacrifice geometric resolution in favour of spectral resolution. The HGRI are typically multispectral images with often only 3 to 4 bands in the RGB and NIR range, with a wide bandwidth of several tens of nm. Theoretically, HGRI could also be panchromatic images, although we currently do not handle this case. HSRI are increasingly available, especially from airborne and satellite platforms, but lately also from UAVs and terrestrial photography (for a list of some sensors see e.g. Varshney and Arora, 2004). HGRI are widely acquired from all kinds of platforms. These two types of images are complementary to some degree, but have rarely been processed jointly. Merging them into a single, enhanced image product is, especially in the computer vision literature, sometimes termed "hyperspectral image fusion" or "hyperspectral super-resolution".

Figure 1. Hyperspectral images have high spectral resolution but low geometric resolution, whereas the opposite is true for multispectral images. In this paper, we aim at fusing the two types of imagery.

The images can be acquired (quasi) simultaneously from the same platform (e.g. the sensors Hyperion and ALI on the EO-1 satellite), or with a time difference. The latter case is more frequent, since HSRI and HGRI sensors are rarely used simultaneously, but also more difficult, because multitemporal differences (e.g. different illumination, atmospheric conditions, scene changes) introduce errors. We currently assume that the images have already been radiometrically corrected, that the bands of each image and the two images are geometrically accurately co-registered, and that a dimensionality reduction of the hyperspectral bands has been performed, if needed.
The proposed integrated processing shall serve as basis for more accurate and detailed semantic information extraction. HSRI can distinguish finer spectral differences among objects. Thus, not only more objects can be localized and classified, e.g. for landcover mapping, but also biophysical/biochemical properties of objects can be better estimated (for a list of some applications see e.g. Varshney and Arora, 2004). Since HSRI have a large GSD, mixed pixels occur more often, especially in settlement areas.
The task of unmixing is to determine for each pixel the "endmembers" (so-called "pure" material spectra) and the proportional contribution of each endmember within the pixel (called "fractional abundance", or simply "abundance"). The abundances and endmembers are subject to several physical constraints, e.g. the abundances for all endmembers in a pixel must be non-negative and sum to 1, while the number of endmembers present within each pixel is rather low. To the best of our knowledge, though, there is not yet a fusion method which simultaneously uses all available constraints.
Regarding unmixing, there are two main classes of models, the linear mixing model (LMM) and different nonlinear ones (see e.g. Keshava and Mustard, 2002). In the first case, one assumes only direct reflections, and abundances represent areas of endmembers within a pixel. In the non-linear case, multiple reflections are taken into account and multiple endmembers can be overlaid in the radiometric signature of a single surface location (see Heylen et al., 2014, for a nonlinear unmixing review). Most research uses the simpler LMM, which we also do here. The reason to include HGRI in the unmixing lies in their higher geometric resolution. Geometric information, e.g. homogeneous segments, edges and texture boundaries at a resolution higher than the GSD of the HSRI, can help to achieve a more accurate unmixing, and also to spatially localize abundances: rather than only estimating the contribution (%) of one endmember within a pixel, one can determine the corresponding regions (i.e. the pixels at the higher HGRI resolution).
This work makes the following contributions: At the conceptual level, we process HSRI and HGRI together to simultaneously perform image fusion and spectral unmixing. This coupled estimation lets the two processes support each other and leads to better results. At the level of modelling, one needs to adequately couple the two processes, and we do this by imposing multiple physically motivated constraints on the endmembers and abundances in both input images. Thus, we treat the LMM as a coupled, constrained matrix factorization of the two input images into endmembers and their abundances. On the technical level, we employ efficient state-of-the-art optimization algorithms to tackle the resulting constrained least-squares problem (Bolte et al., 2014; Condat, 2014). Experimental results on several different image pairs show a consistent improvement compared to several other fusion methods.

RELATED WORK
Here, due to lack of space, we focus on research on hyperspectral image processing. In recent years, hyperspectral imaging has been advancing rapidly (Plaza et al., 2009; Ben-Dor et al., 2013). Both geometric and spectral resolution have improved, which opens up new applications in different fields, including geology and prospecting, agriculture, ecosystems, earth sciences, etc. The fine spectral resolution allows for fine material characterization within one pixel, while the low geometric resolution is a limiting factor. According to Bioucas-Dias et al. (2013), hyperspectral remote sensing data analysis can be grouped into five main areas of activity: hyperspectral unmixing, data fusion, classification, target detection and estimation of land physical parameters. The first two areas are most relevant for this research. The case of the LMM (for a recent review see Bioucas-Dias et al., 2012) is the best studied one, and sufficient for many purposes. To solve it, the first step is to identify the number of endmembers as well as their spectra. The so-called endmember extraction algorithms can be categorized as follows.
Geometrical approaches exploit the fact that in a linear mixture the observed spectral responses form a simplex. Among them, pure pixel algorithms identify a set of "pure" pixels in the data and assume that these are the vertices of the simplex. Well-known examples include vertex component analysis (VCA) (Nascimento and Bioucas Dias, 2005), AVMAX and SVMAX (Chan et al., 2011). Minimum volume algorithms, on the other hand, do not assume that pure pixels exist, but rather that some data points lie on the facets of a simplex. Examples include SISAL (Bioucas-Dias, 2009), MVC-NMF (Miao and Qi, 2007) and the sparsity-promoting SPICE (Zare and Gader, 2007). Statistical approaches are not limited to the surface of the simplex (e.g. DECA, Nascimento and Bioucas-Dias, 2012), and can deal better with highly mixed scenarios, at the cost of increased complexity. Finally, (sparse) regression approaches formulate unmixing as a sparse linear regression problem, assuming that the "pure" spectral signatures are given in advance. With this assumption, the observations are mixtures of a subset of given signatures available in a "spectral library".
The inversion step consists of solving for the fractional abundances under additional constraints such as non-negativity and sum-to-one. This can be done with direct least squares estimation (Heinz and Chang, 2001) or basis pursuit, including variants regularized with total variation (Iordache et al., 2012).
Hyperspectral data fusion denotes the problem of combining a HSRI with another image of higher geometric resolution, usually a multispectral one. Most methods rely on some form of matrix factorization. Yokoya et al. (2012) first extract endmembers with standard VCA and then apply a Non-negative Matrix Factorization (NMF) to jointly estimate the abundances for both a HSRI and a HGRI in an iterative manner. Akhtar et al. (2014) also extract reflectance spectra to form a spectral basis in advance, and with the basis fixed use a non-negative OMP (Orthogonal Matching Pursuit) to learn a sparse coding for the image. A related approach (Kawakami et al., 2011) estimates the endmembers by sparse dictionary learning and further enforces the multispectral abundances to be sparse. However, the results may not be physically plausible, since non-negativity is not considered. Huang et al. (2014) learn the spectral basis with SVD (Singular Value Decomposition) and solve for the high spectral resolution abundances via OMP. Wycoff et al. (2013) introduce a non-negative, sparsity-promoting framework, which is solved with the ADMM (Alternating Direction Method of Multipliers) solver. Simoes et al. (2015) formulate a convex subspace-based regularization problem, which can also be solved with ADMM, and additionally estimate the spectral response and spatial blur from the data. However, they use a fixed subspace for the spectral vectors, without any physical interpretation. Bayesian approaches (Hardie et al., 2004; Wei et al., 2014) additionally impose priors on the distribution of image intensities and perform Maximum A Posteriori (MAP) inference. Kasetkasem et al. (2005) also add a Markov Random Field (MRF) prior to model spatial correlations. It has also been suggested to learn a prior for the local structure of the upsampled images from offline training data (Gu et al., 2008). Such blind deblurring will, however, work at most for moderate upsampling.

PROBLEM FORMULATION
The goal of fusion is to combine information coming from a hyperspectral image (HSRI) H ∈ R^{w×h×B} and a multispectral image (HGRI) M ∈ R^{W×H×b}. The HSRI has high spectral resolution, with B spectral bands, but low geometric resolution, with image width w and height h. The HGRI has high geometric resolution, with image width W and height H, but low spectral resolution, with only b bands. We want to estimate a fused image Z ∈ R^{W×H×B} that has both high geometric and high spectral resolution. For this fused image we expect the spectral resolution of the HSRI (much higher than that of the HGRI, B ≫ b) and the geometric resolution of the HGRI (much higher than that of the HSRI, W ≫ w and H ≫ h).
To simplify the notation, we treat each image ("data cube") as a matrix, with each row holding all pixels of the image in a certain spectral band, i.e. N_h = wh and N_m = WH pixels per row, while each matrix column holds all spectral values of a given pixel. Accordingly, the images are written Z ∈ R^{B×N_m}, H ∈ R^{B×N_h} and M ∈ R^{b×N_m}.
Following the LMM (Keshava and Mustard, 2002; Bioucas-Dias et al., 2012), mixing occurs additively within each pixel, and the spectral response z ∈ R^B of a given pixel of Z is approximated as

z = Σ_{j=1}^{p} a_j e_j ,    (1)

where e_j ∈ R^B denotes the spectral measurement (e.g. reflectance) of the j-th endmember, j ∈ {1, . . . , p}, a_j denotes the fractional abundance of the respective endmember j, and p is the number of endmembers used to explain this pixel. Eq. (1) can be rewritten in matrix form:

Z = EA ,    (2)

where E = [e_1, . . . , e_p] ∈ R^{B×p} collects the endmember spectra, and the columns a_i ∈ R^p of A ∈ R^{p×N_m} are the fractional abundances for every pixel i = {1, 2, . . . , N_m}. By using this expression, we are able to represent the image Z in a basis spanned by the endmembers, with the abundances as coefficients. Given that p < B, we are able to describe Z in a lower dimensional space, thus achieving a dimensionality reduction. The assumption that p < B, and consequently that the data "live" in a lower dimensional space, holds for most hyperspectral images, and thus rank{Z} ≤ p.
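As a minimal illustration of the LMM as a matrix factorization, the following numpy sketch builds a toy image Z = EA from randomly drawn endmembers and abundances. The dimensions B, p, N_m are arbitrary illustrative choices, not values from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
B, p, Nm = 100, 5, 64                     # hypothetical: bands, endmembers, pixels
E = rng.uniform(0.0, 1.0, (B, p))         # endmember spectra, one per column
A = rng.dirichlet(np.ones(p), size=Nm).T  # abundances: each column sums to 1
Z = E @ A                                 # linear mixing model, Eq. (2): Z = EA

# the factorization respects the physical constraints by construction
assert (A >= 0).all() and np.allclose(A.sum(axis=0), 1.0)
```

Since p < B, every column of Z lies in the p-dimensional subspace spanned by the endmembers, which is exactly the dimensionality-reduction argument made above.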

Constraints
The basic idea of the proposed fusion and unmixing is to use the fact that in (1) the endmembers and abundances have a physical meaning, which they can retain only under the following constraints:

a_ij ≥ 0 ,    1ᵀ a_i = 1 ,    0 ≤ e_ij ≤ 1 ,    (3)

with e_ij and a_ij the elements of E, respectively A. 1 denotes a vector of 1's compatible with the dimensions of A.
The first two constraints restrict the solution to a simplex and promote sparsity. The third constraint bounds the endmembers from above and implies that a material cannot reflect more than the incident radiation. Even if the image values do not represent reflectance, we can rescale them to the range [0 . . . 1], assuming that some materials in the scene are highly reflective in a part of the spectrum.
The low geometric resolution HSRI can be modelled as a geometrically downsampled version of Z:

H = ZS = EAS ≡ EÃ ,    (4)

where S ∈ R^{N_m×N_h} is the downsampling operator that describes the spatial response of the hyperspectral sensor, and Ã ≡ AS are the abundances at the lower resolution. S can be approximated by taking the (weighted) average of the multispectral image values corresponding to one HSRI pixel.
Likewise, the low spectral resolution HGRI can be modelled as a spectral downsampling of Z:

M = RZ = REA ≡ ẼA ,    (5)

where R ∈ R^{b×B} is the spectral response function of the multispectral sensor, with each row containing the responses of one multispectral band. Ẽ ≡ RE are the spectrally transformed (downsampled) endmembers. For this work, we assume that S and R are perfectly known.
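Both observation models are easy to simulate. The sketch below implements a plain (unweighted) block-averaging operator S and a row-normalized spectral response R; the function names and the simple averaging model are illustrative assumptions, not the exact operators of any particular sensor:

```python
import numpy as np

def spatial_downsample(X, h, w, f):
    """Apply S: average f x f blocks of a (channels x (h*f * w*f)) image,
    a simple model of the hyperspectral sensor's spatial response."""
    c = X.shape[0]
    cube = X.reshape(c, h * f, w * f)
    return cube.reshape(c, h, f, w, f).mean(axis=(2, 4)).reshape(c, h * w)

def spectral_downsample(Z, R):
    """Apply R: integrate the B narrow bands into b broad bands; each row
    of R is normalized to sum to 1, as for the semi-synthetic data below."""
    return (R / R.sum(axis=1, keepdims=True)) @ Z
```

A spatially and spectrally constant scene stays constant under both operators, a quick sanity check of the normalization.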

PROPOSED SOLUTION
The fusion is solved once Z has been found, which is equivalent to estimating E and A. In fact, by solving for E and A separately we first solve the spectral unmixing and then use the result to generate the fused image Z. Taking into account the constraints (3), we formulate the following constrained least-squares problem:

arg min_{E,A}  ½ ‖H − EAS‖²_F + ½ ‖M − REA‖²_F    (6a)
subject to  0 ≤ e_ij ≤ 1 ,    (6b)
            a_ij ≥ 0 ,    (6c)
            1ᵀ a_i = 1 ,    (6d)

with ‖·‖_F denoting the Frobenius norm. The non-negativity and sum-to-one constraints (6c, 6d) geometrically mean that the abundances are forced to lie on the surface of a simplex whose vertices are the endmembers. At the same time, this property acts as a sparsity prior, which encourages only a limited number of endmembers to be present in each pixel.
Solving (6a) for E and A jointly is rather tricky and suffers from the following issues. The first term is ill-posed with respect to A, because the downsampling operator S acts on the spatial domain, and any update of A from this term suffers from the low geometric resolution of the HSRI. In much the same way, the second term is ill-posed with respect to E, because the HGRI has broad spectral responses R, and only coarse information about the material spectra can be acquired from it. Consequently, we propose to split (6a) into two subproblems and solve them by alternating between the (geometrically) low-resolution term for H and the high-resolution term for M. This is somewhat similar to Yokoya et al. (2012), although they omit all constraints other than non-negativity (but perform a full matrix factorization in each step, rather than updating only the endmembers or the abundances).
Solving the first term of (6a) subject to the constraints on E will be referred to as the low-resolution step:

arg min_E  ½ ‖H − EÃ‖²_F    subject to  0 ≤ e_ij ≤ 1 .    (7)

Here, the endmembers E are updated based on the low-resolution abundances Ã, which are kept fixed. The minimisation of the second term of (6a) under the constraints on A is the high-resolution step:

arg min_A  ½ ‖M − ẼA‖²_F    subject to  a_ij ≥ 0 ,  1ᵀ a_i = 1 .    (8)

The high-resolution abundances A are updated, and the spectrally downsampled endmembers Ẽ obtained from the low-resolution step are kept fixed.

Algorithm 1: Solution of minimization problem (6a).
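A direct transcription of the coupled objective (6a) is useful, e.g. for monitoring convergence of the alternation. This is a minimal numpy sketch with our own function name and hypothetical matrix sizes:

```python
import numpy as np

def fusion_cost(E, A, H, M, S, R):
    """Value of objective (6a): 0.5*||H - EAS||_F^2 + 0.5*||M - REA||_F^2."""
    return (0.5 * np.linalg.norm(H - E @ A @ S, 'fro') ** 2
            + 0.5 * np.linalg.norm(M - R @ E @ A, 'fro') ** 2)
```

When H and M are generated exactly by the observation models (4) and (5), the cost at the true factors is zero, which makes a convenient unit test for any implementation of the alternation.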

Optimisation scheme
Problems (7, 8) are both constrained least-squares problems, and we alternate between the two. We propose to use a projected gradient method to solve both, inspired by the PALM (Proximal Alternating Linearised Minimization) algorithm (Bolte et al., 2014). For (7), the following two steps are iterated until convergence, for iterations q = 1, 2, . . .:

U = E^(q) − (1/c) (E^(q) Ã − H) Ãᵀ ,    (9a)
E^(q+1) = prox_E(U) ,    (9b)

with c = γ₁ ‖ÃÃᵀ‖_F a non-zero constant, and prox_E a proximal operator that projects onto the constraints of (7). The proximal operator simply truncates the entries of U to 0 from below and to 1 from above. Parameter γ₁ is explained below.
Similarly, (8) is minimised by iterating the following steps until convergence:

V = A^(q) − (1/d) Ẽᵀ (Ẽ A^(q) − M) ,    (10a)
A^(q+1) = prox_A(V) ,    (10b)

with d = γ₂ ‖ẼᵀẼ‖_F again a non-zero constant, and prox_A a proximal operator that projects onto the constraints of (8). For this simplex projection, a recent, computationally efficient algorithm is available (Condat, 2014). We set γ₁ = γ₂ = 1.01 for the experiments. These parameters only affect the speed of convergence. An overview of the whole optimisation procedure is given in Algorithm 1.
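The two projected-gradient updates can be sketched as follows. The box projection for (9b) is a clip; for (10b) we show a standard sort-based simplex projection (Condat, 2014, describes a faster exact variant). The function names are ours, and the steps are a sketch of Eqs. (9) and (10), not the paper's Matlab code:

```python
import numpy as np

def project_simplex(V):
    """Project each column of V onto the unit simplex (sort-based algorithm)."""
    p, n = V.shape
    U = -np.sort(-V, axis=0)                     # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    cond = U - css / np.arange(1, p + 1)[:, None] > 0
    rho = p - 1 - np.argmax(cond[::-1], axis=0)  # last index where cond holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(V - theta, 0.0)

def low_res_step(E, At, H, gamma=1.01):
    """One update of the endmembers, Eqs. (9a, 9b), with At (= AS) fixed."""
    c = gamma * np.linalg.norm(At @ At.T, 'fro')
    U = E - (E @ At - H) @ At.T / c              # gradient step on 0.5||H - E At||_F^2
    return np.clip(U, 0.0, 1.0)                  # prox_E: box constraint

def high_res_step(A, Et, M, gamma=1.01):
    """One update of the abundances, Eqs. (10a, 10b), with Et (= RE) fixed."""
    d = gamma * np.linalg.norm(Et.T @ Et, 'fro')
    V = A - Et.T @ (Et @ A - M) / d              # gradient step on 0.5||M - Et A||_F^2
    return project_simplex(V)                    # prox_A: simplex constraint
```

Both projections are identity mappings on points that already satisfy the constraints, so a fixed point of the iteration is a constrained stationary point of the corresponding subproblem.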
The problem described in (6a) is highly non-convex, and the local optimisation depends strongly on the initial values. To obtain good initial values for the endmembers we use SISAL (Bioucas-Dias, 2009). SISAL is based on the principle that the simplex defined by the endmembers must have minimum volume. It uses a sequence of augmented Lagrangian optimizations in a robust way to return the endmembers. Once the initial endmembers E^(0) are defined, we use SUnSAL (Bioucas-Dias and Nascimento, 2008) to get the initial abundances. SUnSAL solves a least-squares problem for Ã^(0) via ADMM and can incorporate many constraints, but we only use the constraints (6c, 6d). Finally, we initialize the high geometric resolution abundances A^(0) by upsampling with the pseudo-inverse of the downsampling operator, A^(0) = Ã^(0) (SᵀS)⁻¹ Sᵀ. We observe that strict upsampling produces blocking artefacts, so we apply a low-pass filter to A^(0).
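For an unweighted block-averaging S, the pseudo-inverse upsampling Ã(SᵀS)⁻¹Sᵀ reduces to nearest-neighbour replication of each low-resolution abundance pixel. The sketch below follows this with a small normalized box blur as the low-pass filter; the 3×3 kernel size is an illustrative assumption, not a value taken from the paper:

```python
import numpy as np

def init_abundances(At, h, w, f):
    """Upsample low-resolution abundances At (p x h*w) by factor f per axis
    (pseudo-inverse of block averaging = replication), then smooth with a
    3x3 box filter to suppress blocking artefacts."""
    p = At.shape[0]
    up = At.reshape(p, h, w).repeat(f, axis=1).repeat(f, axis=2)
    pad = np.pad(up, ((0, 0), (1, 1), (1, 1)), mode='edge')
    H, W = f * h, f * w
    sm = sum(pad[:, i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0
    return sm.reshape(p, H * W)
```

Because the filter is non-negative and normalized, the smoothed maps still satisfy non-negativity and sum-to-one, so the result remains a valid starting point A^(0).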

Datasets
Acquiring HSRI and HGRI simultaneously is not yet a standard procedure, and even if such data were available, extensive ground truth would be needed to evaluate the fusion method. Consequently, in this study we make use of semi-synthetic data. That means that as input we have only one real hyperspectral image (which we call the "ground truth"), from which we generate a HSRI and a HGRI by geometric and spectral downsampling, respectively. These HSRI and HGRI are synthetic, but since they are based on a true HSRI and realistic spectral response functions, we call them semi-synthetic data.
We use in total four publicly available hyperspectral images to validate our methodology. First, two well-known images, Cuprite (in Figure 1) and Indian Pines, were selected as ground truth. These were acquired with AVIRIS, a NASA sensor that delivers images with 224 contiguous spectral bands in the 0.4-2.5µm range (Varshney and Arora, 2004). After removing the water absorption bands, the images had 195 and 199 spectral bands, respectively. The Cuprite image has dimensions 512 × 608 pixels and Indian Pines 145 × 145 pixels. The respective GSDs are 18 m and 10 m.
Additionally, we use another standard hyperspectral image, Pavia University, which was acquired with ROSIS, a sensor of DLR that has 115 spectral bands spanning the 0.4-0.9µm range. The image has dimensions 610 × 340 pixels, 103 usable spectral bands, and an approximate GSD of 1 m.
Finally, we use an image from APEX (Schaepman et al., 2015), a sensor developed by a Swiss-Belgian consortium on behalf of ESA. APEX covers the spectral range 0.4-2.5µm. The image has a GSD of about 2 m and was acquired in 2011 over Baden, Switzerland (true color composite in Figure 4), and released as an Open Science Dataset. The cropped image we used has dimensions 400 × 400 pixels and 211 spectral bands, after removing the water absorption bands.
All images described above serve as ground truth for our experiments. To simulate the HSRI H from each image, we downsample it by a factor of 8, which is a realistic resolution difference between observed HSRIs and HGRIs. The downsampling is done by simple averaging over 8 × 8 pixel blocks. The HGRI M can be simulated using the known spectral responses R of existing multispectral sensors. We choose a different spectral response R for each hyperspectral sensor, based on the geometric resolution of the HSRIs. Thus, we take the spectral response of Landsat 8 to combine with the scenes acquired with AVIRIS, because of the moderate geometric resolution. Secondly, the IKONOS spectral response is used for the ROSIS scene, since that HSRI only covers the VNIR spectrum. Finally, we choose the spectral response of the Leica ADS80 for the scene acquired with APEX. The last case is treated as more challenging, since the two spectra do not fully overlap. We take care that the area under the spectral response for each band is normalized to 1, so that the simulated HGRI has the same value range as the HSRI.

Table 1. Quantitative results for all tested images.

Implementation details and baseline methods
We run our method with the maximum number of endmembers set to p = 30. Although lower numbers would suffice for most of the images, we choose the number conservatively, to accommodate the large variability within the APEX scene. This might cause the algorithm to overfit in some cases, but does not seem to affect the quantitative results. The inner loops of the two optimisation steps (9a), (10a) run quite fast and typically converge in ≈ 10 iterations in the early stages, dropping to ≈ 2 iterations as the alternation proceeds. Inner loops stop when the update falls below 1%. We iterate the outer loop over (7, 8) until the overall cost (6a) changes by less than 0.01%, or for at most 2000 iterations. For the APEX dataset, the proposed method took ≈ 10 min to run in a Matlab implementation on a single Intel Xeon E5 3.2 GHz CPU. Note that, for a given image, the processing speed depends heavily on the chosen number p of endmembers.
We use three state-of-the-art fusion methods (Wycoff et al., 2013; Akhtar et al., 2014; Simoes et al., 2015) as baselines to compare against. Below, we will call the first two (nameless) methods Wycoff and Akhtar, and the third one HySure, the name coined by its authors. The first two methods come from the computer vision literature, where they reported the best results (especially for close-range images), while the third comes from the recent remote sensing literature. We use the authors' original implementations, which are available for all three baselines. All methods were run with the exact same spectral responses R. For HySure, we use only the fusion part and deactivate the additional estimation of the point spread function. We treat the latter as perfectly known, so re-estimating it would put the method at a disadvantage. Since we used p = 30 endmembers for our method, we also used a subspace of dimension 30 for HySure. For the method of Akhtar, we increased the number of atoms learnt in the dictionary to k = 200, so that the dictionary can cover more spectral responses. Finally, for the method of Wycoff we could not increase the number of scene materials beyond N = 10, the initially proposed number. Even with N = 10, the method needed more than 100 GB of memory to perform the fusion.

Error metrics
As the primary error metric, we use the root mean square error (RMSE) of the estimated high-resolution hyperspectral image Ẑ compared to the ground truth Z, scaled to an 8-bit intensity range [0 . . . 255].
To make the evaluation independent of intensity units, we additionally use the Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS; Wald, 2000). This error measure is also designed to be independent of the GSD difference between the HSRI and HGRI:

ERGAS = (100/S) √( (1/B) Σ_{i=1}^{B} MSE(ẑ_i, z_i) / μ²_{ẑ_i} ) ,

where S is the GSD ratio of the HSRI and HGRI (S = GSD_h / GSD_m), MSE(ẑ_i, z_i) is the mean squared error of every estimated spectral band ẑ_i, and μ_{ẑ_i} is the mean value of each spectral band. Finally, we use the Spectral Angle Mapper (SAM; Yuhas et al., 1992), which is the average, among all pixels, of the angle in R^B between the estimated pixel ẑ_j and the ground truth pixel z_j:

SAM = (1/N_m) Σ_{j=1}^{N_m} arccos( ẑ_jᵀ z_j / (‖ẑ_j‖₂ ‖z_j‖₂) ) ,

where ‖·‖₂ is the ℓ₂ vector norm. The SAM is given in degrees.

Table 1 shows the error metrics that each method achieved for all four images (best results in bold). The proposed fusion approach with joint unmixing performs better than all other methods, across all three error measures. The improvement is especially visible in terms of RMSE and SAM. The normalization by the resolution ratio S in ERGAS reduces the differences between the methods. Among the other methods, the method of Wycoff performs best. The method of Akhtar, proposed mainly for close-range computer vision images, performs worse for remote sensing images. The best performance is mostly achieved for Cuprite, while for APEX all methods perform worst.
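All three error measures are easy to reproduce. This numpy sketch assumes images stored as B × N matrices in the paper's notation; the function names are ours:

```python
import numpy as np

def rmse(Z_hat, Z):
    """Root mean square error (inputs already scaled to [0..255])."""
    return np.sqrt(np.mean((Z_hat - Z) ** 2))

def ergas(Z_hat, Z, S):
    """ERGAS (Wald, 2000); S is the GSD ratio GSD_h / GSD_m."""
    mse = np.mean((Z_hat - Z) ** 2, axis=1)       # per spectral band
    mu = np.mean(Z_hat, axis=1)                   # per-band mean of the estimate
    return 100.0 / S * np.sqrt(np.mean(mse / mu ** 2))

def sam_degrees(Z_hat, Z):
    """Spectral Angle Mapper: mean angle (in degrees) between pixel spectra."""
    cos = np.sum(Z_hat * Z, axis=0) / (
        np.linalg.norm(Z_hat, axis=0) * np.linalg.norm(Z, axis=0))
    return np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```

All three metrics vanish when the estimate equals the ground truth, and RMSE/ERGAS grow with the residual magnitude while SAM is invariant to per-pixel scaling of the spectra.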

EXPERIMENTAL RESULTS AND DISCUSSION
Complementary to the tabulated results, we visualize the per-pixel residuals for all four investigated images. In Figure 2, the pixels are sorted in ascending order according to their per-pixel RMSE, computed over all spectral bands. From this graph it can be seen that our method has, in all cases, the largest number of pixels with small reconstruction errors. For example, in the Pavia University and Cuprite images most of the pixels are reconstructed with an RMSE lower than 2 gray values in the 8-bit range. We also visualize the RMSE per spectral band in Figure 3. Based on this figure, our method achieves results very similar to the method of Wycoff, except in the case of APEX, where the proposed method outperforms the baseline, especially at wavelengths above 1000 nm. The method of Akhtar shows large and rapid RMSE variations, depending on the wavelength. It also shows a different shape of the per-band error curves, compared to the other three methods, whose shapes are quite similar. In APEX, we note rapidly growing RMSE for wavelengths above 900 nm. This is due to the fact that the ADS80 has no spectral bands in this range.
Moreover, we visualize the error in two bands of the APEX dataset for all methods. We choose APEX because it is the most challenging scene, since it contains many different materials in a mixed urban scene and also has the largest reconstruction errors. Moreover, this image shows larger differences between the methods. In Figure 4, we visualize the differences (in 8-bit range) between the ground truth and reconstructed images for wavelengths 552 nm and 1506 nm, corresponding to a band with low reconstruction error (spectral overlap with the HGRI) and a band with high reconstruction error, respectively (see Figure 3a). The evaluation shows that we were able to generate a fused image whose geometric resolution is improved by 8 × 8 times compared to the input HSRI. At the same time, through the joint spectral unmixing we obtain a physically plausible explanation of the scene's endmembers and fractional abundances. Figure 5 shows the abundances for the APEX scene corresponding to the endmembers (a) for the surface of a tennis court and (b) for stadium tartan. At the top, the left side shows the abundance map for the fused image as recovered by our method, while the right side shows the abundance map for the HSRI. Clearly, the HGRI provides information about the spatial distribution of the endmembers within a hyperspectral image pixel. This indicates that we can improve the unmixing results of the HSRI. However, we are not able to quantitatively judge the unmixing quality, as no ground truth exists. Note that in the true color image of Figure 4 the tennis court and the stadium tartan have very similar color. However, the abundance maps in Figure 5 for the tennis court and the stadium tartan endmembers show different values for these two objects, as they should, since the two objects consist of different materials. Figure 5 (bottom right) shows the average spectral responses of 30 representative pixels of each material, taken from the ground truth image.
Although these two materials reflect similarly in the visible range, they have considerable differences in the infrared range.
The proposed method has only one essential user parameter, namely the number p of endmembers. If fusion is the ultimate goal, preliminary tests have shown that it is better to choose p larger than the actual number of endmembers in the scene. In this case, some of the estimated endmembers will likely correspond to the same material, splitting its spectral response into two parts (i.e., an endmember is no longer synonymous with a surface material). Nevertheless, this does not affect the fused image. More endmembers may also help to handle shadows and specular reflections.
We believe that the most important ingredient for the success of our method is the simplex constraint (6c, 6d), that greatly stabilizes the model. Although these two constraints by themselves are standard for spectral unmixing, we are not aware of any other method that uses both of them together for image fusion.
In this paper, we used semi-synthetic, perfectly co-registered data. Under these circumstances, most methods are able to reconstruct the images with an RMSE of ≈ 1% of the intensity range, except for APEX. If the HSRI and HGRI were acquired by different platforms, co-registration errors between the two images would affect the fusion, and the images would have to be corrected. Furthermore, different sensors have different offsets and gains, and the spectral response R must be modified so that the value range of all images is the same. The situation gets even more complicated if the images are not taken simultaneously, when multitemporal effects will influence the fusion, especially illumination differences.
As a closing remark, some images studied in this work have important deficiencies. Images from the AVIRIS sensor contain a lot of noise, especially the Indian Pines image. In Figure 3d, the peaks in the graphs indicate bands with a lot of noise, which cannot be explained by any method. Recent sensors like APEX do not seem to have unusually noisy bands, and it is important that new, challenging hyperspectral test datasets are provided to the community to help improve algorithms and applications.

CONCLUSIONS
In this work, we have discussed synergies between hyperspectral fusion and unmixing, some investigated for the first time, to the best of our knowledge. First, we perform an integrated processing of both HSRI and HGRI to jointly solve the tasks of image fusion and spectral unmixing. Thereby, the spectral unmixing supports the image fusion and vice versa. Second, in the joint estimation we enforce several useful physical constraints simultaneously, which lead to a more plausible estimation of endmembers and abundances, and at the same time stabilize the fusion. Third, we adopt efficient recent optimization machinery which makes it possible to tackle the resulting estimation problem. The validity of the proposed method has been shown experimentally in tests with four semi-synthetic datasets, including a comparison to three state-of-the-art fusion methods. In the tests, our method improved the RMSE of the fusion result by 4% to 53% relative to the best and worst baseline, respectively.
Future work will include several aspects: Processing of real HSRI and HGRI data, and comparison to external ground truth; Scaling to larger scenes, potentially with a locally varying set of endmembers so as not to unnecessarily inflate the spectral basis; Better modelling of multitemporal differences and shadows, which account for an important part of the error in the current processing; Extension to multi-layer unmixing that occurs in vegetation applications when tree canopies interact with each other and with the ground.