FUSING PANCHROMATIC AND SWIR BANDS BASED ON CNN – A PRELIMINARY STUDY OVER WORLDVIEW-3 DATASETS

ABSTRACT: Traditional fusion methods rely on the fact that the spectral ranges of the panchromatic (PAN) and multispectral (MS) bands largely overlap. In this paper, we propose a new pan-sharpening method for the fusion of PAN and short-wave infrared (SWIR) bands, whose spectral coverages do not overlap. The problem is addressed with a convolutional neural network (CNN), trained on a WorldView-3 dataset. A CNN can learn the complex relationships among bands and thus alleviate spectral distortion. In our network, we use a simple three-layer architecture in which each layer uses a different receptive field: the first two layers compute 512 feature maps using 16 × 16 and 1 × 1 receptive fields respectively, and the third layer uses an 8 × 8 receptive field. The fusion results are optimized by continued training. For assessment, four evaluation indexes, Entropy, CC, SAM and UIQI, are selected, combined with subjective visual inspection and quantitative evaluation. The preliminary experimental results demonstrate that the fusion algorithm can effectively enhance the spatial information. Unfortunately, the fused image suffers from spectral distortion: it cannot fully preserve the spectral information of the SWIR image.


INTRODUCTION
Remotely-sensed images have exhibited explosive growth in multi-sensor, multi-temporal, and multi-resolution characteristics (Bin, 2017). Due to technical limitations of multi-sensor technology (Liang, 2004), remote sensing sensors cannot obtain images with both high spatial and high spectral resolution. Remote sensing image fusion, also known as image pan-sharpening, has therefore become a crucial procedure for combining the features of multiple sensors.
Existing methods assume that the spectral coverage of the panchromatic (PAN) band overlaps with that of the spectral bands, and that the spectral bands are highly correlated with the panchromatic band. However, many newly-launched satellites, such as WorldView-3, provide not only the typical blue, green, red and near-infrared (NIR) bands but also several short-wave infrared (SWIR) bands (Pacifici, 2016). The spatial resolution of these SWIR bands is lower than that of the PAN band, and their spectral ranges are not overlapped by that of the PAN band.
Therefore, the assumption held by state-of-the-art methods is false in this situation. How to combine the complementary information from the panchromatic and SWIR bands becomes the key issue for fusing the PAN and SWIR bands.

Experimental Setting
To illustrate the effectiveness of the method, two datasets acquired by the WorldView-3 sensor were considered. The test images were acquired over the Rio de Janeiro, Brazil region. The spectral range of the PAN band of this satellite is 450 nm to 800 nm, and its resolution cell is 1.2 m × 1.2 m with 1000 × 1000 pixels. The datasets also comprise 8 SWIR bands in the wavelength range from 1210 nm to 2330 nm, with a spatial resolution of 7.5 m and 1000 × 1000 pixels. The test scene includes buildings, water bodies and vegetation. We compare with three widely used pan-sharpening methods: ATWT, IHS, and PCA.
(1) Entropy is an important index to measure the information richness of an image; its value reflects the average amount of information contained in the image. It is defined as:

Entropy = -\sum_{i} p_i \log_2 p_i

where p_i is the probability of pixel value i in the image.
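As a minimal illustration of this index, the entropy of an integer-quantized band can be computed from its normalized histogram (the 256-level quantization below is our assumption, chosen for 8-bit data):

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy (bits) of a quantized single-band image."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()          # probability of each gray level
    p = p[p > 0]                   # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())
```

A constant image yields entropy 0, while an image split evenly between two gray levels yields exactly 1 bit, matching the definition above.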
(2) CC (Soiva, 2007) measures the correlation between the SWIR image and the fused image. Its definition is given by:

CC = \frac{\sum_{i}(M_i - \bar{M})(F_i - \bar{F})}{\sqrt{\sum_{i}(M_i - \bar{M})^2 \sum_{i}(F_i - \bar{F})^2}}

where \bar{M} and \bar{F} are the mean values of the original SWIR and fused images.
(3) SAM (Yuhas, 1992) computes the angle between corresponding pixel vectors of the fused and reference images; its optimal value is 0. It is defined as follows:

SAM(I_{\{n\}}, J_{\{n\}}) = \arccos\left( \frac{\langle I_{\{n\}}, J_{\{n\}} \rangle}{\| I_{\{n\}} \|_2 \, \| J_{\{n\}} \|_2} \right)

where I_{\{n\}} = [I_{1,\{n\}}, \ldots, I_{N,\{n\}}] is a pixel vector of the SWIR image I with N bands, \langle \cdot , \cdot \rangle denotes the scalar product, and \| \cdot \|_2 denotes the vector 2-norm.

(4) UIQI (Wang, 2002) measures the structural distortion between two images and is universal, making it suitable for evaluating different image processing tasks. It is expressed as:

UIQI = \frac{4 \, \sigma_{IJ} \, \bar{I} \, \bar{J}}{(\sigma_I^2 + \sigma_J^2)(\bar{I}^2 + \bar{J}^2)}

where \sigma_{IJ} is the sample covariance of I and J, \bar{I} and \bar{J} are the sample means, and \sigma_I and \sigma_J are the standard deviations of the fused and original images respectively.

With a careful visual inspection of the results, we find that all of the results exhibit serious spectral distortion, especially over vegetation. For the CS-based methods this may be because they require higher spectral overlap; for the CNN, it may be attributed to our network's structure. However, the spatial quality of the image is greatly improved.

Results
To further evaluate the quality of the fused images, quantitative indices were used to assess their spectral fidelity and spatial detail. The entropy measures the amount of information in the fused image: the higher the entropy, the better the result. The CC, UIQI and SAM evaluate the similarity between the fused image and the original short-wave infrared image. In particular, SAM measures the spectral similarity of the two: the smaller the SAM, the better the spectral fidelity of the result. In Tables 2 and 3, the entropy of the proposed method is significantly higher than that of the original SWIR image. This demonstrates that the method is effective in increasing the spatial resolution of the SWIR images, which is consistent with the visual inspection. From the visual interpretation, although there are spectral distortions in all four methods, the CNN method has the smallest spectral angle mapper value, indicating that its spectral fidelity is better than that of the other methods. Unfortunately, its CC and UIQI are lower than those of ATWT. In general, the reason for this result is that the number of samples we used to train the network is insufficient.

Method
Traditional methods often suffer from high spectral and spatial distortion, because the PAN and SWIR components are not acquired in overlapping spectral ranges. In contrast, the Convolutional Neural Network (CNN), a breakthrough method in the machine learning field, is a promising solution to this problem for the following reasons. Firstly, a CNN can learn features automatically through multi-layer nonlinear mapping; the model can learn the optimal deep support-value filters at various levels. Secondly, the multi-scale support-value filters can effectively extract the high-frequency information of the images at all levels. The high-frequency information of the PAN image is injected into the SWIR image, which reduces spatial and spectral distortion simultaneously. Finally, with two or more deep layers, the network can quickly find a good solution during training.

Figure 1 shows a basic flowchart describing the fusion process of the CNN. The inputs of the CNN model are the SWIR and PAN bands; the output is a SWIR image with the same spatial resolution as the PAN. We use a simple three-layer architecture with 16 × 16 kernels. The parameters and weights of the network must be trained with the desired high-resolution SWIR image as reference data. Unfortunately, such reference data does not exist. Therefore, before entering the network, we create training data by down-sampling following the Wald protocol (Wald, 2009): the original SWIR image is employed as the reference, while the down-sampled SWIR and PAN images are employed as input. The spatial and spectral features are extracted by multiple convolution layers for image reconstruction, and the output of the network is the final fusion result.
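The three-layer architecture described above can be sketched as follows. The kernel sizes (16 × 16, 1 × 1, 8 × 8) and the 512 feature maps come from the text; the PyTorch framing, the ReLU activations, and the 9-channel input (8 up-sampled SWIR bands stacked with 1 PAN band) are our assumptions, not details given in the paper:

```python
import torch
import torch.nn as nn

class SwirPanFusionCNN(nn.Module):
    """Hypothetical sketch of the three-layer PAN+SWIR fusion network."""

    def __init__(self, swir_bands=8):
        super().__init__()
        self.net = nn.Sequential(
            # Layer 1: 512 feature maps, 16x16 receptive field
            nn.Conv2d(swir_bands + 1, 512, kernel_size=16, padding="same"),
            nn.ReLU(inplace=True),
            # Layer 2: 512 feature maps, 1x1 receptive field
            nn.Conv2d(512, 512, kernel_size=1),
            nn.ReLU(inplace=True),
            # Layer 3: reconstruct the 8 SWIR bands, 8x8 receptive field
            nn.Conv2d(512, swir_bands, kernel_size=8, padding="same"),
        )

    def forward(self, swir_up, pan):
        # swir_up: SWIR interpolated to PAN resolution, shape (N, 8, H, W)
        # pan:     panchromatic band,                   shape (N, 1, H, W)
        return self.net(torch.cat([swir_up, pan], dim=1))
```

Under the Wald protocol, such a model would be trained on down-sampled SWIR/PAN pairs against the original SWIR as reference, then applied at full resolution.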

Figure 1 .
Figure 1. The CNN structure for pan-sharpening.

Figure 2 .
Figure 2. Training process based on the Wald protocol

Figure 3 .
Figure 3. The processing flowchart of the proposed pan-sharpening method.

Figure 4
Figures 4 and 5 show the fused results for different WV-3 patches. Each patch is mainly covered by a different land cover type, such as buildings, water bodies or vegetation, and has a size of 1000 × 1000 pixels. Tables 2 and 3 show the quantitative evaluation scores; the best performance of each metric is in bold. The SWIR image has eight bands and is displayed as a combination of bands 1, 2 and 3 as red, green and blue, as shown in Figure 4.

Figure 4 .
Figure 4. Results of the fusion experiment on an area of buildings. From left to right: original SWIR image; ATWT; IHS; PCA; and the proposed method.

Figure 5 .
Figure 5. Results of the fusion experiment on an area of water body and vegetation. From left to right: original SWIR image; ATWT; IHS; PCA; and the proposed method.
Conclusion
In this paper, a novel image fusion algorithm for SWIR and panchromatic images based on a CNN is proposed and applied to WV-3 images. The experiments achieved good results and show that the approach can provide an effective means for fusing the PAN and SWIR bands. The experimental results demonstrate the effectiveness of the proposed fusion algorithm: although it can enhance the spatial information, there is also spectral distortion of the SWIR images. How to eliminate the spectral distortion of the fused image and realize sensor-adaptive fusion is the direction of our future research.

Table 1 .
Characteristics of the employed WV-3 datasets

Table 3 .
Quality evaluation of fused images: water body and vegetation (corresponding to Figure 5). The best performance of each metric is in bold.