HICF: A MATLAB PACKAGE FOR HYPERSPECTRAL IMAGE CLASSIFICATION AND FUSION FOR EDUCATIONAL LEARNING AND RESEARCH

Research and development in remote sensing for hyperspectral applications in Earth observation have surged in recent years. Consequently, the development of software and tools has also risen sharply, both in research and in academia. Although commercial software and tools such as ENVI by ITT Visual Information Solutions, Boulder, CO, USA are available for visualizing and analyzing hyperspectral images, such software is expensive. Some open source toolboxes, such as the MATLAB-based Hyperspectral Image Analysis Toolbox (HIAT), are also available. However, most of these toolboxes have not been packaged for dissemination and operation without MATLAB, which is commercial software. In this paper, we introduce the Hyperspectral Image Classification and Fusion (HICF) package, which is being developed in MATLAB at the Geoinformatics laboratory, Department of Civil Engineering, Indian Institute of Technology Kanpur (IITK), and which can be installed standalone with a freely available supplementary MATLAB runtime compiler. This software is intended to provide a collection of algorithms, both conventional and those developed at the Geoinformatics laboratory, that utilize the numerical computing capability of MATLAB for the processing of hyperspectral and multispectral imagery. The HICF software has a simple graphical user interface that can be used efficiently, particularly for academic purposes.


INTRODUCTION
With the advent of remote sensing (RS) technology, information retrieval of land surface features has become increasingly simple, especially with the wider availability of free remote sensing data and the advancements in algorithm development. Land Use Land Cover (LULC) analysis is one of the major applications of RS data. Currently, numerous spaceborne and airborne sensors are widely used all over the world for acquiring multispectral and hyperspectral data for LULC characterization, which is vital for many environmental and socioeconomic applications. Consequently, a surge has been observed in the research and development of algorithms for utilizing this data. The wide applicability of such data makes the dissemination of information on remote sensing techniques and applications in educational systems significant for the overall growth of the system. A particular increase in the available techniques for LULC studies has been observed in advanced classification methods and in approaches for improving the classification accuracy (Gong et al., 1992; Kontoes et al., 1993; Foody, 1996). Another area of research that has grown significantly in recent years is image fusion, which has applications not only in Earth observation but also in cybersecurity and biometric information retrieval through face recognition (Kour et al., 2016). Several methods of image fusion are already taught regularly at Bachelor and Master level in a variety of disciplines.
For image processing, many commercial and noncommercial software packages are available. The Environment for Visualizing Images (ENVI), developed by ITT Visual Information Solutions, Boulder, CO, USA, is used to process and analyze geospatial images, including hyperspectral images. ENVI bundles many scientific algorithms used for image processing. However, such software is expensive. For several educational institutions with constrained financial budgets, procurement of such software is not feasible. Consequently, open source software is preferred in teaching and academic research. The Integrated Land and Water Information System (ILWIS) is an open source software package developed by ITC (Faculty of Geo-Information Science and Earth Observation, University of Twente), Enschede, the Netherlands, for its researchers and students. ILWIS is mainly used for vector and raster processing. However, the interface and file formats used in ILWIS are relatively unpopular and not easy to use for generalized processing in various applications. Some open source toolboxes, such as the MATLAB-based hyperspectral image analysis toolbox (HIAT), are also available. However, most of these toolboxes have not been packaged for dissemination and operation without MATLAB. MATLAB is proprietary software and a development tool by MathWorks that is especially suitable for rapid application development; it also allows applications, along with their graphical user interfaces, to be built and disseminated by packaging.
Hyperspectral Remote Sensing (HRS) provides an accurate description of land surface features. This property of HRS is due to its large number of bands, which are acquired at very narrow bandwidths; the narrow bandwidths, however, often result in poor bands with less information and poor contrast. Since hyperspectral sensors acquire multiple images at similar wavelengths within individual spectral regions, these images often also provide redundant information (Landgrebe, 2002; Varshney and Arora, 2010). These noisy or less informative bands and redundant bands contribute to errors that increase with the dimensionality of the hyperspectral data (Zhang and Ma, 2009). The higher dimensionality also introduces higher computational costs in data processing. A popular approach to tackle these issues is to reduce the number of bands using dimensionality reduction (DR) or feature selection (FS). In the former case, the feature space is transformed, while in the latter case bands are selected from the original band set. One of the basic approaches widely used for DR is Principal Component Analysis (PCA) (Mather and Koch, 2011). The FS methods are more commonly known as 'band selection methods' in hyperspectral processing and are often based on the statistical characteristics of the individual hyperspectral bands. For FS, information theory based measures such as mutual information (MI) and distance measures such as the Bhattacharyya distance and the Mahalanobis distance are often used (Richards, 2013b; Varade et al., 2017; Varade et al., 2018). Generally, DR or FS is a primary processing step in hyperspectral data processing for LULC studies.
The developed HICF package is currently divided into two primary modules: image classification and image fusion. The image classification module contains implementations of algorithms for LULC classification of hyperspectral imagery (HSI). Classification can be supervised or unsupervised, depending on the availability of training samples. After classification, statistical testing is performed on the classified image and the available training data. The second module, image fusion, incorporates several algorithms for pixel and object level fusion of images. A sub-module for deriving statistical parameters as indicators of the fusion quality is also provided in the fusion module. The paper is organized as follows. Section 1 introduces the HICF package with brief information on other existing software used in hyperspectral remote sensing. Sections 2 and 3 provide information on the two primary modules and their corresponding sub-modules in the HICF package, including their capabilities for HSI processing. Section 4 illustrates the graphical user interface (GUI) for the primary modules in the HICF package. Finally, the paper is concluded in Section 5.

HICF details
The HICF package comprises a bundle of tools and applications for the processing of hyperspectral imagery for LULC studies. The source files of the HICF package are developed in MATLAB, which, although commercial software, allows dissemination of packages as standalone executable files based on a freely downloadable MATLAB runtime compiler that is essential for the operation of the package. These standalone packages can be used freely on systems that do not have MATLAB installed. They only require the MATLAB runtime compiler, which is automatically downloaded during installation. This package is mainly useful for RS applications involving classification and fusion.
The HICF package, at present, is divided into two modules: image classification and image fusion. The image classification module contains two types of methods for the extraction of useful information, FS and DR. In FS methods the original characteristics of the HSI bands are preserved, whereas in DR techniques the band reflectance is transformed to a new value. For LULC analysis using HRS, the best bands identified using FS or DR are used to perform supervised or unsupervised classification. Supervised classification requires training data, so the classification module includes a sub-module for the generation of training data. Training pixels can either be user defined or randomly generated from the reference image. The unsupervised methods for classification include approaches based on clustering of pixels and class labeling of the clusters. Since the applicability and potential of a classifier are determined by the accuracy of classification, a separate sub-module, "Accuracy Assessment", is included in the classification module. The image fusion module at present includes four methods for image fusion. This module is designed such that it can also be used for the fusion of multispectral images. A major application of image fusion methods is the pansharpening of medium resolution multispectral and hyperspectral data. These methods fuse panchromatic and multispectral images to produce an image with high spatial and spectral resolution. To assess the quality of the fused image, another sub-module, "Quality Index", is incorporated, which provides three statistical metrics.

Flowcharts of classification and fusion module
A detailed flowchart for the classification module is given in Figure 1. As HSI contains hundreds of bands, the first operation block reduces the HSI band set using FS or DR methods. Then, depending on the availability of training data, a classification strategy is selected, and finally the accuracy assessment block is used to evaluate the results. The flowchart for image fusion is shown in Figure 2. A very important point before applying fusion is that all participating images must be properly registered and resampled to the same pixel size. The fusion methodology block takes the images and applies the selected method. Finally, quality assessment is performed on the fused image.

DR/FS routine
The DR/FS routine provides an option to reduce the dimensionality of the data using the DR methods and/or to select features using the FS methods. The DR routine includes two techniques, Principal Component Analysis (PCA) and Independent Component Analysis (ICA). The FS routine includes several approaches for band selection in hyperspectral data (Chang, 2003; Bajcsy and Groves, 2004; Varade et al., 2017). Due to advancements in algorithm development for hyperspectral band selection, this module is still being updated.
PCA: In the PCA technique, the original feature space is transformed into a new feature space of uncorrelated components. Each principal component (PC) is a linear combination of the input features, so every PC band is a weighted sum of the input bands. The components are ordered by decreasing variance, so the first few PCs retain most of the information in the data (Chang, 2003; Richards, 2013).
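The PCA transform described above can be written compactly. The following is the standard textbook formulation, not taken from the HICF implementation; the symbols (pixel spectrum $\mathbf{x}_n$ with $B$ bands, mean spectrum $\boldsymbol{\mu}$, covariance $\mathbf{C}$, eigenpairs $\lambda_k, \mathbf{e}_k$) are introduced here for illustration only.

```latex
\mathbf{C} = \frac{1}{N-1}\sum_{n=1}^{N}
  (\mathbf{x}_n - \boldsymbol{\mu})(\mathbf{x}_n - \boldsymbol{\mu})^{T},
\qquad
\mathbf{C}\,\mathbf{e}_k = \lambda_k\,\mathbf{e}_k,
\quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_B,
\qquad
y_{n,k} = \mathbf{e}_k^{T}(\mathbf{x}_n - \boldsymbol{\mu})
```

Here $y_{n,k}$ is the value of pixel $n$ in the $k$-th PC band, and $\lambda_k$ is the variance of that band, which is why keeping only the leading components preserves most of the variance.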
ICA: In the ICA technique, the signal/feature is decomposed into independent components. It is based on the unmixing of the observed signals into their source components through a linear model (Oja and Nordhausen, 2002; Chang, 2003).
MI based: MI based feature selection relies on information theory based measures. MI is estimated between each HSI band and the reference image, and the bands with the higher MI values are selected. There are several variants of MI in the literature. Varade et al. (2017) used clustering based MI to classify the snow cover area using Hyperion data. Another method, band ranking based on denoising error matching (BRDEM), which uses MI as a matching parameter with the denoising responses of the hyperspectral bands, is also included (Varade et al., 2018). The updating of this module with other recently developed techniques is in progress.
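The basic idea of MI based band ranking can be sketched in a few lines of Python. This is a minimal illustration using a plain histogram estimate of MI on already quantized pixel values; it is not the clustering based MI or BRDEM method cited above, and the function names `mutual_information` and `rank_bands` as well as the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(band, reference):
    """MI (in bits) between two equally long sequences of discrete values."""
    n = len(band)
    joint = Counter(zip(band, reference))
    count_band = Counter(band)
    count_ref = Counter(reference)
    mi = 0.0
    for (b, r), count in joint.items():
        p_joint = count / n
        # p(b, r) / (p(b) * p(r)) written with raw counts
        mi += p_joint * math.log2(p_joint * n * n / (count_band[b] * count_ref[r]))
    return mi

def rank_bands(bands, reference):
    """Return band indices sorted by decreasing MI with the reference."""
    scores = [mutual_information(band, reference) for band in bands]
    return sorted(range(len(bands)), key=lambda k: scores[k], reverse=True)

# Toy data (assumed): flattened, quantized pixel values of three bands
reference = [0, 0, 1, 1, 2, 2, 0, 1]
bands = [
    [5, 5, 5, 5, 5, 5, 5, 5],  # constant band, carries no information
    [0, 0, 1, 1, 2, 2, 0, 1],  # identical to the reference
    [0, 0, 1, 1, 1, 1, 0, 1],  # merges reference classes 1 and 2
]
ranking = rank_bands(bands, reference)
```

In this toy case the band identical to the reference ranks first and the constant band ranks last, which matches the selection rule of preferring bands with higher MI.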

Training
Supervised classification requires a priori information about the classes in the data corresponding to the study area. In general, supervised classification analysis includes two parts: training the classifier with known pixels, and then evaluating the accuracy of the classification results using testing/validation pixels. Both the training and the validation pixels are selected from reference data that contain a priori information on the geography of the area represented in the hyperspectral/multispectral data. The reference data are usually generated using ground surveys, aerial photography or any other known information (Richards, 2013). The selection of training pixels can be random or systematic in nature (Mather and Koch, 2011). The classification accuracy usually varies with the number of training and validation pixels (Congalton and Green, 2009).
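A random selection of training and validation pixels from reference data can be sketched as follows. This is an illustrative assumption of how such a split might be done (stratified per class, with a fixed seed for repeatability), not the HICF routine; `split_reference`, the class names and the 10 x 10 reference map are hypothetical.

```python
import random

def split_reference(labels, train_fraction=0.4, seed=0):
    """Stratified random split of labeled reference pixels.

    labels: dict mapping a pixel position (row, col) to its class label.
    Returns two lists of pixel positions: (training, validation).
    """
    rng = random.Random(seed)
    by_class = {}
    for pixel, cls in labels.items():
        by_class.setdefault(cls, []).append(pixel)
    train, validation = [], []
    for cls in sorted(by_class):
        pixels = sorted(by_class[cls])
        rng.shuffle(pixels)
        # Keep at least one training pixel per class
        n_train = max(1, round(train_fraction * len(pixels)))
        train.extend(pixels[:n_train])
        validation.extend(pixels[n_train:])
    return train, validation

# Hypothetical reference map: left half "snow", right half "forest"
reference = {(r, c): ("snow" if c < 5 else "forest")
             for r in range(10) for c in range(10)}
train, validation = split_reference(reference, train_fraction=0.4)
```

Splitting per class keeps the class proportions of the reference data in both sets, which is one common way to avoid a training set that misses a small class.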

Classification
In the HICF package, the classification module is sub-divided into three parts: training data selection, classification type and accuracy assessment. Training data selection has already been discussed in section 3.2. The classification type includes several conventional supervised and unsupervised methods for classification.

Supervised Classification
This package includes the Minimum Distance to the Mean (MDM) classifier, the Gaussian Maximum Likelihood (GML) classifier and the Support Vector Machine (SVM) classifier (Mather and Koch, 2011).
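Of these classifiers, MDM is the simplest to illustrate: each pixel is assigned to the class whose mean training spectrum is nearest. The sketch below assumes Euclidean distance and uses hypothetical two-band spectra and class names; it is not the HICF implementation.

```python
import math

def class_means(samples, labels):
    """Mean spectrum of each class from labeled training spectra."""
    sums, counts = {}, {}
    for spectrum, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(spectrum))
        for k, value in enumerate(spectrum):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify_mdm(pixel, means):
    """Assign the pixel to the class with the nearest mean (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Hypothetical two-band training spectra
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["water", "water", "soil", "soil"]
means = class_means(train_x, train_y)
predicted = [classify_mdm(p, means) for p in ([0.3, 0.2], [0.7, 0.9])]
```

GML and SVM replace the distance rule with a class-conditional Gaussian likelihood and a learned decision boundary, respectively, but follow the same train-then-predict pattern.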

Unsupervised Classification
In the absence of reference data or knowledge about the study area, unsupervised classification techniques are used. Unsupervised classification is based on clustering of the pixel values in the spectral domain. Common clustering algorithms include K-means and the Iterative Self-Organizing Data Analysis Technique (ISODATA) (Richards, 2013).
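The clustering step can be illustrated with a small K-means sketch. It assumes Euclidean distance, a fixed number of iterations and a simple deterministic initialization, all chosen here for brevity; ISODATA additionally splits and merges clusters, which is not shown. The function name and toy spectra are hypothetical.

```python
import math

def kmeans(pixels, k, iterations=20):
    """Cluster spectra with K-means; returns (assignment, centers)."""
    if not 1 <= k <= len(pixels):
        raise ValueError("k must be between 1 and the number of pixels")
    # Deterministic initialization: k pixels evenly spaced through the list
    step = len(pixels) // k
    centers = [list(pixels[i * step]) for i in range(k)]
    assignment = [0] * len(pixels)
    for _ in range(iterations):
        # Assign every pixel to its nearest cluster center
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in pixels]
        # Move each center to the mean of its member pixels
        for c in range(k):
            members = [p for p, a in zip(pixels, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical two-band spectra forming two groups
pixels = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],
          [0.9, 0.8], [0.8, 0.9], [0.9, 0.9]]
assignment, centers = kmeans(pixels, k=2)
```

The resulting cluster indices carry no class meaning by themselves; as noted above, the clusters still have to be labeled with classes afterwards.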
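Accuracy Assessment: Classification accuracy assessment is often based on statistical parameters derived from the error matrix (Congalton and Green, 2009). The parameters reported by the package include the conditional and overall kappa, the overall accuracy, the overall standard error and a Z-statistics test.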
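Two of the parameters derived from an error matrix, the overall accuracy and the overall kappa, can be computed as in the following sketch. The formulas are the standard ones (kappa as observed agreement corrected for chance agreement); the function names and the 2 x 2 error matrix are hypothetical, and the conditional kappa and significance tests are not shown.

```python
def overall_accuracy(matrix):
    """Fraction of correctly classified pixels (diagonal over total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def overall_kappa(matrix):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    n = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical error matrix: rows are classified labels, columns are reference labels
error_matrix = [[45, 5],
                [10, 40]]
accuracy = overall_accuracy(error_matrix)  # 0.85
kappa = overall_kappa(error_matrix)        # approximately 0.70
```

Here 85 of 100 pixels lie on the diagonal, the chance agreement is 0.5, and the kappa is therefore (0.85 - 0.5) / (1 - 0.5) = 0.7.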

Image Fusion
Image fusion is carried out at different levels of abstraction, including the pixel level, feature level and decision level, as discussed before (Pohl, 2016). The HICF package includes modules for two commonly used fusion levels, pixel level and feature level, discussed as follows.
Pixel-level fusion: Pixel-level fusion techniques work on raw data, i.e. the pixel at a particular location of the fused image is a weighted summation of the corresponding pixels in the participating images (Chaudhuri and Kotwal, 2013).
Bayesian Data Fusion (BDF): This method depends on the statistical properties of the source images. In fusion based pansharpening, the multispectral and panchromatic bands constitute the source images. This method uses manually set weights on the spectral and panchromatic data, which are based on the visual or quantitative information retrieved from the source images (Fasbender et al., 2008). BDF computes a vector of variables of interest Z, which is linked to the observables Y through the error model given in eq. (1), where E is the vector of random errors, which is stochastically independent of Z. With the help of eq. (1), the error is calculated for both source images to be fused. The fused data is calculated as given in eq. (2), where F(i, j) is the fused pixel at location (i, j), A(i, j) and B(i, j) are the pixel values of the source images A and B at location (i, j), respectively, and W is the weight used to fuse the two pixels.
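The role of the weight W can be illustrated with a minimal sketch. It assumes that the two pixels are combined as a convex combination, F(i, j) = W·A(i, j) + (1 − W)·B(i, j); this form is an assumption made for illustration and does not reproduce the full Bayesian formulation of Fasbender et al. (2008). The function name and the toy images are hypothetical.

```python
def fuse_pixelwise(image_a, image_b, weight):
    """Weighted pixel-wise combination of two co-registered images.

    Assumed form: F = weight * A + (1 - weight) * B, with weight in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [[weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

# Hypothetical 2 x 2 source images of equal size
image_a = [[10.0, 20.0], [30.0, 40.0]]
image_b = [[20.0, 20.0], [10.0, 0.0]]
fused = fuse_pixelwise(image_a, image_b, weight=0.6)
```

A weight closer to 1 makes the fused product follow image A more closely, and a weight closer to 0 favours image B; both images must already be registered and resampled to the same pixel size.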


Local Mean and Variance Matching (LMVM) fusion: The LMVM method is based on local image statistics; it equalizes the local histograms of the two images being fused using local mean and variance matching functions (Bethune et al., 1998). In the LMVM method, the local neighborhood size (w) significantly influences the fusion quality. A larger w reduces the fusion quality by incorporating a relatively larger variability of the data in the fusion scheme. In contrast, a much smaller w results in a fused image that is not able to represent the variance of the source data. The smaller w also increases the computational costs in processing. The commonly used w for fusion is usually between 5x5 and 11x11, depending upon the spatial resolution of the source images. The fused image using the LMVM method is given in eq. (3).
$$F(i,j) = \frac{\left(H(i,j) - \bar{H}(i,j)_{(w,h)}\right)\cdot s\!\left(L(i,j)_{(w,h)}\right)}{s\!\left(H(i,j)_{(w,h)}\right)} + \bar{L}(i,j)_{(w,h)} \qquad (3)$$

where F(i, j) is the fused pixel at location (i, j), and H(i, j) and L(i, j) are the high and low resolution image pixel values at (i, j).

Fusion quality metrics: To test the goodness of the fusion results, some statistical parameters are computed between the fused result and the source image. In the HICF package, this facility is available under the "Quality Index" tab of the image fusion module.


Entropy (H): Entropy is a measure of the randomness in the data. Compared to low resolution blurred images, sharpened images often have higher entropy. This is especially observed in pansharpening, where the entropy of the fused product is lower than that of the panchromatic band but significantly higher than that of the multispectral/hyperspectral bands.
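Image entropy is commonly computed from the grey-level histogram as the Shannon entropy, H = −Σ p log2 p, where p is the relative frequency of each grey level. The sketch below assumes this definition and already quantized pixel values; the function name and the toy images are hypothetical.

```python
import math
from collections import Counter

def image_entropy(image):
    """Shannon entropy (bits per pixel) of the grey-level histogram."""
    pixels = [value for row in image for value in row]
    n = len(pixels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(pixels).values())

# Hypothetical 2 x 2 images
flat_entropy = image_entropy([[7, 7], [7, 7]])    # single grey level
varied_entropy = image_entropy([[0, 1], [2, 3]])  # four equally likely levels
```

A uniform image has zero entropy, while four equally frequent grey levels give 2 bits per pixel, which illustrates why images with more detail tend to score higher.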


Root Mean Square Error (RMSE): RMSE measures the amount of deviation between the fused result and the source image. A lower RMSE indicates a better fusion result.
Peak Signal to Noise Ratio (PSNR): PSNR is the peak signal to noise ratio between two images, expressed in decibels. It is used to measure the quality of a reconstructed image relative to a source image. A higher PSNR indicates a better fused result.
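Both metrics follow directly from the pixel-wise differences between the fused and source images. The sketch below uses the standard definitions, with PSNR = 20 log10(peak / RMSE); the default peak value of 255 assumes 8-bit data and, like the function names and toy images, is an assumption made for illustration.

```python
import math

def rmse(fused, source):
    """Root mean square error between two images of equal size."""
    squared = [(f - s) ** 2
               for row_f, row_s in zip(fused, source)
               for f, s in zip(row_f, row_s)]
    return math.sqrt(sum(squared) / len(squared))

def psnr(fused, source, peak=255.0):
    """Peak signal to noise ratio in decibels; infinite for identical images."""
    error = rmse(fused, source)
    if error == 0:
        return math.inf
    return 20.0 * math.log10(peak / error)

# Hypothetical 2 x 2 images
source = [[100, 100], [100, 100]]
fused = [[110, 90], [100, 100]]
error = rmse(fused, source)
quality_db = psnr(fused, source)
```

Because PSNR is a decreasing function of RMSE for a fixed peak value, the two metrics always rank fusion results in the same order; PSNR simply expresses the error on a logarithmic scale relative to the dynamic range.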

HICF-GUI
The primary graphical user interface of the HICF software is shown in Figure 3. The primary interface provides a simple representation of the package. The pre-processing tab includes modules for image registration. Activating the 'Image Fusion' and 'Classification' tabs opens these modules in new windows.
Figure 3. The primary interface of the HICF package, which includes the tabs for application based processing.
Figure 5. Fusion of the Sentinel-1 texture image and the H-α decomposition image of Sentinel-1 using the BDF approach.

The interface of the image fusion module
The primary interface for the image fusion module is shown in Figure 4. In the image fusion module, the 'Band Reduction/Selection' tab in the 'Methods' tab is provided to select the bands to be used in the fusion of hyperspectral imagery with other remote sensing imagery, which is useful, for example, in the pansharpening of spaceborne hyperspectral data such as Hyperion. The 'Image Info' tab gives the statistics of the loaded images. The different methods for fusion, whether pixel level or object level based, are included in a single tab, 'Fusion'. The quality metrics for the fused product are included in the 'Quality Index' tab. Figure 5 shows an example of image fusion using the BDF approach with a weighting parameter value of 0.6. The source images in the example correspond to an H-α composite of Sentinel-1 and a Hyperion true color composite of the Solang valley in Himachal Pradesh, India.

The interface of the classification module
The classification module, shown in Figure 6, includes significantly larger capabilities relative to the fusion module. The Image Display tab is used for displaying images in different ways, such as 8-bit color composite (CC), 24-bit CC, false CC, etc. For the classification demonstration, we have again used the Dhundi dataset, with the SVM classifier and 40 % of the training pixels taken from the reference data (Varade et al., 2017). The reference data and the classified map are shown in Figures 13 and 14, respectively. The accuracy of the classified image can be evaluated using the accuracy assessment option, which is shown in Figure 15.

CONCLUSION
In this paper, we introduce the HICF package, which includes two primary modules for hyperspectral data analysis in LULC based applications. The two modules are designed to incorporate commonly used approaches in image fusion and classification. The HICF GUI is organized in a simple format where the two modules are called from the main interface in separate windows. Due to the lack of such freely available software, the incorporation of the HICF package in academia provides significant opportunities for instructors to enhance the quality of teaching. In future, the addition of more algorithms for hyperspectral band selection and image fusion is proposed.
Accuracy assessment parameters derived from the error matrix:
a) Conditional Kappa (User's) for all classes
b) Conditional Kappa (Producer's) for all classes
c) Standard deviation of conditional Kappa (User's)
d) Standard deviation of conditional Kappa (Producer's)
e) Overall kappa
f) Overall Accuracy
g) Overall standard error
h) Z-statistics test

In eq. (3), $H(i,j)_{(w,h)}$ and $L(i,j)_{(w,h)}$ are the local high and low resolution image pixels inside a window of size (w, h), $\bar{H}(i,j)_{(w,h)}$ and $\bar{L}(i,j)_{(w,h)}$ are the local means inside the window of size (w, h), and s is the local standard deviation.

Feature-level fusion: In feature level fusion, features are extracted from the source images using image analysis techniques such as segmentation, morphological operations, etc. These features are extracted based on the application area, which significantly affects the fusion quality (Chaudhuri and Kotwal, 2013).
K-Singular Value Decomposition (K-SVD) based fusion: K-SVD (Aharon et al., 2006) based fusion works on dictionary learning and improves the dictionary iteratively to obtain a sparse representation of the input data.
Artificial Neural Network (ANN) based fusion: ANN based fusion works on the decomposition of the source images into non-overlapping patches. These decomposition techniques reveal the horizontal, vertical and diagonal features which are used to train the ANN model et al., 2002).

Figure 4. The interface of the image fusion module.

Figure 6. The interface of the classification module.

Figure 7. The interface of the PCA sub-module of the classification module of the HICF package.

Figure 9. Example of the PCA result from the interface shown in Figure 7.

Figure 8. The interface of the ICA sub-module of the classification module of the HICF package.

The "Feature Selection" tab is included in the classification module and contains the DR and FS functionality. Its layout is shown in Figure 7 and Figure 8. To demonstrate it, Hyperion data of Dhundi (H.P.) is used with PCA for DR, and the result is shown in Figure 9.

Figure 10. Interface for the conventional methods in the classification module of the HICF package.

Figure 11. The interface of the advanced methods in the classification module of the HICF package.

Figure 12. Pattern recognition interface for unsupervised classification in the HICF package.

Figure 13. Example of reference data loaded in the classification module.

Figure 14. Example of SVM classification results derived in the classification module.

Figure 15. Interface for the accuracy assessment of the classification results with respect to reference data.