TEXTURE BASED HYPERSPECTRAL IMAGE CLASSIFICATION

This research work presents a supervised classification framework for hyperspectral data that takes into account both spectral and spatial information. Texture analysis is performed to model spatial characteristics, which provide additional information used along with the rich spectral measurements for better classification of hyperspectral imagery. The moment invariants of an image can be used to derive shape characteristics such as elongation and orientation along its axes. In this investigation, second order geometric moments within a small window around each pixel are computed, which are further used to compute texture features. The textural and spectral features of the image are combined to form a joint feature vector that is used for classification. The experiments are performed on different types of hyperspectral images using a multi-class one-vs-one support vector machine (SVM) classifier to evaluate the robustness of the proposed methodology. The results demonstrate that the integration of texture features produces statistically significantly better results than spectral classification alone.


INTRODUCTION
Accurate image classification is important for many applications, including agriculture monitoring, hydrological science, environmental studies, military applications, and urban planning. Hyperspectral (Goetz, 2009) sensors capture tens or hundreds of fine contiguous spectral bands of an image scene from the ultraviolet to the infrared region. The rich spectral information can help to identify surface materials and objects of interest more discriminately. However, high dimensionality also poses challenges to supervised classification due to the limited availability of training samples and the curse of dimensionality. On the other hand, contiguous spectral bands are highly correlated and provide redundant information. Therefore, dimensionality reduction by spectral feature extraction or selection is often performed prior to classification to mitigate the problems related to high dimensionality. Feature extraction generally transforms the original data into a new, smaller representation such that important information is preserved with a minimum number of features. Techniques such as principal component analysis (PCA) (Joliffe, 2002), decision boundary feature extraction (DBFE) (Lee, et al., 1993), and nonparametric weighted feature extraction (NWFE) (Kuo, et al., 2002) transform the original data into a new uncorrelated dataset such that only a small number of components carry the maximum data variance.
Conventional pixel wise classifiers that use spectral features only usually produce noisy classified maps. The labelling uncertainties can be minimized by incorporating additional information into the classification. In remotely sensed images the spatial context of a pixel can provide such information, as neighbouring pixels are highly correlated and likely to have the same label. By integrating spectral and spatial information, better classification can be performed, yielding more accurate results. The integration of spatial information is usually done either by post classification refinement or by using a joint spectral-spatial feature vector. Post classification refinement mostly includes relaxation labelling and segmentation based approaches. Spatial feature extraction involves determining information about shape, size, co-occurrence, and texture from a crisp or adaptive neighbourhood. The major approaches in this category are based on Markov random fields (MRF) (Farag, et al., 2005), Gabor filters (Grigorescu, et al., 2002), morphological operators, and wavelet decomposition. High dimensionality poses the major challenge to spatial feature extraction: traditional 2-dimensional (D) approaches need to be adapted to the 3-D structure of hyperspectral imagery.
Texture is an important characteristic for modelling the spatial properties of an image. Texture analysis has been widely used for classification and segmentation of different types of images. Traditional approaches to texture analysis include gray level co-occurrence matrices (GLCM) (Haralick, et al., 1973), MRF models, Gabor filters, Gibbs random field models, and wavelet analysis. High dimensionality is the major challenge in texture modelling of hyperspectral images, and additional effort is required to adapt the traditional texture analysis methods to high dimensional data. Shi, et al. (2003) used Gabor filters on spectral bands of a hyperspectral image with reduced dimensionality to compute texture features. A similar approach was applied in (Rellier, et al., 2004) with a simplified multivariate Gaussian MRF model. Shi, et al. (2005) modelled hyperspectral texture using multiband correlation functions. Tsai, et al. (2013) extended the GLCM to a 3-D GLCM and defined third-order texture measures for extracting texture features from the hyperspectral image cube.
In this research work, second order geometric moments are used to compute texture features, and a spectral-spatial framework is presented for supervised classification of hyperspectral imagery. The rest of the paper is organized as follows: section 2 provides the theoretical background, section 3 outlines the proposed methodology, the experimental results are presented in section 4, and section 5 concludes the work.

(The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-8, 2014. ISPRS Technical Commission VIII Symposium, 09-12 December 2014, Hyderabad, India. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-8-793-2014)

BACKGROUND
This section describes the methods used in this work for the characterization of spectral and spatial information. We used PCA and DBFE, which are unsupervised and supervised techniques respectively, for spectral feature extraction in order to investigate the integration of texture information with different types of spectral features. Both algorithms are widely used and have been demonstrated to be powerful approaches.

PCA
PCA is an unsupervised data transformation technique that projects the dataset into a lower dimensional space, retaining most of the variation of the original dataset in a new set of orthogonal variables called principal components (PCs) while minimizing the correlations among the PCs. The PCs are sorted such that the first PC preserves the greatest variance, the second PC corresponds to the next greatest variance, and so on. For hyperspectral image classification, a subset of the largest PCs corresponding to some pre-specified percentage of cumulative variance, usually 99%, is taken.
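As a minimal sketch of this step, the following Python snippet selects the smallest number of PCs that retains the required cumulative variance. It assumes the hyperspectral cube is stored as a rows × cols × bands NumPy array; the function name `pca_reduce` is illustrative and not from the paper.

```python
import numpy as np

def pca_reduce(cube, variance_kept=0.99):
    """Project a hyperspectral cube (rows x cols x bands) onto the
    smallest set of principal components that retains the requested
    fraction of the total variance (99% in the paper)."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(np.float64)
    X -= X.mean(axis=0)                      # centre each band
    cov = np.cov(X, rowvar=False)            # bands x bands covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, variance_kept) + 1)
    pcs = X @ eigvecs[:, :k]                 # project onto first k PCs
    return pcs.reshape(rows, cols, k)

# Example on a small synthetic 20-band cube
cube = np.random.rand(10, 10, 20)
reduced = pca_reduce(cube)
print(reduced.shape[:2])  # spatial size preserved: (10, 10)
```

Decorrelation and variance ordering follow directly from the eigen-decomposition of the band covariance matrix, which is why only the leading components need to be kept.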

DBFE
DBFE was proposed by Lee, et al. (1993), who introduced the concept of the effective decision boundary and defined discriminantly informative features (DIF) and discriminantly redundant features (DRF) for feature extraction. It is a supervised feature extraction technique. The DIFs and DRFs are extracted from the decision boundary using a decision boundary feature matrix (DBFM). The eigenvectors of the DBFM corresponding to nonzero eigenvalues are the DIFs and are used as the new feature vectors. For a multiclass problem, DBFMs are computed for each pair of classes and averaged (Castaings, et al., 2010). The resulting DBFM is used to obtain the new feature vectors.
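The final projection step described above (average the pairwise DBFMs, then keep the eigenvectors of the averaged matrix) can be sketched as follows. The construction of each per-pair DBFM from training samples near the decision boundary is omitted here, and the function name `dbfe_transform` and the energy threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dbfe_transform(dbfms, energy=0.99):
    """Average per-class-pair decision boundary feature matrices and
    keep the eigenvectors (the DIFs) whose eigenvalues account for the
    requested share of the total eigenvalue energy."""
    avg = np.mean(dbfms, axis=0)                 # averaged DBFM
    eigvals, eigvecs = np.linalg.eigh(avg)       # DBFM is symmetric PSD
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, energy) + 1)
    return eigvecs[:, :k]                        # project data with X @ W

# Example with three random symmetric PSD matrices standing in for DBFMs
rng = np.random.default_rng(1)
d = 10
dbfms = [(lambda A: A @ A.T)(rng.normal(size=(d, d))) for _ in range(3)]
W = dbfe_transform(dbfms)
print(W.shape[0])  # 10
```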

Geometric Moments
The moment invariants of an image can be used to derive shape characteristics such as elongation and orientation along its axes. Geometric moments are widely used in various image processing tasks such as segmentation, object detection, scene matching, shape analysis, and texture recognition. The image moments within a small window centred on a pixel are computed and subsequently used to compute texture features. The geometric moments of order (p + q) of a function f(x, y) with respect to the origin (0, 0) are defined as (Tuceryan, 1994)

m_pq = ∬ x^p y^q f(x, y) dx dy,    (1)

where p, q = 0, 1, 2, .... For digital images, the function f(x, y) is digitized into a discrete version I(i, j), and Eq. (1) is approximated as

m_pq = Σ_{i=1}^{M} Σ_{j=1}^{N} i^p j^q I(i, j),    (2)

where M × N is the size of the image I. Moments of different orders have different geometric interpretations relating to area, orientation, elongation, and shape. For pixel wise texture analysis, the moments are computed within small windows centred on the pixels. As a result, a new feature image M_pq is obtained corresponding to each moment m_pq; the number of such images is equal to the number of moments computed. These feature images are further used to compute texture features, since the moments themselves are not sufficient to characterize good texture features (Tuceryan, 1994). The texture feature of a pixel at location (i, j) is determined by applying a mapping function over a small window W_ij of size L × L. We use the absolute deviation of the moments from their mean value to map the moment image M_pq to the texture feature image F_pq using the transformation

F_pq(i, j) = (1/L²) Σ_{(a,b) ∈ W_ij} |M_pq(a, b) − M̄_pq|,    (3)

where M̄_pq is the mean of the moments within W_ij. Other linear or nonlinear functions may also be used for the transformation.
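A minimal Python illustration of the discrete moment computation and the absolute-deviation mapping described above, assuming a 3 × 3 window and reflective border padding (the border handling is an implementation choice not specified in the text; the function names are illustrative):

```python
import numpy as np

def moment_images(img, order=2, win=3):
    """Compute geometric moments m_pq (p + q <= order) in a win x win
    window centred on every pixel; returns one moment image per (p, q)."""
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    rows, cols = img.shape
    coords = np.arange(win) - pad              # window-local coordinates
    jj, ii = np.meshgrid(coords, coords)       # column and row offsets
    out = {}
    for p in range(order + 1):
        for q in range(order + 1 - p):         # all (p, q) with p + q <= order
            basis = (ii ** p) * (jj ** q)
            M = np.empty((rows, cols))
            for r in range(rows):
                for c in range(cols):
                    M[r, c] = np.sum(padded[r:r + win, c:c + win] * basis)
            out[(p, q)] = M
    return out

def texture_features(moment_img, win=3):
    """Map a moment image to a texture feature image by the mean
    absolute deviation from the window mean."""
    pad = win // 2
    padded = np.pad(moment_img, pad, mode='reflect')
    rows, cols = moment_img.shape
    F = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = padded[r:r + win, c:c + win]
            F[r, c] = np.mean(np.abs(patch - patch.mean()))
    return F

img = np.random.rand(16, 16)
moments = moment_images(img)                       # six images for p + q <= 2
features = [texture_features(m) for m in moments.values()]
print(len(features))  # 6
```

The double loop over (p, q) with p + q ≤ 2 yields exactly six moment images, matching the six additional texture features reported later in the paper.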

METHODOLOGY
The flow chart of the proposed methodology is illustrated in Figure 1. In panchromatic images the moments can be determined from the image directly. However, for hyperspectral images some adaptation is required due to the high dimensionality.
Figure 1. Flow chart of the proposed methodology.
We decompose the hyperspectral image by PCA and use the first PC, which retains most of the variance, to compute the moments. In this work second order (p + q ≤ 2) geometric moments are used for computing the texture features. For pixel wise texture feature extraction, the 8-connected neighbourhood is considered: a 3 × 3 window centred on the pixel under consideration is placed and the moments are computed on the local sub image. Similarly, a 3 × 3 window is used again to map each moment image to a texture feature image. The texture features are stacked with the spectral features to form a joint feature vector, which is used for classification using SVM. The accuracy analysis of the results is done in terms of the kappa coefficient (κ) (Congalton, et al., 1999). The statistical significance of the difference between two classifications is determined using the Z statistic defined as

Z = |κ₁ − κ₂| / √(var(κ₁) + var(κ₂)),

where κ₁ and κ₂ are the kappa coefficients of the two confusion matrices and var(κ₁), var(κ₂) are their estimated variances.
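A sketch of the significance test, assuming the kappa variances have already been estimated from the two confusion matrices (the function name and the example values are illustrative):

```python
from math import sqrt

def kappa_z(kappa1, var1, kappa2, var2):
    """Z statistic for the difference between two kappa coefficients
    (Congalton, et al., 1999); the difference is significant at the
    95% confidence level when |Z| > 1.96."""
    return abs(kappa1 - kappa2) / sqrt(var1 + var2)

# Illustrative values: two high kappas with small estimated variances
z = kappa_z(0.9981, 1e-6, 0.9920, 1e-6)
print(z > 1.96)  # True: the difference is statistically significant
```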

EXPERIMENTAL RESULTS
In this section experimental results and their analysis are presented. Several experiments are performed to evaluate the effectiveness of the proposed methodology.

Description of Datasets
The investigations involve two different types of airborne hyperspectral datasets.

Classification
The supervised classification is done using SVM, which is well suited to classifying hyperspectral data (Camps-Valls, et al., 2005). It can handle issues such as high dimensionality, small training sample sizes, and poor generalization better than conventional classifiers. The SVM is implemented using the MATLAB interface of the LIBSVM tool (Chang, et al., 2011). We used a one-vs-one multi-class SVM with a radial basis function (RBF) kernel. The parameters, the cost C and the kernel spread γ, are determined optimally using 5-fold cross validation. The feature extraction using DBFE is implemented with the MultiSpec tool (Landgrebe, et al., 2003).
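As an illustrative stand-in for the MATLAB/LIBSVM setup, the following Python sketch uses scikit-learn's SVC, which wraps LIBSVM internally and applies the one-vs-one scheme for multi-class problems by default, with a small grid search over C and the RBF spread gamma via 5-fold cross validation. The data here are synthetic; the parameter grid is an assumption, not the one used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for joint spectral-texture feature vectors:
# 200 samples, 15 features, 3 classes made separable by a per-class shift.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
y = rng.integers(0, 3, size=200)
X += y[:, None]

# Tune C and gamma by 5-fold cross validation, as in the paper.
grid = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': [1, 10, 100], 'gamma': [0.01, 0.1, 1.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

After fitting, `grid.best_estimator_` can classify the joint feature vectors of all image pixels to produce the classified map.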

Results
The texture features obtained from the second order (p + q ≤ 2) geometric moments (m_00, m_01, m_10, m_11, m_20, m_02) are combined with the spectral features obtained from PCA or DBFE. The length of the feature vector is increased by six, as one additional feature is produced by each moment image. The results are compared with the spectral classification results to investigate the impact of the integration of spatial information. For both spectral feature extraction techniques, the first few components necessary to retain 99% of the variance are kept in the joint feature vector.

University of Pavia Dataset Results:
The details of the training and test samples used in the experiments are provided in Table 1. The training samples are randomly chosen from the ground reference and the remaining samples are used for evaluating the performance. The results reported for each experiment are the average of ten trials. It can be observed from the results in Table 1 that the overall accuracy (κ) of spectral classification using PCA is not good. This is also apparent from the classified map in Figure 4, which appears noisy. The spectral information from PCA alone is not sufficient to discriminate classes such as "Grapes untrained" and "Vinyard untrained" in the Salinas dataset. With texture integration, the accuracies of some classes are slightly reduced in this case, but the difference is not statistically significant. The integration of texture information with PCA produced better results than its integration with DBFE, with overall κ of 99.81% and 99.20% respectively. Figure 5 presents the classified maps for the Salinas dataset. The noisy maps are improved by integrating the texture information.

CONCLUSION
In this research work a spectral-spatial supervised framework was proposed for integrating spectral and texture features to classify hyperspectral data. The second order geometric moments were used to compute the texture features. The proposed methodology was tested on two airborne hyperspectral datasets to evaluate its effectiveness. Two techniques, one supervised and one unsupervised, were employed for spectral feature reduction. The experimental results showed that the integration of texture and spectral features produced statistically significantly better results than spectral classification. The texture features were successfully integrated with spectral features computed by different algorithms to reduce the label uncertainty. For both datasets, more than 99% accuracy was achieved in terms of the kappa coefficient.
4.1.1 University of Pavia Dataset: This dataset was captured by the ROSIS-3 sensor over the urban area of the Engineering School at the University of Pavia, Italy. It is a 610 × 340 pixel image with a 1.3 m pixel size. The original dataset consists of 115 spectral bands in the range 0.43–0.86 μm, of which 12 bands are discarded due to noise. The false color composite (FCC) and the ground reference data are shown in Figure 2. There are nine land cover classes of interest: Asphalt, Meadows, Gravel, Trees, Metal sheets, Soil, Bitumen, Bricks, and Shadows.

Figure 2. University of Pavia: (a) FCC (b) Ground reference.

4.1.2 Salinas Dataset: This is a 512 × 217 pixel scene captured by the AVIRIS sensor over the Salinas Valley, CA, USA. The spatial resolution is 3.7 m/pixel. The original dataset has 224 spectral channels, out of which 20 water absorption bands are removed. The FCC and the corresponding ground reference of the Salinas dataset are given in Figure 3. The scene is taken over an agricultural area.

Table 1. Classification accuracies for the University of Pavia dataset. The best results are highlighted in bold typeface. Z_ij is the Z statistic between classifiers i and j.

The experiments are performed on a different kind of dataset to evaluate the effectiveness of the proposed methodology. As in the previous experiments, the training samples are randomly selected from the ground reference and the remaining samples are kept for accuracy analysis. PCA produced an overall κ of 89.60%, and DBFE performed better with 91.95%. Both PCA and DBFE yielded the poorest results for the two classes "Grapes untrained" (82.61% and 83.56% respectively) and "Vinyard untrained" (54.53% and 69.79% respectively). By observing the confusion matrix it is found that most of the misclassified pixels of "Grapes untrained" are classified as "Vinyard untrained" and vice versa. The additional information provided by the texture features has helped in better discriminating the different classes. The spectral-spatial classification using PCA and texture features achieved more than 99% accuracy for all the classes, and the overall as well as the class wise κ are statistically significantly improved. The texture features with DBFE also statistically significantly improved the overall κ and the class specific accuracies for most of the classes.

Table 2. Classification accuracies for the Salinas dataset. The best results are highlighted in bold typeface. Z_ij is the Z statistic between classifiers i and j.