A COMPARISON STUDY OF DIFFERENT MARKER SELECTION METHODS FOR SPECTRAL-SPATIAL CLASSIFICATION OF HYPERSPECTRAL IMAGES

An effective approach based on the Minimum Spanning Forest (MSF), grown from markers selected automatically using Support Vector Machines (SVM), has been proposed for spectral-spatial classification of hyperspectral images by Tarabalka et al. This paper aims to improve this approach by using image segmentation to integrate spatial information into the marker selection process. In this study, the markers are extracted from the classification maps, obtained by both SVM and segmentation algorithms, and are then used to build the MSF. The segmentation algorithms are the watershed, expectation maximization (EM) and hierarchical clustering. These algorithms are used in parallel and independently to segment the image. For each region of the segmentation map, only the pixels of the class with the largest population in the classification map are kept. Lastly, the most reliably classified pixels are chosen from among the remaining pixels as markers. Two benchmark urban hyperspectral datasets are used for evaluation: Washington DC Mall and Berlin. The results of our experiments indicate that, compared to the original MSF approach, marker selection using segmentation algorithms leads to more accurate classification maps.


INTRODUCTION
Imaging spectroscopy, also known as hyperspectral imaging, is concerned with the measurement, analysis, and interpretation of spectra acquired from either a given scene or a specific object at a short, medium, or long distance by a satellite sensor over the visible to infrared and sometimes thermal spectral regions (Shippert, 2004). Recent technological improvements in the spatial, spectral, and radiometric resolution of imaging spectrometers create the need for new methods of extracting information from these data. The information provided by hyperspectral data makes land cover classification a very promising application. There are two major approaches to the classification of hyperspectral images: the spectral, or pixel-based, approach and the spectral-spatial, or object-based, approach. While pixel-based techniques, such as the classic Maximum Likelihood or Support Vector Machines (SVM) classifiers, use only the spectral information of the pixels, object-based frameworks such as Geographic Object-Based Image Analysis (GEOBIA) (Blaschke et al., 2014) or Minimum Spanning Forest (MSF) (Tarabalka et al., 2010a) classifiers employ both the spectral characteristics and the spatial context of the pixels. Many researchers have demonstrated that the use of spectral-spatial information, rather than only spectral information, improves the classification accuracy of hyperspectral data (Plaza et al., 2009; Li et al., 2010; Fauvel et al., 2012; Xu et al., 2014; Heras et al., 2014).
In the early studies on spectral-spatial image classification, the spectral information extracted from neighborhoods, defined by either fixed windows (Camps-Valls et al., 2006) or morphological profiles (Fauvel et al., 2008), was used to classify and label each pixel. Segmentation techniques are powerful tools for defining the spatial dependences among pixels and finding homogeneous regions in the image (Gonzalez and Woods, 2002; Chen et al., 2012). The advantages of using segmentation for distinguishing spatial structures from one another are also discussed in (Tarabalka et al., 2010; Bitam and Ameur, 2013). An alternative way to achieve accurate image segmentation is marker-controlled segmentation (Soille, 2003; Tarabalka et al., 2010). The idea behind this approach is to select one or several pixels for every spatial object as the seed, or marker, of the corresponding region. Marker-based segmentation significantly reduces the oversegmentation problem and leads to a better accuracy rate (Soille, 2003).
Automatic marker selection has previously been used in the literature mostly for greyscale and color images. Markers are often chosen by searching for flat zones (i.e. connected components of pixels with a constant grey level value) or zones of homogeneous texture (Soille, 2003). Gómez et al. (2007) used histogram analysis to obtain a set of representative pixel values, and the markers were generated from all the image pixels having representative grey values. Jalba et al. (2004) applied connected-operator filtering to the gradient image in order to select the markers for a greyscale diatom image. Noyel et al. (2007; 2008) classified hyperspectral images using different methods, such as Clara (Kaufman and Rousseeuw, 1990) and linear discriminant analysis (Duda et al., 2001), and then filtered the classification maps using mathematical morphology operators to select large spatial regions as markers. Random balls, which connect pixels of randomly selected sizes, have also been extracted from large regions and employed as markers (Jalba et al., 2004; Noyel et al., 2007; Noyel, 2008).
Recently, Tarabalka et al. proposed an efficient approach for spectral-spatial classification using the MSF, grown from automatically selected markers (Tarabalka et al., 2010a). They used a pixel-wise SVM classification in order to select the most reliably classified pixels as markers. In their framework, connected components labelling is applied to the classification map. Then, if a region is large enough, its marker is determined as the P% of pixels within this region with the highest probability estimates. Otherwise, the region should lead to a marker only if it is very reliable. In that case, a potential marker is formed by the pixels with an estimated probability higher than a defined threshold.
It should be noted that none of the above-mentioned methods uses spatial information in the marker selection process. In this paper, a modified marker selection method is proposed to improve the classification of hyperspectral images. This method uses segmentation algorithms to integrate spatial information into the marker selection process. In the proposed method, for each region of the segmentation map, only the pixels of the class with the largest population are kept. Afterwards, the most reliably classified pixels are chosen from among the remaining pixels of each region as the markers. The markers obtained are then used in the MSF approach to classify the hyperspectral images.

MSF-BASED FRAMEWORK
The MSF framework, grown from markers, is used in this paper for the classification of hyperspectral images. In MSF, each pixel is considered as a vertex $v \in V$ of an undirected graph $G = (V, E, W)$, where $V$ and $E$ are the sets of vertices and edges, respectively, and $W$ is a mapping of the edges $E$ into $\mathbb{R}^+$. Each edge $e_{i,j} \in E$ of this graph connects a pair of vertices $i$ and $j$ corresponding to neighboring pixels. Furthermore, a weight $w_{i,j}$ is assigned to each edge $e_{i,j}$, which indicates the degree of dissimilarity between the two vertices (i.e., the two corresponding pixels) connected by this edge. We used an eight-neighborhood and the spectral angle dissimilarity measure for computing the weights of the edges, as described in (Van Der Meer, 2006). Given a graph $G = (V, E, W)$, the MSF rooted on a set of $m$ distinct vertices $\{t_1, \dots, t_m\}$ consists in finding a spanning forest $F^* = (V, E_{F^*})$ of $G$, such that each distinct tree of $F^*$ is grown from one root $t_i$, and the sum of the edges' weights of $F^*$ is minimal (Stawiaski, 2008).
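The edge weighting described above can be sketched in a few lines. This is only an illustrative sketch: the function names `spectral_angle` and `eight_neighbour_edges` are hypothetical, the image is assumed to be a nested list of non-zero spectra, and the spectral angle is taken as the arccosine of the normalized dot product.

```python
import math

def spectral_angle(x, y):
    """Spectral angle (radians) between two non-zero pixel spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (nx * ny)))
    return math.acos(cos)

def eight_neighbour_edges(image):
    """Yield (weight, (r, c), (r2, c2)) for each undirected 8-neighbour edge.

    `image` is a list of rows; each pixel is a list of band values.
    """
    rows, cols = len(image), len(image[0])
    # Four "forward" offsets visit every 8-neighbour pair exactly once.
    offsets = [(0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    yield spectral_angle(image[r][c], image[r2][c2]), (r, c), (r2, c2)
```

Because the angle depends only on the direction of the spectra, two pixels that differ by a brightness factor receive a weight close to zero.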
In order to obtain the MSF rooted on markers, $m$ additional vertices $t_i$, $i = 1, \dots, m$, are introduced. Each extra vertex $t_i$ is connected by an edge with a null weight to the pixels representing marker $i$. Furthermore, an additional root vertex $r$ is added and is connected by null-weight edges to the vertices $t_i$ (see Figure 1). The minimal spanning tree of the constructed graph induces an MSF in $G$, where each tree is grown on a vertex $t_i$. Finally, a spectral-spatial classification map is obtained by assigning the class of each marker to all the pixels grown from this marker.
Figure 1. An example of the addition of extra vertices $t_1$, $t_2$ and $r$ to the image graph for the construction of an MSF rooted on markers 1 and 2; non-marker pixels are denoted by "0".
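A minimal sketch of this construction is given below. It does not build the extra vertices explicitly. Instead it relies on the observation that running Prim's algorithm from the root would absorb all null-weight edges first, which is equivalent to starting the growth from every marker pixel at once. The function name `grow_msf` and its arguments are hypothetical, and any dissimilarity function can be passed in place of the spectral angle.

```python
import heapq

def grow_msf(weights_fn, shape, markers):
    """Label every pixel by growing a minimum spanning forest from markers.

    weights_fn(p, q) -> dissimilarity between neighbouring pixels p and q.
    shape            -> (rows, cols) of the image.
    markers          -> dict mapping (row, col) to a class label.
    """
    rows, cols = shape
    labels = dict(markers)          # marker pixels are the tree roots
    heap = []
    counter = 0                     # insertion order breaks ties between equal weights

    def push_neighbours(p):
        nonlocal counter
        r, c = p
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q != p and 0 <= q[0] < rows and 0 <= q[1] < cols and q not in labels:
                    heapq.heappush(heap, (weights_fn(p, q), counter, p, q))
                    counter += 1

    for p in list(markers):
        push_neighbours(p)

    # Prim-style growth: always take the cheapest edge leaving the forest.
    while heap:
        _, _, p, q = heapq.heappop(heap)
        if q in labels:
            continue                # already reached through a cheaper edge
        labels[q] = labels[p]       # q inherits the class of its tree
        push_neighbours(q)
    return labels
```

Each unlabelled pixel is attached through the cheapest edge that reaches it from the growing forest, so it inherits the class of the marker at the root of its tree.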

THE PROPOSED METHOD
The flowchart of the proposed method is presented in Figure 2. In this method, the SVM and the segmentation algorithms (watershed, expectation maximization (EM) and hierarchical segmentation) are first used, in parallel, to classify and to segment the hyperspectral images, respectively. Afterwards, for each region of the segmentation map, only the pixels of the class with the largest population are kept (see Figure 3). Lastly, the most reliably classified pixels are chosen from among the remaining pixels of each region as markers. The markers are then used to build the MSF.
Figure 2. Schema of the proposed method.
In the following, the hyperspectral image segmentation techniques are described.

1) Watershed segmentation:
Watershed transformation is a powerful morphological approach for image segmentation. It combines region growing and edge detection. The watershed lines divide an image into catchment basins, so that each basin is associated with one minimum in the image (Vincent and Soille, 1991). Using watershed segmentation, an image can be partitioned into a set of regions and one subset of watershed pixels, i.e., pixels situated on the borders between regions.
Finally, each watershed pixel is assigned to the neighboring region with the "closest" median. In other words, the distance between the vector median of this region and the watershed pixel should be minimum (Vincent and Soille, 1991).
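The assignment of watershed pixels can be illustrated as follows. This is a sketch under stated assumptions: the vector median of a region is taken to be the member spectrum with the smallest summed distance to all other members, and the L1 norm is assumed as the distance, since the text does not name one. The function names are hypothetical.

```python
def l1(x, y):
    """L1 distance between two spectra (assumed metric, not stated in the text)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def vector_median(spectra):
    """Return the spectrum minimising the summed L1 distance to all others."""
    return min(spectra, key=lambda x: sum(l1(x, y) for y in spectra))

def assign_watershed_pixel(pixel, neighbour_regions):
    """Pick the neighbouring region whose vector median is closest to `pixel`.

    neighbour_regions maps a region id to the list of spectra in that region.
    """
    return min(neighbour_regions,
               key=lambda rid: l1(pixel, vector_median(neighbour_regions[rid])))
```

Computing the vector median is quadratic in the region size, so for large regions it would normally be computed once per region and cached rather than recomputed for every border pixel.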

2) EM segmentation:
The EM method fits a Gaussian mixture model to the data. It belongs to the group of partitional clustering techniques (Tarabalka et al., 2009). Clustering aims at finding groups of spectrally similar pixels. It is normally assumed that all pixels of a given cluster are drawn from a multivariate Gaussian probability distribution. The parameters of the distributions are then estimated by the EM algorithm. When the algorithm converges, the outputs are the clusters. However, as no spatial information is used during the clustering procedure, pixels with the same cluster label can either form a connected spatial region or belong to disjoint regions. The pixels in the latter group are the isolated pixels. In order to obtain the segmentation map, a connected components labelling algorithm (Shapiro and Stockman, 2002) is applied to the image partitioning obtained by clustering.
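The final labelling step can be sketched as a breadth-first flood fill over the cluster map. The sketch assumes eight-connectivity, which the text does not specify for this step, and the function name `label_connected_components` is hypothetical.

```python
from collections import deque

def label_connected_components(cluster_map):
    """Split each cluster into spatially connected regions (8-connectivity)."""
    rows, cols = len(cluster_map), len(cluster_map[0])
    regions = [[0] * cols for _ in range(rows)]   # 0 means "not yet labelled"
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if regions[r0][c0]:
                continue
            # Start a new region and flood-fill all connected pixels of the same cluster.
            next_id += 1
            regions[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        r2, c2 = r + dr, c + dc
                        if (0 <= r2 < rows and 0 <= c2 < cols
                                and not regions[r2][c2]
                                and cluster_map[r2][c2] == cluster_map[r0][c0]):
                            regions[r2][c2] = next_id
                            queue.append((r2, c2))
    return regions, next_id
```

Pixels that share a cluster label but are not spatially connected receive different region ids, which turns the clustering output into a segmentation map.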

3) Hierarchical segmentation:
The hierarchical algorithm is a segmentation technique based on the iterative hierarchical stepwise optimization (HSWO) region-growing method. It integrates the spatial and spectral information in a two-step procedure. In the first step, the homogeneous areas are segmented at their maximum level of detail, and then, by grouping the spectrally similar but spatially disjoint regions, larger and more uniform objects are created (Tilton, 1998).

Datasets
Two hyperspectral images with different characteristics are used for our experiments. The first dataset was collected by the HYDICE sensor over the Washington DC Mall. The second hyperspectral image covers the Berlin urban area and was acquired by HyMap. Table 1 describes the main characteristics of these datasets.

For each class in each dataset, we randomly chose 10% of the labelled samples for training, and the rest (i.e. 90%) were used for testing.
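A per-class random split of this kind could be written as below. The function name `stratified_split`, the fixed seed, and the rule of keeping at least one training sample per class are assumptions made for illustration only.

```python
import random

def stratified_split(labelled, train_fraction=0.10, seed=0):
    """Split {class: [sample, ...]} into per-class training and test lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for cls, samples in labelled.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Assumption: keep at least one training sample even for tiny classes.
        n_train = max(1, round(train_fraction * len(shuffled)))
        train[cls] = shuffled[:n_train]
        test[cls] = shuffled[n_train:]
    return train, test
```

Splitting within each class keeps the class proportions of the training set equal to those of the labelled reference data.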

Pre-processing
In this study, the Gaussian radial basis function (RBF) is used as the kernel for the SVM classifier (Camps-Valls and Bruzzone, 2005). The parameters C and γ are chosen by five-fold cross-validation. They are C = 128 and γ = 0.1 for the Washington DC Mall dataset and C = 256 and γ = 0.01 for the Berlin dataset.
To create the map of markers in the proposed method, each region of the segmentation map is first reduced to the pixels of its most populous class. If the region then contains more than 40 pixels, the 9% of its pixels with the highest estimated probability are selected as the marker. Otherwise, the region marker is formed by the pixels with an estimated probability higher than a threshold. This threshold is equal to the lowest probability within the highest 6% of the probabilities for the whole image. In the next step, the image pixels are grouped into the MSF, built from the selected markers, using the spectral angle dissimilarity measure (Van Der Meer, 2006).
In order to compare the results of the proposed method, we have also applied the MSF algorithm to the markers obtained from the labelling of connected components, i.e. the original MSF approach (Tarabalka et al., 2010a). In this approach, the labelling of connected components is performed using eight-neighborhood connectivity. For each connected component, if it contains more than 20 pixels, the 5% of its pixels with the highest estimated probability are selected as the marker for this component. Otherwise, the region marker is formed by the pixels with estimated probability higher than 2%.
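The marker selection rule for a single region can be sketched as follows. This is one possible reading of the rule: the size test is applied to the pixels kept after the majority-class filter, and at least one pixel is always selected from a large region. The function names, argument names, and dictionary-based inputs are hypothetical.

```python
from collections import Counter

def select_markers(region_pixels, svm_label, svm_prob, global_threshold,
                   min_size=40, top_fraction=0.09):
    """Return {pixel: class} markers for one segmentation region.

    region_pixels    -> list of pixel ids in the region
    svm_label[p]     -> class assigned to pixel p by the SVM
    svm_prob[p]      -> probability estimate for that class
    global_threshold -> probability cut-off used for small regions
    """
    # Keep only the pixels of the most populous class in this region.
    majority, _ = Counter(svm_label[p] for p in region_pixels).most_common(1)[0]
    kept = [p for p in region_pixels if svm_label[p] == majority]

    if len(kept) > min_size:
        # Large region: take the top fraction by probability (at least one pixel).
        kept.sort(key=lambda p: svm_prob[p], reverse=True)
        n = max(1, int(top_fraction * len(kept)))
        chosen = kept[:n]
    else:
        # Small region: only very reliable pixels become markers.
        chosen = [p for p in kept if svm_prob[p] > global_threshold]
    return {p: majority for p in chosen}


def top_percent_threshold(all_probs, fraction=0.06):
    """Lowest probability among the highest `fraction` of all probabilities."""
    ranked = sorted(all_probs, reverse=True)
    n = max(1, int(fraction * len(ranked)))
    return ranked[n - 1]
```

With the default arguments the sketch uses the 40-pixel and 9% settings quoted above, and `top_percent_threshold` applied to the probabilities of the whole image gives the 6% cut-off for small regions.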

Classification results
Figure 4 shows the color composite image, the reference data and the classification maps obtained by the original MSF algorithm, as well as by the different proposed marker sets, for the Washington DC Mall dataset. As can be seen, the classification maps obtained by the proposed method contain much more homogeneous regions than those obtained by Original-MSF. These results indicate the importance of using spatial information throughout the marker selection procedure. Figure 5 shows the color composite image, the reference data and the classification maps of the Original-MSF, as well as of the proposed methods, for the Berlin dataset. We can see from Figure 5 that, by incorporating the spatial information, the proposed algorithm leads to much smoother classification maps than the Original-MSF algorithm.
The accuracy of the classification results is generally assessed by the overall accuracy (OA), the Kappa coefficient of agreement (κ), and the class-specific producer's accuracy (PA).
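All three measures can be derived from a confusion matrix. The sketch below assumes that rows index the reference classes and columns index the predicted classes, and the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(confusion):
    """Compute OA, kappa and per-class PA from a square confusion matrix.

    confusion[i][j] = number of reference-class-i samples labelled as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = sum(confusion[i][i] for i in range(k))
    row_sums = [sum(confusion[i]) for i in range(k)]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    oa = diag / total
    # Agreement expected by chance, from the marginal totals.
    pe = sum(row_sums[i] * col_sums[i] for i in range(k)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    pa = [confusion[i][i] / row_sums[i] for i in range(k)]
    return oa, kappa, pa
```

For example, the two-class matrix `[[40, 10], [5, 45]]` gives an OA of 0.85, a chance agreement of 0.5 and therefore a κ of 0.7, with producer's accuracies of 0.8 and 0.9.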
The OA is the percentage of correctly classified pixels, κ is the percentage of agreement corrected by the amount of agreement that could be expected due to chance alone, and the PA is the percentage of correctly classified samples for a given class. Table 2 shows the global accuracy values (κ and OA) estimated for the different methods and datasets. In this table, for the Washington DC Mall dataset, the OA of the Hierarchical-MSF method is approximately 4% higher than that of the Original-MSF method. For the Berlin dataset, however, the Watershed-MSF yields an OA about 5% higher than the Original-MSF; these results differ only slightly from those of the Hierarchical segmentation. As can be seen in this

CONCLUSION
In this paper, a comparison study of different marker selection methods for spectral-spatial classification of hyperspectral images was carried out. The hyperspectral images are first classified using SVM and segmented using a segmentation algorithm. Then, for each region of the segmentation map, the pixels of the class with the largest population are kept. Lastly, the most reliably classified pixels are chosen as markers and used to build the MSF. The segmentation algorithms used in this study were the watershed, EM and hierarchical algorithms. Experimental results show that, compared to the Original-MSF approach, the proposed marker selection method improves the classification accuracies and provides classification maps with homogeneous regions.
The proposed methodology succeeded in taking advantage of the spatial and spectral information simultaneously for accurate hyperspectral image classification. While performing particularly well for the classification of homogeneous regions, the proposed approach has a drawback common to most spectral-spatial techniques. It produces a smoother classification map than pixel-wise methods. Therefore, it risks impairing results near the borders between regions (where mixed pixels are often encountered) or in textured areas. Spectral unmixing techniques can be used for accurate analysis of border regions, while segmentation can be applied for textured regions.

Figure 3. An example of the interference of the segmentation map in the SVM classification map.

Figure 6
Figure 6 shows the per-class producer's accuracies obtained for the two datasets. As can be seen in these charts, in the Berlin dataset the Watershed-MSF method achieves the best accuracy for most of the classes, while in the Washington DC Mall dataset Hierarchical-MSF improves all the class-specific accuracies compared to the Original-MSF.

Table 1. The main characteristics of the datasets used.

table, in all three cases, the segmentation methods have improved the accuracy of MSF classification. Therefore, it can be stated that combining the spatial information obtained from segmentation maps with the SVM classification in the marker selection process substantially improves the classification accuracies.
Table 2. The global accuracy values obtained for the datasets used.