LAND USE AND LAND COVER CLASSIFICATION USING HYPERSPECTRAL IMAGERY: EVALUATING THE PERFORMANCE OF SPECTRAL ANGLE MAPPER, SUPPORT VECTOR MACHINE AND RANDOM FOREST

Land Use and Land Cover (LULC) information is an important data source for modeling environmental variables, so it is essential to develop high quality LULC maps. The hundreds of continuous spectral bands gathered with hyperspectral sensors provide high spectral detail, which makes hyperspectral remote sensing an appropriate option for many LULC applications. Despite the increased spectral detail, issues such as high dimensionality, huge data volume and redundant information make hyperspectral image classification a complex task. It is therefore essential to develop classification approaches that deal with these issues. Since classification results are directly dependent on the dataset used, it is fundamental to compare and validate classification approaches on public datasets. With this in mind, and aiming to provide a baseline, four classification models were evaluated on the relatively new hyperspectral HyRANK dataset. The classification models were defined with three well-known classification algorithms: Spectral Angle Mapper (SAM), Support Vector Machine (SVM) and Random Forest (RF). A classification model with SAM and another with RF were defined with the 176 surface reflectance bands. A dimensionality reduction with principal component analysis was then carried out, and a classification model with SVM and another with RF were defined using 14 principal components as features. The results show that the SVM and RF algorithms far outperformed SAM in terms of accuracy, and that RF is slightly better than SVM in this respect. The results also show that using principal components as features provided a slight improvement in the accuracy of RF and a 28% reduction in the time spent fitting the classification model.


INTRODUCTION
The pace of change in Land Use and Land Cover (LULC) due to human activity is unprecedented. Activities pursuing economic objectives, such as wood exploration, cattle ranching and agriculture, drive LULC changes mainly through deforestation in tropical countries. LULC changes increase the probability of erosion and flooding and lead to problems such as loss of biodiversity and increased greenhouse gas emissions (Mas, 1999). LULC information therefore has a key role in environmental and climate change studies (Henderson-Sellers and Pitman, 1992), and the need for reliable, high quality LULC maps is a global concern, since these provide baseline information for planning and evaluating natural resource management, modeling environmental variables and developing sustainable practices (Adam et al., 2014; Gómez et al., 2016).
Since the 1970s, remote sensing imagery has provided an uninterrupted and reliable set of information enabling the mapping and monitoring of the Earth's surface (Petitjean et al., 2012; Maus et al., 2016). Its synoptic and multitemporal characteristics, as well as its large coverage area, make satellite imagery the most suitable approach for mapping large areas in terms of time and expense (Kavzoglu and Colkesen, 2009; Puletti et al., 2016). While remote sensing satellites can capture landscape imagery, extracting LULC information from these data requires effective image processing techniques.
Although satellite images have been used for decades for LULC mapping, and the remote sensing community has been seeking improvements in image classification techniques throughout that time, there is still room for improvement in LULC mapping through digital image processing. The construction of LULC products through image classification techniques is a challenging task, due to several variables related to the process (Shao and Wu, 2008; Manandhar et al., 2009; Khatami et al., 2016). Among the several elements that influence the accuracy of remote sensing image classification, the choice of the image and of the classification algorithm are particular issues.
A few decades ago, hyperspectral remote sensing arose offering high potential to improve LULC mapping (Bioucas-Dias et al., 2013). The possibility of acquiring hundreds of continuous spectral bands with narrow bandwidths, providing more spectral detail than the coarse bandwidths acquired with multispectral sensors, opened new opportunities for LULC applications (Chutia et al., 2016; Ghamisi et al., 2017). However, higher spectral detail comes with high dimensionality and a huge volume of data, which leads to an issue known as the Hughes phenomenon. According to this phenomenon, increasing the number of predictor features potentially adds information that helps to separate the classes; however, the complexity also increases, and the number of samples in a small dataset may not be enough to characterize this complexity. Adding more features may therefore decrease the classification accuracy rather than increase it (Maxwell et al., 2018).
Hyperspectral image classification is also a challenging task due to the limited number of available training samples, the redundant information present in the features, the uncertainties related to atmospheric or topographic effects and the influence of spatial resolution (Ghamisi et al., 2017). Choosing an algorithm that deals with these issues is therefore essential to obtain high accuracy LULC products through hyperspectral image classification. Nowadays, machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF) and Neural Networks (NN) are widely used by the remote sensing community for hyperspectral image classification (Ghamisi et al., 2017). However, the results provided by each depend on the dataset used, so studies that evaluate and compare them on the same dataset are of particular interest.
For this reason, and due to the limited availability of large public hyperspectral datasets with several land cover classes, the International Society for Photogrammetry and Remote Sensing (ISPRS), Commission III, Working Group III/4 (hyperspectral image processing), developed the HyRANK Hyperspectral Satellite Dataset (Karantzalos et al., 2018), aiming to provide a dataset on which researchers can validate and compare new LULC classification approaches.
Based on that, the accuracy of four classification models, defined with three well-known algorithms and applied to the HyRANK dataset, was evaluated and compared in this paper. The goal was to provide a baseline for future research in LULC classification with the HyRANK dataset. The algorithms adopted were the traditional Spectral Angle Mapper (SAM), which in the recent past has been widely used in the classification of hyperspectral data, and two state-of-the-art machine learning algorithms, SVM and RF. This paper is organized as follows: an overview of the classification algorithms is presented in Section 2; the dataset, the classification process and the metrics used in the evaluation process are explained in Section 3; and the results and discussion are presented in Section 4. Finally, the conclusions are given in Section 5.

CLASSIFICATION ALGORITHMS
A short overview of the classification algorithms SAM, SVM and RF is presented in this section, with the aim of providing a background to these classifiers to help in understanding the sections which follow. More details about SAM, SVM and RF can be found, respectively, in Kruse et al. (1993), Cortes and Vapnik (1995), and Breiman (2001).
SAM is a non-parametric supervised algorithm proposed in Kruse et al. (1993). The algorithm measures the similarity between two spectra, where one corresponds to the pixel to be labeled and the other is a reference spectrum for the LULC class. The spectra are treated as vectors, and their dimension is defined by the number of image bands. The similarity measurement is the angle between the two spectra from the origin, and it can be computed with Equation 1, where $\vec{t}$ is the pixel spectrum and $\vec{r}$ the spectrum for the LULC class.
$$\alpha = \cos^{-1}\left(\frac{\vec{t}\cdot\vec{r}}{\lVert\vec{t}\rVert\,\lVert\vec{r}\rVert}\right) \quad (1)$$
SAM is less sensitive to illumination variations than other similarity measurements (e.g., Euclidean distance), since changes in illumination have a direct effect on the scale (length) of the pixel vector and only an indirect effect on its orientation, considering the line that connects the pixel and the origin of the multidimensional feature space. In this case, even with changes in the pixel digital number, the pixel vector remains aligned with the reference spectrum of its class (Kruse et al., 1993). However, the need for pure reference spectra is the major drawback of this classifier.
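The illumination argument above can be illustrated with a minimal NumPy sketch. The 4-band spectra and the function name `spectral_angle` are hypothetical and chosen only for illustration; this is not the implementation used in the experiments.

```python
import numpy as np

def spectral_angle(t, r):
    """Angle (radians) between a pixel spectrum t and a reference spectrum r."""
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    cos_angle = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    # Clipping guards against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Hypothetical 4-band spectra, for illustration only.
reference = np.array([0.10, 0.20, 0.40, 0.30])
pixel = np.array([0.12, 0.19, 0.41, 0.33])
darker_pixel = 0.5 * pixel  # same spectrum shape under half the illumination

angle_full = spectral_angle(pixel, reference)
angle_dark = spectral_angle(darker_pixel, reference)
dist_full = np.linalg.norm(pixel - reference)
dist_dark = np.linalg.norm(darker_pixel - reference)
```

Scaling the pixel spectrum leaves the angle unchanged, whereas the Euclidean distance to the reference grows, which is the behavior described in the paragraph above.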
Machine learning classifiers have become a major focus for the remote sensing community, since such algorithms deal with high-dimensional feature spaces and are able to model complex class signatures (Maxwell et al., 2018). Nowadays the non-parametric machine learning algorithms SVM, RF and NN are state-of-the-art for remote sensing image classification (Khatami et al., 2016), producing higher levels of accuracy than parametric algorithms like Maximum Likelihood (Yu et al., 2014).
SVM finds the optimal hyperplane that separates the classes in a multi-dimensional feature space. The best decision boundary is the one that minimizes the errors and maximizes the margin, that is, the distance between the boundary and the nearest training samples of each class (Cortes and Vapnik, 1995). SVM is especially useful for small training datasets, since it relies only on the observations located on the decision boundaries (support vectors) (Mountrakis et al., 2011). This advantage makes SVM particularly relevant for remote sensing applications, where obtaining training samples normally requires field work and is consequently costly (Tuia et al., 2011).
Despite these advantages, SVM has some drawbacks: a high number of parameters to be tuned, a high computational cost and the need to choose a kernel function (Mountrakis et al., 2011). The kernel function is used to transform the n-dimensional feature space into a higher-dimensional space where the classes are linearly separable.
The known kernel options are linear, polynomial, sigmoid, and the radial basis function (RBF) (Kavzoglu and Colkesen, 2009). Mountrakis et al. (2011) highlight that the RBF kernel is the most suitable one for remote sensing data classification.
The RF algorithm is an ensemble of decision trees. It has been used in several remote sensing classification works because of its simplicity and good accuracy results (Belgiu and Drăguţ, 2016).
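As a brief illustration of the RBF kernel, the sketch below evaluates its common form exp(-gamma * ||x - y||^2) for a few 2-D points. The function name, the points and the value gamma = 0.5 are assumptions made for the example and are not taken from the experiments.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])

k_same = rbf_kernel(a, a, gamma=0.5)      # identical points
k_near = rbf_kernel(a, b, gamma=0.5)      # points at distance 1
k_far = rbf_kernel(a, 3 * b, gamma=0.5)   # points at distance 3
```

The kernel behaves as a similarity measure: it equals 1 for identical points and decays towards 0 as the distance grows, with gamma controlling how quickly it decays.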
A decision tree is a recursive splitting approach applied to the input data (Pal and Mather, 2003). The splits are performed starting from a root node (first level of the tree) down to the leaf nodes, decreasing the entropy at each split. The leaves are the last level of the tree, where the entropy is at its lowest. The intention is to have only samples from the same class in each leaf (Ho, 1995). There are several split nodes in the path that goes from the root node to a leaf node. These contain decision rules based on the available features and a threshold applied to the chosen feature.
Although decision trees are extremely fast and simple, they are very sensitive to noise and frequently overfit the training samples. Because of that, a decision tree can be classified as a weak learner. To overcome these drawbacks, an ensemble of decision trees is combined in an RF (strong learner). In the RF algorithm the trees of the forest must be uncorrelated, each tree being unique; hence the random subspace (feature bagging) and bootstrap aggregating (bagging) techniques are applied.
Bootstrap aggregating, presented by Breiman (2001), consists of the random selection, with replacement, of a subset of samples from the training dataset. The random subspace, proposed by Ho (1995), consists of randomly selecting a subset of features from all the input features at each node and, from this subset, choosing the feature whose split produces the lowest entropy at the next level.
Both techniques reduce the model variance without increasing the bias. So, while a single decision tree is sensitive to noise, the average prediction of an ensemble of trees is not, as long as the trees are uncorrelated (Friedman et al., 2001). After growing the forest, each tree casts a vote for a class and the label is defined by the majority vote. The main advantages of the RF algorithm are that it deals well with noise, has fewer parameters to be tuned, and has a low computational cost.
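The three ingredients described above (bootstrap sampling, random feature subsets and majority voting) can be sketched with NumPy as follows. The sample and feature counts, the square-root rule for the subset size and the tree votes are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 16

# Bootstrap aggregating: draw n_samples row indices with replacement.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Random subspace: at a node, only a random subset of features is considered
# (sqrt(n_features) is a common choice, assumed here).
n_candidates = int(np.sqrt(n_features))
candidate_features = rng.choice(n_features, size=n_candidates, replace=False)

def majority_vote(votes):
    """Return the most frequent label among the votes cast by the trees."""
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

tree_votes = np.array([3, 7, 3, 3, 1])  # hypothetical votes from five trees
label = majority_vote(tree_votes)
```

Because each tree sees a different bootstrap sample and different candidate features, the trees differ from one another, which is what makes the vote more robust than any single tree.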

Dataset
As stated before, in this work the HyRANK dataset, developed by a scientific initiative of the ISPRS, WG III/4, was used. The main goal of the HyRANK dataset is to provide hyperspectral images along with the corresponding ground truths, to enable the scientific community to validate new classification approaches against state-of-the-art methods.
HyRANK is composed of five hyperspectral images gathered with the Hyperion sensor on board the Earth Observing-1 satellite. As well as being a dataset, HyRANK is intended to be an online evaluation platform to which it will be possible to upload the LULC results for the images Erato, Nefeli and Kiriki. The overall accuracy of the model will be estimated on this platform and made available to the community in order to compare results from different classification approaches. Further details regarding the dataset and the accuracy assessment platform can be found in Karantzalos et al. (2018).
At the time of writing, the images are already available, but the online assessment platform is not. In order to train and evaluate the classification models, the ground truths from the Loukia and Dioni images were therefore split into two sets: 85% of the ground truth was assigned to training the classification algorithms, and the remaining 15% was used for classification model evaluation.
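A random 85%/15% split of labeled pixels could be obtained with scikit-learn as in the sketch below. The arrays are synthetic stand-ins for the ground-truth pixels, and the use of `train_test_split` and of a fixed random seed are assumptions; the text does not state which routine was used or whether the split was stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 176))          # synthetic stand-in: 1000 labeled pixels, 176 bands
y = rng.integers(0, 14, size=1000)   # synthetic stand-in: 14 LULC class labels

# Hold out 15% of the labeled pixels for evaluation; they are never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)
```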

LULC classifications description
All steps to implement the LULC classification and evaluation were performed using available libraries in Python. The SAM algorithm is available in the spectral package and the machine learning algorithms are available in the scikit-learn package (Pedregosa et al., 2011).
To perform the SAM classification, the 176 surface reflectance bands were used as features, so each class and pixel spectrum has 176 dimensions. The reference spectra of the 14 LULC classes were defined from the training set: each reference spectrum was computed as the mean of the spectra of all pixels belonging to the corresponding LULC class. In the classification process, each pixel spectrum was compared to the 14 reference spectra, so that 14 similarity measurements were produced for each pixel. Each pixel received the label of the most similar (smallest angle) reference spectrum.
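The procedure above (mean reference spectrum per class, then assignment to the smallest angle) can be sketched in plain NumPy. This is an illustrative stand-in rather than the API of the spectral package; the function names and the tiny 3-band, 2-class data are hypothetical.

```python
import numpy as np

def class_mean_spectra(X_train, y_train):
    """Mean spectrum of the training pixels of each class."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, means

def sam_classify(X, classes, references):
    """Label each row of X with the class whose reference has the smallest angle."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Rn = references / np.linalg.norm(references, axis=1, keepdims=True)
    angles = np.arccos(np.clip(Xn @ Rn.T, -1.0, 1.0))  # shape (n_pixels, n_classes)
    return classes[np.argmin(angles, axis=1)]

# Tiny synthetic example with 3 bands and 2 classes (not HyRANK data).
X_train = np.array([[1.0, 0.1, 0.1], [0.9, 0.2, 0.1],
                    [0.1, 0.1, 1.0], [0.2, 0.1, 0.8]])
y_train = np.array([0, 0, 1, 1])

classes, refs = class_mean_spectra(X_train, y_train)
pixels = np.array([[2.0, 0.3, 0.2], [0.05, 0.04, 0.5]])
pred = sam_classify(pixels, classes, refs)
```

With 176 bands and 14 classes the same code yields a 14-column angle matrix, one similarity measurement per class for every pixel.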
As stated in Section 2, the SVM algorithm has a high computational cost, which makes it unfeasible to use the 176 surface reflectance bands as input features. Another issue related to the feature dimension is the Hughes phenomenon, which can lead to a decrease in the accuracy of the classification models, as stated in Section 1. A dimensionality reduction was therefore carried out on the dataset using principal component analysis (PCA). More details about PCA can be found in Johnson and Wichern (2002). The PCA parameters were fitted on the training samples; before that, however, the training samples were balanced, since unbalanced classes could influence the eigenvectors of the PCA. Each of the 14 LULC classes was balanced through replication. Next, the PCA transformation was applied to the dataset. Since 99.9% of the information was contained in 14 principal components, these components were selected as the significant input features.
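A possible way to reproduce this step with scikit-learn is sketched below. Two points are assumptions: "replication" is interpreted as random oversampling of each class up to the size of the largest class, and the number of components is selected by asking `PCA` for 99.9% of the explained variance. The data are synthetic, so the number of retained components differs from the 14 reported for HyRANK.

```python
import numpy as np
from sklearn.decomposition import PCA

def balance_by_replication(X, y, rng):
    """Oversample every class (with replacement) up to the largest class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 correlated "bands" driven by 3 latent factors.
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(300, 20))
y = np.repeat([0, 1, 2], [200, 80, 20])

X_bal, y_bal = balance_by_replication(X, y, rng)

# Fit PCA on the balanced training samples, keeping 99.9% of the variance,
# then project the original samples onto the retained components.
pca = PCA(n_components=0.999, svd_solver="full").fit(X_bal)
X_pcs = pca.transform(X)
```

Fitting on the balanced samples and transforming the original ones keeps the replicated pixels from leaking into later evaluation while still letting every class influence the eigenvectors equally.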
For the SVM classification it was decided to use the RBF kernel, since it is the most suitable option for remote sensing classification. Because SVM with this kernel assumes that all features are centered around 0 and have variances of the same order, the 14 principal components were standardized by removing the mean and scaling to unit variance before the training step. Next, the SVM training was performed using the 14 significant principal components as input features. Once the classification model had been defined, the classification was carried out and the LULC maps were produced for the Loukia and Dioni images. The SVC class from the scikit-learn package was used to perform the multiclass classification with SVM. As stated, the RBF kernel function was used, class_weight was set to balanced and the other parameters were left at their default values.
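The configuration described above maps onto scikit-learn roughly as in the following sketch. Wrapping `StandardScaler` and `SVC` in a pipeline is a design choice of this example, not something stated in the text, and the data are a synthetic stand-in for the 14 principal components.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 samples, 14 features, 3 well-separated classes.
centers = rng.normal(scale=5.0, size=(3, 14))
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 14))

svm_model = make_pipeline(
    StandardScaler(),                            # zero mean, unit variance per feature
    SVC(kernel="rbf", class_weight="balanced"),  # other parameters left at defaults
)
svm_model.fit(X[:250], y[:250])
svm_accuracy = svm_model.score(X[250:], y[250:])
```

Using a pipeline ensures that the scaling statistics are learned from the training samples only and then reused unchanged at prediction time.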
Since it was desirable to compare the RF algorithm with the other classifiers and to verify whether the Hughes phenomenon would affect the accuracy of the models, two classification models were defined with the RF algorithm. In the first classification model the 176 surface reflectance bands were used, and in the second the input features were the 14 significant principal components. The classification with the 176 features was possible because RF has a low computational cost, so using all available features would not be a time problem, as it is with SVM. Once the two classification models had been defined, the classifications were carried out and the LULC maps were produced for the Loukia and Dioni images. For this work the RandomForestClassifier class from the scikit-learn package was used. The number of trees was set to 200 and the maximum depth to 20. These values were defined after a tuning process, which showed that the overall accuracy would not improve significantly by increasing these parameter values.
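A minimal sketch of this setup, including the measurement of the fit time, is given below. The synthetic data, the fixed random seed and the use of `time.perf_counter` for timing are assumptions of the example; only the number of trees (200) and the maximum depth (20) come from the text.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 400 samples, 30 features, 3 well-separated classes.
centers = rng.normal(scale=3.0, size=(3, 30))
y = rng.integers(0, 3, size=400)
X = centers[y] + rng.normal(size=(400, 30))

rf_model = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=0)

start = time.perf_counter()
rf_model.fit(X[:320], y[:320])               # only the fit is timed
fit_seconds = time.perf_counter() - start

rf_accuracy = rf_model.score(X[320:], y[320:])
```

Running the same code once with the band features and once with the principal components would give the two RF models and their respective fit times.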

Assessment of the accuracy of classification models
Metrics computed from the confusion matrix were used for the evaluation of the classification models. A total of 15% of the samples from the Dioni and Loukia ground truths were used in the construction of the confusion matrix. These samples were randomly selected before the classification models were defined, so they were not used to fit the classification models. The metrics applied for the classification accuracy assessment were: Overall Accuracy (OA), Producer's Accuracy (PA), User's Accuracy (UA), Kappa coefficient (K) and F1-score. The OA is a metric for general evaluation, computed as the number of correctly classified pixels divided by the total number of pixels in the validation set. The PA, UA and F1-score metrics were used to measure the quality of the classification for each LULC class.
The PA indicates the omission of pixels for a class, the UA expresses the commission of pixels, and the F1-score is the harmonic mean of the PA and UA, so it represents how well the model classifies each class. Like the OA, the K coefficient is a measure of general evaluation, although it can be considered a more representative metric since wrongly classified samples are considered in its computation. Further details regarding the metrics used in the classification accuracy assessment can be found in Congalton (1991). Aiming to compare the classifiers, the classification model fit time was also measured for each of the 4 classifications performed. All experiments were carried out on a machine with an Intel Core i7-5500, 2.4 GHz clock and 8 GB RAM.
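The metrics described above can be derived from a confusion matrix as in the following sketch. The matrix layout (rows as reference classes, columns as predicted classes), the function name and the 2-class example values are assumptions made for illustration.

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, PA, UA, F1 and kappa from a confusion matrix.

    Assumed layout: rows are reference (ground truth) classes,
    columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / total
    pa = diag / cm.sum(axis=1)   # producer's accuracy: 1 - omission error
    ua = diag / cm.sum(axis=0)   # user's accuracy: 1 - commission error
    f1 = 2 * pa * ua / (pa + ua)
    # Agreement expected by chance, from the row and column totals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1 - expected)
    return oa, pa, ua, f1, kappa

# Hypothetical 2-class confusion matrix.
cm = np.array([[40, 10],
               [5, 45]])
oa, pa, ua, f1, kappa = accuracy_metrics(cm)
```

In this example the OA is 0.85 while the kappa coefficient is 0.70, because half of the agreement would already be expected by chance given the row and column totals.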

RESULTS AND DISCUSSION
Figure 2 shows the LULC maps for the Dioni (left) and Loukia (right) images produced through the 4 different classification models described in Section 3.2. The metrics used for the accuracy assessment of the classification models, as well as the time to fit them, are shown in Table 1.
The most noticeable issue in the LULC maps appeared through a visual analysis of the Dioni LULC map built with the SAM algorithm, Figure 2
With regard to the metrics generated for accuracy assessment of the classification models, it can be seen that the two models defined with the RF algorithm presented the best performances, with K coefficients equal to 0.91 and 0.89 for the principal components and the 176 surface reflectance bands as input features, respectively. Although the improvement is almost insignificant, a slightly better performance was achieved using principal components as features. This can be related to the Hughes phenomenon, since the model complexity is higher for the 176 features than for the 14 features. The RF classification model performances were followed by the model fitted with SVM which had a K coefficient equal to 0. Regarding the F1-score for the SAM classification model, the Dense Urban Fabric, Fruit Trees, Olive Groves and Sparsely Vegetated Areas classes were under 0.3, showing that the boundary limits in the feature space were poorly defined for these classes. In general, the RFpca classification model performed better than the RFbands and SVM classification models, since the OA for the former was slightly higher than for the latter two. However, looking at the F1-score, it can be seen that the Broad Leaved Forest and Coniferous Forest classes had their boundary limits better defined in the feature space by the RFbands classification model, while the Mineral Extraction Sites class was better classified with SVM, since its F1-score was higher.
Considering the computational time, the RF algorithm again outperformed the other algorithms: RFpca was the fastest, taking just 31.2 s to fit, followed by RFbands with 43.7 s. The SAM and SVM were, respectively, almost twice and four times slower than RFpca. However, these figures take into account only the time to fit the classification models; if the time to perform the PCA is also considered, the RFpca and SVM would be slower than the others.
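As a quick check, the relative reduction in fit time of RFpca with respect to RFbands can be computed from the two times reported above; the symbols $t_{\mathrm{RFbands}}$ and $t_{\mathrm{RFpca}}$ are introduced here only for this calculation.

```latex
\frac{t_{\mathrm{RFbands}} - t_{\mathrm{RFpca}}}{t_{\mathrm{RFbands}}}
  = \frac{43.7 - 31.2}{43.7}
  = \frac{12.5}{43.7}
  \approx 0.286
```

This corresponds to a reduction of roughly 28% in the time spent fitting the RF classification model.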

CONCLUSION
In this work, four classification models defined with three well-known classification algorithms, SAM, SVM and RF, were evaluated for LULC classification using the public HyRANK dataset. The results show that the SVM and RF algorithms far outperformed SAM in terms of the accuracy of the classification models, and that RF is slightly better than SVM in the same respect. From the results it can also be seen that the use of principal components as features provided a slight improvement in the accuracy of RF and a reduction of 28% in the time spent fitting the classification model.
Although the results obtained for the machine learning algorithms, RF and SVM, are considered excellent, with a K coefficient higher than 0.8, there is still room for improvement. One possibility is to select the bands that best contribute to the delimitation of the classes in the feature space, for which the importance of each attribute generated by the RF algorithm can be used. Another possibility for future work is to use deep NN on this dataset, since they may improve the results because of their capacity for engineering new features. The results presented in this paper can be used as a baseline for future research in hyperspectral image classification on the HyRANK dataset.
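The band selection idea mentioned above could be prototyped with the `feature_importances_` attribute of a fitted scikit-learn random forest, as in the sketch below. It illustrates the suggestion only and is not an experiment from this work; the data are synthetic, constructed so that only bands 0 and 1 are informative, and the number of trees is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 "bands", only bands 0 and 1 carry class information.
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 20))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank the bands by importance (most important first) and keep the top two.
ranking = np.argsort(rf.feature_importances_)[::-1]
top_bands = ranking[:2]
```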

Figure 1. (a) Dioni image composition B25G90R160 (b) Dioni ground truth (c) Loukia image composition B25G90R160 (d) Loukia ground truth
Figure 2. LULC classification for the Dioni image through (a) SAM (b) SVM (c) RFbands (d) RFpca; LULC classification for the Loukia image through (e) SAM (f) SVM (g) RFbands (h) RFpca
After a preprocessing step, these images were provided with 176 surface reflectance bands. The ground truths from two (Loukia and Dioni) of the five images were provided. 14 LULC classes were annotated in the ground truth following the CORINE Land Cover principles. These classes are: Dense Urban Fabric, Mineral Extraction Sites, Non-Irrigated Arable Land, Fruit Trees, Olive Groves, Broad-leaved Forest, Coniferous Forest, Mixed Forest, Dense Sclerophyllous Vegetation, Sparse Sclerophyllous Vegetation, Sparsely Vegetated Areas, Rocks and Sand, Water, and Coastal Water. The Loukia and Dioni images, as well as the respective ground truths, are depicted in Figure 1.

Table 1. Evaluation metrics used for the evaluation of the classification approaches: Overall Accuracy (OA), Kappa index (K), model fit time, Producer's Accuracy (PA), User's Accuracy (UA) and F1-score
overestimated, since the area estimated for this class is larger than its real area, as can be seen in Figure 1(a). However, this is not an unexpected result considering that this class embraces different land cover types (e.g., vegetation and bare soil), which can diminish its classification accuracy, since SAM requires pure reference spectra to work well. Another noticeable issue in this LULC map is the confusion of the Water class with Coastal Water, which can be explained by the similar spectral signatures of these classes.
For the SAM, the UA and PA presented lower values, achieving averages of 0.58 and 0.45, respectively. The exception, in this case, was the Water class, which reached a PA of 0.97 and a UA of 0.89. These values are close to the UA and PA achieved with the machine learning algorithms, which can be explained by two facts. Firstly, the spectral signature of the Water class is very similar to that of the Coastal Water class, but extremely different from those of the other classes. Secondly, it could be linked to the higher likelihood of getting a pure reference spectrum for Water, which is essential to SAM.