AIRBORNE HYPERSPECTRAL REMOTE SENSING FOR IDENTIFICATION OF GRASSLAND VEGETATION

In our study we classified grassland vegetation types of an alkali landscape (Eastern Hungary) using hyperspectral data. Our aim was to test the applicability of hyperspectral data in this complex system using various image classification methods. To reach the highest classification accuracy, we compared the performance of traditional image classifiers, machine learning algorithms, feature extraction (MNF transformation) and various training dataset sizes. Hyperspectral images were acquired by an AISA EAGLE II hyperspectral sensor with 128 contiguous bands (400–1000 nm), a spectral sampling of 5 nm and a ground pixel size of 1 m. We used twenty vegetation classes, compiled based on the characteristic dominant species, canopy height and total vegetation cover. Image classification was applied to the original and the MNF (minimum noise fraction) transformed dataset using training sample sizes between 10 and 30 pixels. With the original bands, both support vector machine (SVM) and random forest (RF) classifiers provided high accuracy for almost all classes irrespective of the number of training pixels. SVM and RF produced the best accuracy with the first nine MNF-transformed bands. Our results suggest that in complex open landscapes the application of SVM can be a feasible solution, as this method provided higher accuracies than RF and maximum likelihood classification (MLC). SVM was not sensitive to the size of the training samples, which makes it an adequate tool for cases when the available number of training pixels is limited for some classes.


INTRODUCTION
Hyperspectral data are widely applied in environmental monitoring (Thenkabail, 2011; Adam et al., 2010). Airborne hyperspectral imagery can provide many narrow, contiguous spectral bands (bandwidth below 10 nm) at a high spatial resolution (0.5-1 m). Hyperspectral imagery has proven suitable for detailed vegetation classification based on the dominant or subdominant genera or species (Huang and Asner, 2009; Mirik et al., 2013). When processing hyperspectral data it is important to reduce the high dimensionality and inherent multicollinearity of the datasets. Several advanced feature extraction techniques, e.g., minimum noise fraction (MNF) transformation and principal component analysis (PCA), can be applied for this purpose (Plaza et al., 2009; Green et al., 1988; Landgrebe, 2003).
Alkali habitats of the Pannonian biogeographical region are among the most extensive semi-natural open landscapes of the European Union. Given their complexity, open alkali landscapes provide an excellent opportunity for testing the potential of remote sensing in mapping extended areas with a high spatial complexity (Alexander et al., 2015; Zlinszky et al., 2015). The most typical vegetation types are open alkali grasslands, dry alkali grasslands, tall-grass meadows and sedge vegetation, together with alkali and non-alkali marshes (Deák et al., 2014a; Kelemen et al., 2013, 2015). Alkali landscapes are characterised by a fine-scale mosaic of these vegetation types, many of which have similar characteristics (biomass, vegetation structure and environmental conditions; Deák et al., 2014b; Valkó et al., 2014); thus their classification using remote sensing data is often challenging. Our aim was to test the applicability of hyperspectral data in this complex system using various image classification methods. To reach the highest classification accuracy, we compared the performance of traditional image classifiers, machine learning algorithms, feature extraction (MNF transformation) and various training dataset sizes.

Study Area
Our study area is located in Pentezug-puszta (N 47°34', E 21°6'). Pentezug, which is an integral part of the Hortobágy National Park (East Hungary), has an area of 23.49 hectares (Figure 1). It harbours a diverse landscape representing most of the typical alkali vegetation types: alkali steppes, open alkali grasslands, alkali meadows and marshes. We excluded roads, buildings and woody vegetation from the study.

Airborne and Field Data Collection
To collect hyperspectral data we used an AISA EAGLE II hyperspectral sensor with an OxTS RT 3003 GPS/INS system. The sensor provided images with 128 contiguous bands (400-1000 nm), a spectral sampling of 5 nm and a ground pixel size of 1 m. The flight took place in good weather conditions from 09:11 to 09:53 GMT on 7 July 2013. Reference data were collected from all characteristic and representative vegetation types of the study site within one week after the flight. Because the total area, distribution and patch size of the vegetation types generally show high variability in alkali landscapes, we carried out a preliminary survey before the field campaign. During the field survey we listed the typical vegetation types, estimated their average patch size and proportion in the study area, and surveyed 98 homogeneous vegetation patches using a differential GPS. For the calculations we categorised the species, based on their relative cover, as dominant (>50%) or subdominant (10-50%).

Vegetation Classes
Our aim was to classify the herbaceous vegetation (grasslands and wetlands), as these vegetation types are the best represented in our study site (~99.5% of the total area). Furthermore, these vegetation types have the highest importance for nature conservation and site managers in our study region. We defined twenty vegetation classes (Table 1) based on the dominant species, canopy height and total vegetation cover of the vegetation types. Where necessary, we also used the subdominant species to create distinct classes (Burai et al., 2015). We aimed for a 50-50% split of the field samples into training and validation data. Pixels were selected randomly from the field samples. The same validation dataset was used for all analyses.
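The random selection of pixels can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure used in the study: the function names `split_half` and `draw_training`, the dictionary layout, the pixel IDs and the seeds are all hypothetical.

```python
import random

def split_half(pixels_by_class, seed=0):
    """Split each class's field pixels once into a training pool and a fixed validation set."""
    rng = random.Random(seed)
    pool, validation = {}, {}
    for label, pixels in pixels_by_class.items():
        shuffled = list(pixels)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        pool[label] = shuffled[:half]        # candidates for training
        validation[label] = shuffled[half:]  # reused unchanged in every analysis
    return pool, validation

def draw_training(pool, n_train, seed):
    """Draw up to n_train random training pixels per class from the training pool."""
    rng = random.Random(seed)
    return {label: rng.sample(p, min(n_train, len(p))) for label, p in pool.items()}

# Hypothetical pixel IDs for a well represented and a poorly represented class
field = {"ELY": list(range(100)), "TYP": list(range(100, 140))}
pool, validation = split_half(field)
train_10 = draw_training(pool, n_train=10, seed=1)
train_30 = draw_training(pool, n_train=30, seed=1)
```

Keeping the split and the draw separate means the training sample can be redrawn with a different seed or size while the validation pixels stay the same.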

Image Classification
ENVI/IDL 5.0 (Exelis, Inc., Boulder, CO, USA) and EnMap Box (Rabe et al., 2013) software were used to classify the hyperspectral images. We tested the efficiency of three supervised classification methods, maximum likelihood classification (MLC), random forest (RF) and support vector machine (SVM), which are frequently used for vegetation mapping (Mirik et al., 2013; Huang and Asner, 2009; Lawrence et al., 2006). We did not specify a probability threshold for the MLC classification. For the RF classification, 100 trees were computed using the Gini coefficient as the node impurity function; the minimum number of samples in a node was 1. SVM classification was performed with a Gaussian radial basis function kernel. The SVM parameters (C = 100 and γ = 0.11) were selected by fivefold cross-validation.
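Two quantities named in these settings, the Gaussian radial basis function kernel and the Gini node impurity, can be written out in a few lines of Python. This is an illustrative sketch only: it assumes the common kernel parametrisation exp(-γ‖x − y‖²), which the text does not state explicitly, and the function names and input values are hypothetical.

```python
import math
from collections import Counter

def rbf_kernel(x, y, gamma=0.11):
    """Gaussian RBF kernel, assuming the form exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gini_impurity(labels):
    """Gini impurity of a tree node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Hypothetical feature vectors and class labels
same = rbf_kernel([0.2, 0.4], [0.2, 0.4])        # identical spectra
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])         # squared distance of 25
pure = gini_impurity(["ELY", "ELY", "ELY"])      # node holding a single class
mixed = gini_impurity(["ELY", "TYP"])            # node split evenly between two classes
```

The kernel equals 1 for identical spectra and decays towards 0 with distance; the impurity is 0 for a pure node and grows as classes mix.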

Image Classification Using Original Spectral Bands
We used SVM and RF with the original dataset for supervised classification. We did not apply MLC to the original bands, as it would need at least n+1 training pixels per class, where n is the number of bands. The classification was repeated five times with randomly sampled training pixels. Given the limited availability of training pixels (for the classes poorly represented in the field), the maximum number of randomly selected training pixels was 30. The overall accuracies provided by the SVM and RF classifiers increased slightly with an increasing number of training pixels (Table 2 and Figure 2). The standard errors of the overall accuracies were similarly low irrespective of the number of training pixels.
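The training-size requirement of MLC can be checked numerically. The sketch below uses NumPy with random numbers standing in for real spectra (an assumption made purely for illustration): a class covariance matrix estimated from 30 pixels in 128 bands has rank at most 29, so it cannot be inverted as the maximum likelihood rule requires, whereas the same 30 pixels are sufficient for nine features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 128    # original spectral bands
n_pixels = 30    # largest training sample used in the study

# Hypothetical spectra: random values standing in for one class's training pixels
spectra = rng.normal(size=(n_pixels, n_bands))

# 128 x 128 class covariance; centring the data costs one degree of freedom
covariance = np.cov(spectra, rowvar=False)
rank = np.linalg.matrix_rank(covariance)

# With only nine features the same pixels give a full-rank covariance
covariance_9 = np.cov(spectra[:, :9], rowvar=False)
rank_9 = np.linalg.matrix_rank(covariance_9)
```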

Image Classification Using MNF-Transformed Bands
Application of the first nine MNF-transformed bands (MNF 1-9) provided the highest overall accuracies for both SVM and RF classifiers (SVM: 82.06%; RF: 79.14%); additional features beyond the first nine MNF bands did not significantly improve the accuracy. For MLC, additional features beyond MNF bands 1-5 did not improve the accuracy. Although each classifier provided high overall accuracies with 30 random training pixels (SVM: 82.06%; RF: 79.14%; MLC: 80.78%) (Figure 3 and Figure 4), only the SVM and RF classifiers performed well with a low number of training pixels. With 10 random training pixels SVM provided an overall accuracy of 79.57% and RF 76.55%, while the accuracy of the MLC classifier decreased considerably (Table 3). The MLC classifier provided low classification accuracies (with high standard errors) when using fewer than 20 training pixels.
Table 3. Producer's accuracy (%) of the classes using SVM, MLC and RF classifiers with 9 MNF-transformed bands and 30 (1) and 10 (2) random training pixels.
The ELY, ALO, BEC, CAR, GLY, SAL, FMM, ARA and MUD classes were classified with high accuracy by all classifiers when using 30 random training pixels, whereas the tested classifiers provided considerably different accuracies for the CYN, FAR, ACI and TYP classes.
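The accuracy measures reported here can be computed from a confusion matrix as in the following Python sketch. The matrix and the five overall accuracies are invented for illustration and are not results of the study; the row/column convention and the standard error formula (sample standard deviation divided by the square root of the number of repeats) are assumptions, since the text does not specify them.

```python
import math

def overall_accuracy(matrix):
    """Percentage of validation pixels on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

def producers_accuracy(matrix):
    """Per class: correctly classified pixels / all reference pixels of that class.
    Rows are assumed to hold the reference class, columns the predicted class."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]

def mean_and_standard_error(values):
    """Mean and standard error of the mean over repeated classifications."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted)
confusion = [
    [45, 5, 0],
    [10, 30, 10],
    [0, 0, 50],
]
oa = overall_accuracy(confusion)
pa = producers_accuracy(confusion)

# Hypothetical overall accuracies from five repeated random draws
mean_oa, se_oa = mean_and_standard_error([80.0, 82.0, 81.0, 83.0, 79.0])
```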

DISCUSSION
We tested the applicability of SVM and RF classifiers with the original bands on reduced training samples of 10-30 pixels. Even though both SVM and RF provided a similarly high accuracy for most classes, we detected a considerable difference between their performance when using 30 random training pixels. In this case SVM had a considerably higher overall accuracy (82.41%) than RF (74.66%). Although classification with the original bands resulted in a high producer's accuracy for most classes, in certain cases (PHO, TYP and BOL) we detected low accuracies. In these cases most of the pixels were assigned to another class with similar attributes (ratio of open soil surface or amount of biomass).
To select the optimal number of transformed features, we used SVM classification on 2-15 MNF-transformed bands. The SVM algorithm produced the best accuracy with the first nine MNF-transformed bands; further features did not increase the classification accuracy considerably. With smaller training datasets, the classification accuracies of SVM and RF were considerably higher than those of MLC, owing to the instability of the covariance matrix estimated for MLC. Based on our results, the application of SVM can be a feasible solution in complex landscapes, as it provided higher accuracies than RF and MLC. SVM was not sensitive to the size of the training samples, which makes it an adequate tool for cases when the available number of training pixels is limited for some classes.
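The MNF transformation used for feature extraction can be sketched with NumPy as noise whitening followed by a principal component analysis. This is a simplified illustration, not the ENVI implementation used in the study: the function name `mnf_transform`, the noise estimate from differences between horizontally adjacent pixels and the synthetic 12-band cube are assumptions.

```python
import numpy as np

def mnf_transform(cube, n_components):
    """Minimum noise fraction sketch for a (rows, cols, bands) image cube."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    centered = pixels - pixels.mean(axis=0)

    # Noise estimate (assumption): differences between neighbouring pixels
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(noise, rowvar=False) / 2.0
    data_cov = np.cov(centered, rowvar=False)

    # Step 1: whiten the noise so that its covariance becomes the identity
    noise_vals, noise_vecs = np.linalg.eigh(noise_cov)
    whitener = noise_vecs / np.sqrt(noise_vals)

    # Step 2: PCA of the noise-whitened data covariance
    vals, vecs = np.linalg.eigh(whitener.T @ data_cov @ whitener)
    order = np.argsort(vals)[::-1]               # least noisy components first
    projection = whitener @ vecs[:, order[:n_components]]

    transformed = (centered @ projection).reshape(rows, cols, n_components)
    return transformed, vals[order]              # ratios of total to noise variance

# Hypothetical 20 x 20 pixel cube with 12 bands of random values
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 12))
components, eigenvalues = mnf_transform(cube, n_components=9)
```

The returned eigenvalues are sorted in descending order, so keeping the first few components, as with the nine MNF bands above, retains the features with the most signal relative to noise.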