A COMPARATIVE STUDY OF ADVANCED LAND USE/LAND COVER CLASSIFICATION ALGORITHMS USING SENTINEL-2 DATA

Land Use/Land Cover (LU/LC) is a major driving factor of distributed ecosystems and their functioning. Interpreting remote sensor data acquired from satellites requires enhancement through classification in order to attain better results. Classification of satellite products provides detailed information about the existing landscape, which can also be analyzed on a temporal basis. Image processing techniques act as a platform for the analysis of raw data using supervised and unsupervised classification algorithms. Classification falls into two broad categories: supervised classification, in which the analyst specifies the classes by defining training sites, and unsupervised classification, in which pixels are automatically clustered into a defined number of classes. This study attempts to perform the LU/LC classification for the Paonta Sahib region of Himachal Pradesh, which is a major industrial belt. The data were obtained from Sentinel 2A, of which only the stacked bands at 10 m resolution are used. Several classification algorithms are applied: Minimum Distance, Maximum Likelihood, Parallelepiped and Support Vector Machine (SVM) as supervised classifiers, and ISO Data and K-Means as unsupervised classifiers. The accuracy of each classification result is then estimated and compared. Of the methods applied, Maximum Likelihood provides the highest accuracy and is considered to be the best for LU/LC classification using Sentinel-2A data.


INTRODUCTION

Land Use/Land Cover in Remote Sensing
Mapping of LU/LC features has become one of the most widely applicable tasks in various fields of geospatial technology (Melesse et al., 2007). The use of LU/LC analysis for change detection shows how technological capacity has improved over the years (Madhu et al., 2017). Using existing land cover data, urban patterns, industries, transport lines and the conservation of water and other natural resources can be studied further, monitored and maintained (Amna et al., 2015; Rahaman et al., 2017; Nitheshnirmal et al., 2017).

Classification of Land Use/ Land Cover information
Remote sensing data are raw imagery that can be interpreted only through classification, which yields the relevant information. The extracted information describes the existing land cover and land use patterns (Topaloğlu et al., 2016). Classification of raw satellite imagery through image processing techniques provides accurate details about the landscape. In particular, classification algorithms are widely used for modelling environmental issues (Al-Ahmadi et al., 2008). Numerous studies have been conducted on the classification accuracies of the widely available Landsat images (TM, ETM+, OLI), which have a spatial resolution of 30 m (Congalton, 1991; Huang et al., 2010; Rwanga and Ndambuki, 2017).

In recent times, Sentinel-2A has provided data with a higher spatial resolution than Landsat images. This has led researchers around the globe to opt for Sentinel-2A data for land use/land cover classification in studies such as wetland monitoring (Kaplan and Avdan, 2017), crop and tree species classification (Immitzer et al., 2016), urban sprawl (Lefebvre et al., 2016) and urban green space analysis (Kopecká et al., 2017), as well as many other studies in which Sentinel-2A data are used to generate thematic layers. It is therefore essential to determine which of the many classification algorithms is best suited to land use/land cover classification of Sentinel-2A data. Previous studies have compared the classification accuracies of Landsat-8 and Sentinel-2A using the SVM and Maximum Likelihood Classification (MLC) algorithms (Topaloğlu et al., 2016; Sekertekin et al., 2017). These studies concluded that Sentinel-2A gives higher accuracy than Landsat-8, but there has been no detailed study of other supervised and unsupervised classification algorithms, such as Parallelepiped, Minimum Distance, ISO Data and K-Means, for Sentinel-2A data. This study is conducted to fill that research gap: it aims to classify the Sentinel 2A data using the ENVI classification algorithms and to compare their respective classification accuracies in order to conclude which algorithm is best suited to Sentinel-2A data.

Study area description
This study aims to perform the LU/LC classification for the Paonta Sahib region of Himachal Pradesh. It lies at 30.4453° N latitude and 77.6021° E longitude. Paonta Sahib is a subdivision of the Sirmaur district. The study area is among the lowest administrative units but is considered useful to planners who formulate micro-level development plans. Hence it was termed a community development block by the Census of India in 1991. The total area covered is 762.91 sq. km, and the region lies at the foothills of the Siwaliks. Dense mountain ranges are found to the north of the river Yamuna, which runs across Paonta Sahib. Prospective industries are present along the banks of the river. The north-western terrain is dotted with dense vegetation and reserved forests. The Paonta Sahib region has a heavy concentration of linear and nucleated settlements along the roads and the river. Patches of cultivated land are devoted to agriculture. Since the study area exhibits a diverse land use/land cover pattern, an attempt has been made to identify it using various classification methods.

Sentinel 2 data description
Table 2. Sentinel 2A band specifications

Method opted
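The overall workflow in Figure 3 shows the processing of raw remote sensor data into classified data using various classification algorithms. The steps involved denote pre-processing of the satellite data, followed by classification and accuracy assessment, as described in the subsections below.

Figure 3. Flowchart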

Resampling
In general, the original remote sensor data consist of spectral bands with varied spatial resolution. The spatial resolution of each band determines its GSD (Ground Sampling Distance), the smallest distance that the sensor covers on the ground. The smaller this distance, the higher the accuracy of the data. Since Sentinel 2A data consist of 13 bands with varied spatial resolution, resampling is done in order to bring all the MSI bands to the same resolution. The Sentinel 2A band resolutions are given in Table 2. Bands 1, 9 and 10 (60 m resolution) and bands 5, 6, 7, 8A, 11 and 12 (20 m resolution) are resampled to match bands 2, 3 and 4 (10 m resolution).
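The resampling step can be illustrated with a minimal sketch. The study does not state which resampling method was applied, so nearest-neighbour replication is assumed here purely for illustration; the function name `upsample_nearest` and the pixel values are hypothetical.

```python
import numpy as np

def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Replicate every pixel `factor` times along rows and columns."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

# Hypothetical 2 x 2 tile of a 20 m band (made-up values).
band_20m = np.array([[1, 2],
                     [3, 4]])

# Going from 20 m to 10 m turns each pixel into a 2 x 2 block.
band_10m = upsample_nearest(band_20m, factor=2)
print(band_10m.shape)  # (4, 4)
```

A 60 m band would use `factor=6` in the same way. Nearest-neighbour replication keeps the original pixel values unchanged, which is one reason it is often preferred before classification, although other interpolation methods are equally possible.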

Layer Stacking
After resampling, in which all 13 bands were brought to 10 m resolution, layer stacking is done. Layer stacking in image processing is the process of combining image layers that have the same spatial resolution. The spatial resolution of 10 m corresponds to the GSD on the ground, which is precise enough to identify the land features present. This provides accurate output imagery that can be used for classification.
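As a rough sketch of layer stacking, assuming each resampled band is already available as a 2-D array on the same grid, the bands can be combined into a single multi-band cube. The helper name `stack_layers` and the random band values are illustrative only.

```python
import numpy as np

def stack_layers(bands):
    """Stack equally sized 2-D bands into a (rows, cols, n_bands) cube."""
    shapes = {b.shape for b in bands}
    if len(shapes) != 1:
        raise ValueError(f"all bands must share one grid, got shapes {shapes}")
    return np.stack(bands, axis=-1)

# Thirteen hypothetical bands, all on the same 4 x 4 grid after resampling.
rng = np.random.default_rng(0)
bands = [rng.random((4, 4)) for _ in range(13)]

cube = stack_layers(bands)
print(cube.shape)  # (4, 4, 13)
```

The shape check reflects the requirement stated above: stacking only makes sense once every band has the same spatial resolution and extent.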

Masking
The layer-stacked dataset is a raster, and it was masked using a polygon feature. The polygon of Paonta Sahib was used to extract the part of the raster dataset that is used for classification.
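The masking step can be sketched as follows, assuming the study-area polygon has already been rasterised into a boolean array on the same grid as the stacked data (the rasterisation itself is not shown). The function `apply_mask` and the `nodata` value are assumptions made for this example.

```python
import numpy as np

def apply_mask(cube, inside, nodata=0):
    """Keep pixels where `inside` is True and set all others to `nodata`."""
    if inside.shape != cube.shape[:2]:
        raise ValueError("mask and raster must share the same grid")
    # Add a trailing axis so the 2-D mask broadcasts over every band.
    return np.where(inside[..., None], cube, nodata)

# Hypothetical 3 x 3 raster with 2 bands; only the centre pixel is inside.
cube = np.ones((3, 3, 2))
inside = np.zeros((3, 3), dtype=bool)
inside[1, 1] = True

masked = apply_mask(cube, inside)
print(masked[1, 1], masked[0, 0])
```

Pixels outside the polygon are kept in the array but carry the `nodata` value, so they can be excluded from training and from the accuracy assessment.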

Classification algorithms
The following were the various types of algorithms that were used to classify the raster data.

I. Supervised classification
Remote sensing data classified with traditional image classification methods are used in a wide range of applications nowadays (Perumal et al., 2010). Supervised classification algorithms derive their results from training data, called 'representative training samples', collected for each class from the remote sensor imagery (false colour composite) (Liu). The training samples are areas of known identity, also termed Regions Of Interest (ROI).
Maximum Likelihood Classifier - in which each pixel is assigned to the class for which its vector has the highest probability among all classes. The assignment is subject to a threshold value given by the user: if the class probability is lower than the threshold, the pixel is left unclassified (Ahmad et al., 2012).
Minimum Distance Classifier - in which each class is represented by a prototype, the mean vector calculated for that class, and each pixel is assigned to the closest class (Bhattacharya).
Parallelepiped Classifier - in which a standard threshold is given in order to check whether a particular pixel belongs to the respective class. The parallelepiped is based upon a standard deviation threshold (a class limit dimension) from the mean of each class. If the pixel value lies between the defined low and high thresholds, the pixel is classified. Areas that do not fall within the specified threshold range are left unclassified.
SVM - which is in general applied to complex datasets, where the input numeric attributes are normalized and pre-processed before classifying. It is a nonparametric statistical learning method (Ustuner et al., 2015). Various kernel types, such as linear, polynomial, sigmoid and radial basis function, can be set. The kernel type used in this study was linear.
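The decision rules of the three statistical classifiers described above can be sketched in a few lines. This is a simplified illustration and not the ENVI implementation: the maximum likelihood rule assumes Gaussian classes with equal priors and omits the probability threshold, the parallelepiped limits are set to two standard deviations, and the training values are synthetic. All function names are hypothetical. SVM is left out because it would need an external library.

```python
import numpy as np

def class_stats(samples):
    """Mean, standard deviation and covariance per class.

    `samples` maps an integer class label to an (n_pixels, n_bands) array
    of training pixels.
    """
    return {
        label: (x.mean(axis=0), x.std(axis=0), np.cov(x, rowvar=False))
        for label, x in samples.items()
    }

def minimum_distance(pixels, stats):
    """Assign each pixel to the class whose mean vector is nearest."""
    labels = np.array(list(stats))
    means = np.array([stats[k][0] for k in stats])
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return labels[dist.argmin(axis=1)]

def parallelepiped(pixels, stats, n_std=2.0, unclassified=-1):
    """Assign a pixel to a class if it lies within mean +/- n_std * std
    in every band; pixels inside no box stay unclassified."""
    out = np.full(len(pixels), unclassified)
    # Reverse order so that, where boxes overlap, the class listed first
    # wins (an arbitrary tie-breaking choice for this sketch).
    for label in reversed(list(stats)):
        mean, std, _ = stats[label]
        inside = np.all(np.abs(pixels - mean) <= n_std * std, axis=1)
        out[inside] = label
    return out

def maximum_likelihood(pixels, stats):
    """Gaussian maximum likelihood with equal priors and no threshold."""
    labels = np.array(list(stats))
    scores = []
    for k in stats:
        mean, _, cov = stats[k]
        diff = pixels - mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (logdet + maha))  # log-likelihood up to a constant
    return labels[np.argmax(scores, axis=0)]

# Synthetic two-band training samples for two hypothetical classes.
rng = np.random.default_rng(42)
samples = {
    0: rng.normal([0.1, 0.1], 0.02, size=(30, 2)),
    1: rng.normal([0.5, 0.6], 0.05, size=(30, 2)),
}
stats = class_stats(samples)
pixels = np.array([[0.1, 0.1], [0.5, 0.6], [0.9, 0.9]])

print(minimum_distance(pixels, stats))
print(parallelepiped(pixels, stats))
print(maximum_likelihood(pixels, stats))
```

The third test pixel lies far from both classes. Minimum distance and maximum likelihood still force it into the nearer class, whereas the parallelepiped rule leaves it unclassified, which illustrates how the threshold-based rule differs from the other two.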

II. Unsupervised classification
The unsupervised classification does not require training samples; instead, the number of required classes and the number of iterations need to be specified.
ISO Data - ISO Data unsupervised classification starts from class means that are evenly distributed in the data space and iteratively groups the remaining pixels based upon a defined threshold. The number of classes and iterations need to be given manually.
K-Means - defines the clusters by their centres and assigns each data point to the nearest cluster centre, so that homogeneous pixels are clustered together.
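The clustering idea behind the unsupervised classifiers can be sketched with a plain K-Means loop. This is an illustrative sketch rather than the ENVI implementation: the random initialisation, the fixed iteration count and the function name `k_means` are assumptions, and ISO Data, which in its usual form also splits and merges clusters between iterations, is not reproduced here.

```python
import numpy as np

def k_means(pixels, n_classes, n_iter=10, seed=0):
    """Alternate nearest-centre assignment and centre update."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(pixels), n_classes, replace=False)
    centres = pixels[start].astype(float)

    def assign(c):
        # Distance of every pixel to every centre, then the nearest centre.
        dist = np.linalg.norm(pixels[:, None, :] - c[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centres)
        for k in range(n_classes):
            members = pixels[labels == k]
            if len(members):  # keep the old centre if a cluster is empty
                centres[k] = members.mean(axis=0)
    return assign(centres), centres

# Four hypothetical two-band pixels forming two obvious groups.
pixels = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centres = k_means(pixels, n_classes=2)
print(labels)
```

Only the number of classes and the number of iterations are supplied, matching the inputs named above. The cluster numbers themselves are arbitrary, so the analyst still has to attach a land cover meaning to each cluster afterwards.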

Accuracy Assessment
Using the classification results, the accuracy of each classifier is estimated and compared. The accuracy assessment is carried out using the ground truth Regions of Interest collected.

Classification Results
The supervised and unsupervised classifications described above were applied to the Sentinel 2A data, and the results shown in Figure 4 were obtained. The data were classified into 7 classes, namely River, Forest, Urban, Mountain, Scrub land, Crop land and River associated sand. For defining the training sets in supervised classification, a maximum of 30 samples were collected for each class. The greater the number of classes, the better the aggregation of pixels; hence the number of samples defined was kept at the maximum. The samples were collected as small ellipses, distributed so as to cover all the areas where the feature was present.
The Maximum Likelihood, Parallelepiped and SVM classifiers provided better results, and the LU/LC features were well identified. Their results showed maximum similarity with the original remote sensor imagery. The classified images also contain misclassified pixels, especially for Minimum Distance, ISO Data and K-Means, owing to the similarity between pixels of two different classes. In particular, shadow regions of the hilly, mountainous terrain were misclassified as river, and scrub land was misclassified as crop land, thus yielding only fair accuracy.

Estimated accuracy
A confusion matrix tabulates the reference classes against the classified data; from it the overall accuracy and kappa coefficient are estimated and compared. The overall accuracy is calculated as the ratio of the total number of correctly classified sites to the total number of reference sites, expressed as a percentage. The kappa coefficient evaluates the obtained overall accuracy by measuring the agreement between the classification and the reference data beyond what would be expected by chance. In general, the kappa coefficient ranges between -1 and 1, where -1 denotes poor classification and 1 denotes good classification. The following tables display the overall accuracy and kappa coefficient for each classifier.
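Both measures can be computed directly from a confusion matrix, as the short sketch below shows. The matrix values are invented for illustration and are not results from this study; the function names are likewise hypothetical. Rows are taken to be the reference classes and columns the classified classes.

```python
import numpy as np

def overall_accuracy(cm):
    """Correctly classified sites (diagonal) over all reference sites."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = cm.sum()
    observed = np.trace(cm) / n
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (observed - expected) / (1 - expected)

# Made-up 2-class confusion matrix.
cm = np.array([[45, 5],
               [10, 40]])

print(overall_accuracy(cm))  # 0.85
print(kappa(cm))
```

For this matrix, 85 of the 100 sites lie on the diagonal, so the overall accuracy is 85%. The chance agreement is 0.5, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7, which is lower than the overall accuracy because part of the agreement would be expected by chance alone.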

CONCLUSION
LU/LC mapping is the most essential layer in various spatial applications. Hence, it needs to be precise in order to proceed with further studies. Compared with the conventional method of manual digitization, classification provides precise results along with estimated accuracies. In this study, the LU/LC features are best interpreted using the Maximum Likelihood classifier, which provided an overall accuracy of 89.30%, followed by the Parallelepiped classifier with 80.07%, the SVM classifier with 75.58%, the Minimum Distance classifier with 61.90%, the ISO Data classifier with 30.03% and the K-Means classifier with 22.09%. Thus, the Maximum Likelihood Classifier, a supervised classification method, is the best suited algorithm for classification of LU/LC features.

Figure 1. Paonta Sahib - Location map

Pre-processing of the satellite data comprises steps such as resampling, layer stacking, masking, applying classification and accuracy assessment.

Sentinel 2 was developed by ESA and consists of two identical satellites, Sentinel 2A and 2B. Sentinel 2A was launched on June 23rd 2015, and its multispectral data have a total of 13 bands. The spatial resolution of Sentinel 2A varies between 10 m, 20 m and 60 m for different bands. Sentinel 2 carries a single Multi Spectral Instrument with 13 spectral channels, whose band specifications are given in Table 2.

Table 6. Minimum Distance