EXPLORING CAPABILITIES OF SENTINEL-2 FOR VEGETATION MAPPING USING RANDOM FOREST

Accurate vegetation mapping is essential for monitoring crops and sustainable agricultural practice. This study explores the capabilities of Sentinel-2 data relative to Landsat-8 Operational Land Imager (OLI) data for vegetation mapping. Two combinations of Sentinel-2 data are considered: the first is a 4-band dataset at 10 m resolution consisting of the NIR, R, G and B bands, while the second is generated by stacking the four 10 m bands with six other bands sharpened using the Gram-Schmidt algorithm. For the Landsat-8 OLI dataset, six multispectral bands are pan-sharpened to a spatial resolution of 15 m using the Gram-Schmidt algorithm. The Random Forest (RF) and Maximum Likelihood (MLC) classifiers are selected for classification of the images. The overall accuracy achieved by RF for the 4-band and 10-band Sentinel-2 datasets and for Landsat-8 OLI is 88.38%, 90.05% and 86.68%, respectively, while MLC gives an overall accuracy of 85.12%, 87.14% and 83.56% for the same datasets. The results show that the 10-band Sentinel-2 dataset gives the highest accuracy, a rise of 3.37% for RF and 3.58% for MLC compared to Landsat-8 OLI. All classes show an improvement in accuracy, but a major rise is observed for Sugarcane, Wheat and Fodder with the Sentinel 10-band imagery. This study substantiates that Sentinel-2 data can be used to map vegetation with a good degree of accuracy when compared to Landsat-8 OLI, specifically when the objective is to map sub-classes of vegetation.


INTRODUCTION
Mapping vegetation with precision is a key task for managing natural resources and plays an important role in various protection and restoration programs. Vegetation mapping provides valuable information for understanding the relation between the natural and man-made environment. Vegetation and cropland mapping methodologies act as a prerequisite for various development schemes run by governments. The Agricultural Monitoring Community of Practice of the Group on Earth Observations (GEO), with its Integrated Global Observing Strategy (IGOL), also calls for an operational system to monitor global agriculture using remote sensing (Belgiu, 2018). The literature contains many studies on Land Use Land Cover classification, some of them dedicated to vegetation mapping, that use various supervised and unsupervised algorithms in pixel-based or object-based frameworks (Belgiu, 2018; Chuang, 2016; Nay, 2018; Colkesen, 2017; Li, 2014). A meta-analysis of supervised pixel-based techniques for land cover classification by Khatami et al. (2016) reveals that inclusion of ancillary data, texture, multi-angle and temporal images gives a significant improvement in classification accuracy, with the inclusion of texture giving the highest rise.
The recently launched Sentinel-2 satellite is receiving much attention: its fine spatial resolution, fast revisit time, global coverage and, last but not least, free availability make it a great choice for various remote sensing applications (Wang, 2016). Its multispectral properties also make it attractive to researchers and scientists. With the introduction of three red edge channels, Sentinel data has opened new possibilities for precision agriculture mapping. This study aims to explore the capabilities of Sentinel-2 data vis-a-vis Landsat-8 OLI data for vegetation mapping. One of the benefits of Sentinel-2 is its 13 spectral bands, which have different spectral properties and three levels of spatial resolution (10 m, 20 m, 60 m). Sentinel-2 acquires data globally every five days, and this data has been used for a variety of remote sensing applications. Sentinel-2 has great potential for spatially detailed retrievals of phenology (Vrieling, 2018). Belgiu (2018) applied the time-weighted dynamic time warping (TWDTW) method to Sentinel-2 time series for cropland mapping, using pixel-based and object-based classification in three different study areas, and concluded that object-based classification gives better results than the pixel-based approach. Another study, on tea crop mapping, was carried out by Chuang (2016) using WorldView-2 imagery and machine learning techniques (RF and SVM); its results show that the highest overall accuracy is achieved with object-based image analysis (OBIA). Nay et al. (2018) applied machine learning techniques to forecast vegetation health using MODIS datasets. For specific applications such as timely monitoring of highly dynamic environments, Sentinel-2 and Sentinel-3 data have been successfully fused to create daily Sentinel-2 images (Wang, 2018). A combination of Landsat-8 OLI and Sentinel-2 MSI data has been used to assess the burn severity of a large wildfire (Quintano, 2018).
The objective of this paper is to explore the potential of Sentinel-2 for vegetation mapping in comparison to Landsat-8 OLI data. This paper is structured as follows: Section 2 presents the study area and the data; Section 3 describes the classifier; Section 4 is dedicated to the analysis of results; Section 5 highlights the main findings of the study as a conclusion.

STUDY AREA AND DATA
In this work, Roorkee city and its surrounding regions, located in the Haridwar district of Uttarakhand state, India, have been selected as the study area. Agriculture plays an important role in the economy of this area. It is primarily dominated by cropland and consists of a variety of Land-Use Land-Cover (LULC) classes. The study area is mainly composed of nine prominent classes: Wheat, Fodder, Trees, Fallow Land, Sugarcane, Water, Built-up, Sandy Area and Other Crops. The region is dominated by wheat and sugarcane; the class named 'Other Crops' covers the remaining crops in the study area, such as double crops (wheat and mustard, wheat and trees) and vegetable fields like cabbage, cauliflower and radish. The study area covers about 61.84 km² and is bounded by latitudes 29° 51' 20.54'' N and 29° 55' 3.31'' N and longitudes 77° 53' 36.82'' E and 77° 59' 6.13'' E. Figure 1 shows a false colour composite of the study area using Sentinel-2 imagery, where band 8 (NIR) is displayed in red, band 4 (Red) in green and band 3 (Green) in blue.
Sharpening of the 20 m Sentinel-2 bands follows the Gram-Schmidt approach of Colkesen (2017) with the band selection scheme of Selva (2015). This scheme selects a band from the high resolution (fine) band set for each low resolution (coarse) band, namely the fine band that has the largest correlation with the coarse band under consideration (Wang et al., 2016). For the OLI dataset, six bands are pan-sharpened with the Gram-Schmidt algorithm to improve the spatial resolution from 30 m to 15 m.
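The correlation-based band selection described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names, band names and pixel values are hypothetical, and the fine bands are assumed to have already been aggregated to the coarse grid so that both sequences have the same length.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def select_fine_band(coarse_band, fine_bands):
    """Return the name of the fine band most correlated with coarse_band.

    fine_bands maps a band name to its pixel values, assumed to be
    aggregated to the coarse grid beforehand (same length as coarse_band).
    """
    return max(fine_bands, key=lambda name: pearson(coarse_band, fine_bands[name]))

# Hypothetical pixel values, for illustration only (not data from the study).
fine = {
    "B2_blue": [0.10, 0.12, 0.11, 0.13, 0.12],
    "B8_nir": [0.30, 0.45, 0.38, 0.52, 0.41],
}
red_edge = [0.25, 0.40, 0.33, 0.47, 0.36]  # a 20 m band to be sharpened

selected = select_fine_band(red_edge, fine)
print(selected)
```

The selected fine band would then play the role of the high resolution input when the coarse band is sharpened.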

RANDOM FOREST CLASSIFIER
Random Forest (RF) (Breiman, 2001) is one of the most popular ensemble classifiers and produces excellent results for various remote sensing applications (Belgiu, 2018; Pal, 2005). An ensemble classifier consists of multiple classifiers and usually produces better classification results than any individual classifier used to build the ensemble (Dietterich, 2002). The RF algorithm is a supervised machine learning method that constructs a number of base models (learners) and combines their responses by a voting scheme for the final prediction. Here, a decision tree is used as the base model. The RF classifier constructs an ensemble using the same concept as bagging (Breiman, 1996). Bagging creates new training datasets by random sampling with replacement: a training sample that has been drawn is not removed before the next draw. As a result, some training samples may be chosen more than once while others are not chosen at all. This strategy helps to improve classification accuracy by reducing the variance. In addition, during construction of a tree, the RF algorithm searches only a random subset of the input features at each splitting node, and the tree is allowed to grow fully without pruning (Chan, 2008). A majority voting scheme is used to assign the final class label.
The other classifier used in this study is the Maximum Likelihood classifier (MLC), one of the most popular classification algorithms, which has been used as a benchmark. Both classification algorithms are implemented in the 'R-framework' using package superclass.
The F1-measure has been used for accuracy assessment. It represents the harmonic mean of precision (p) and recall (r) for each class i and is defined as
$$F1_i = \frac{2 \, p_i \, r_i}{p_i + r_i} \qquad (1)$$
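The three ingredients of RF named above (bootstrap sampling with replacement, a random feature subset at each splitting node, and majority voting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study itself used R, the helper names and toy samples are hypothetical, and the tree-growing step is omitted.

```python
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw len(samples) items at random with replacement (bagging)."""
    return [rng.choice(samples) for _ in samples]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to search at one splitting node."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical training samples as (label, sample id) pairs.
training = [("wheat", 1), ("sugarcane", 2), ("fodder", 3), ("trees", 4)]

bag = bootstrap_sample(training, rng)  # may repeat some samples, omit others
features = random_feature_subset(n_features=10, k=3, rng=rng)
label = majority_vote(["wheat", "sugarcane", "wheat", "fodder", "wheat"])
print(len(bag), features, label)
```

Each tree would be trained on its own bag, and the forest's prediction for a pixel is the label returned by `majority_vote` over the per-tree predictions.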

RESULTS AND ANALYSIS
In this study, Sentinel-2 imagery is classified using the 4-band and 10-band datasets, and classification performance is compared with pan-sharpened Landsat-8 OLI imagery using the RF and MLC classifiers. A field visit was carried out to collect ground truth samples for training and testing. The training and testing samples were selected by stratified random sampling, and the two sets are mutually exclusive: 70% of the samples are used for training and 30% for testing (validation).
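A stratified 70/30 split of this kind can be sketched as follows. The function name, seed and toy samples are assumptions made for illustration; the sampling procedure actually used in the study is not described in further detail.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (id, label) samples into train and test sets class by class."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])  # 70% of this class for training
        test.extend(members[cut:])   # remaining 30% for testing
    return train, test

# Hypothetical samples: 10 pixels per class, identified by an integer id.
samples = [(i, label) for label in ("wheat", "sugarcane", "water") for i in range(10)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting within each class keeps the class proportions the same in both sets, and because every sample lands in exactly one list, the training and testing sets are mutually exclusive.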
To evaluate classification performance, two confusion-matrix-based statistical measures are used: overall accuracy, and the F1-measure for class-specific performance (Eq. 1). A visual comparison of the Landsat-8 OLI image classified by MLC with the one classified by RF (Figure 2) shows that, with MLC, most of the cropland is misclassified as other crops. This area is dominated by two major crops, wheat and sugarcane. For the Landsat data, MLC is not able to distinguish between the different crop types in the selected region, while RF improves the overall accuracy and shows a remarkable improvement for sugarcane and wheat. For the Sentinel 4-band data, overall accuracy improves for both classifiers, but MLC shows results similar to those for Landsat-8 OLI: sugarcane and wheat are again misclassified as other crops. On the other hand, for the Sentinel 10-band image both classifiers work well and show improvement for all sub-classes of agriculture.
The overall accuracy achieved by the RF classifier for the 4-band and 10-band Sentinel-2 datasets and the six pan-sharpened bands of Landsat-8 OLI is 88.38%, 90.05% and 86.68%, respectively (Table 1). MLC produces an overall accuracy of 85.12%, 87.14% and 83.56% for the same three datasets. When the Sentinel 10-band dataset is compared to the 4-band dataset, a rise in accuracy of +1.67% and +2.02% is observed for the RF and MLC classifiers, respectively. When compared to the Landsat-8 data, the 10-band Sentinel-2 dataset shows a significant increment of +3.37% for RF and +3.58% for MLC.
The study area consists of a number of land use land cover classes, of which a total of nine have been identified. Since the emphasis is on differentiating between crop types, five sub-classes of agriculture have been identified: wheat, fodder, sugarcane, trees and other crops. For class-specific performance with RF, a significant accuracy rise is observed for the Sentinel 10-band dataset (Figure 4) compared to Landsat-8 OLI (Figure 2): wheat (+4.66%), fodder (+7.17%), trees (+1.37%), sugarcane (+3.24%) and other crops (+5.74%) (Table 2).
Similar trends are observed for MLC when the Sentinel-2 10-band data are compared to the pan-sharpened Landsat-8 OLI data, with a rise in accuracy for wheat (+5.64%), fodder (+4.54%), trees (+3.38%), sugarcane (+8.01%) and other crops (+3.19%) (Table 3). All classes show an improvement in classification accuracy, but the maximum rise is observed for fodder with RF and for sugarcane with MLC. Only a minor improvement is observed for classes such as built-up and sandy area. When the Sentinel 4-band data (Figure 3) are compared with Landsat-8 OLI (Figure 2), a significant improvement is observed for fodder (+4.84%) with RF and sugarcane (+3.71%) with MLC. The Landsat-8 OLI classified image appears visually homogeneous, but the class-specific accuracies show that fine details are missing. When MLC is applied to the Landsat-8 OLI data, sugarcane and wheat are misclassified as other crops, while RF improves the overall accuracy (+3.12%) and gives an increase of +4.28% for sugarcane and +4.66% for wheat. A study by Colkesen (2017) shows that including six multispectral bands of Sentinel-2 gives a significant rise in classification accuracy, approximately +6%, over pan-sharpened Landsat using the canonical correlation forest method. Similar trends are observed here, except that the rise in overall accuracy is lower, and both Sentinel datasets (4-band and 10-band) perform better than the Landsat-8 OLI dataset. This observation is confirmed by both classifiers.
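For readers who want to reproduce the two measures used in this section, the following Python sketch computes overall accuracy and the per-class F1-measure from a confusion matrix. The matrix is a hypothetical 3-class example, not a result of the study, and the convention that rows hold reference labels and columns hold predictions is an assumption.

```python
def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def f1_per_class(cm):
    """F1 for each class; rows are reference labels, columns are predictions."""
    scores = []
    for i in range(len(cm)):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column total
        actual = sum(cm[i])                    # row total
        p = tp / predicted if predicted else 0.0  # precision
        r = tp / actual if actual else 0.0        # recall
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return scores

# Hypothetical 3-class confusion matrix (not results from the study).
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
oa = overall_accuracy(cm)
f1 = f1_per_class(cm)
print(oa, f1)
```

The accuracy changes quoted in the text are simple differences between such values for two datasets, for example 90.05% minus 86.68% for RF on the 10-band Sentinel-2 and Landsat-8 OLI datasets.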

CONCLUSIONS
Based on the findings of this study, the best results are obtained with the 10-band Sentinel-2 dataset and the RF classifier. Further, the study shows that Sentinel-2 data can be used for mapping sub-classes of vegetation, especially crops, with fair overall accuracy as compared to Landsat-8 OLI. This study concludes that Sentinel-2 has great potential for vegetation mapping and achieves excellent overall accuracy with sharpened bands.

Figure 1. False colour composite of Sentinel-2 image.

Here, the Sentinel-2 multispectral image of 19 February 2018 and the Landsat-8 OLI image of 20 February 2018 have been used. In this study, the Sentinel-2 bands are considered as two datasets: a 4-band dataset and a 10-band dataset. The 4-band dataset is formed from the 10 m resolution bands (Red, Green, Blue, NIR) of the Sentinel-2 imagery, while the 10-band dataset additionally includes six bands at 20 m resolution: Red Edge (bands 5, 6, 7), Narrow NIR and SWIR (bands 11, 12). To form the 10-band dataset, the six 20 m bands are sharpened using the Gram-Schmidt algorithm (Colkesen, 2017) with the band selection scheme (Selva, 2015).

Figure 2. (a) Landsat-8 OLI image classified by MLC (b) Landsat-8 OLI image classified by RF.

Figure 4. (a) Sentinel 10-band imagery classified by MLC (b) Sentinel 10-band imagery classified by RF.

Table 2. F1-measure for Landsat-8 OLI, Sentinel 4-band and Sentinel 10-band using the RF algorithm, and change in percentage accuracy with respect to (w.r.t.) Landsat-8 OLI.

Table 3. F1-measure for Landsat-8 OLI, Sentinel 4-band and Sentinel 10-band using the MLC algorithm, and change in percentage accuracy with respect to (w.r.t.) Landsat-8 OLI.