COMPARISON OF SUPERVISED CLASSIFICATION TECHNIQUES WITH ALOS PALSAR SENSOR FOR ROORKEE REGION OF UTTARAKHAND, INDIA

The Advanced Land Observing Satellite (ALOS), developed by the Japan Aerospace Exploration Agency (JAXA), was launched in 2006 for Earth observation and exploration. ALOS carried the PRISM, AVNIR-2 and PALSAR sensors. PALSAR is an L-band synthetic aperture radar (SAR) designed to work in all weather conditions with a resolution of 10 meters. In this research work we investigate the accuracy obtained from various supervised classification techniques by classifying ALOS PALSAR data of the Roorkee region of Uttarakhand, India. The training regions of interest (ROIs) were created manually with the assistance of ArcGIS Earth, and Global Positioning System (GPS) coordinates of the region were used for testing. The supervised classification techniques included in this comparison are Parallelepiped classification (PC), Minimum distance classification (MDC), Mahalanobis distance classification (MaDC), Maximum likelihood classification (MLC), Spectral angle mapper (SAM), Spectral information divergence (SID) and Support vector machine (SVM). An accuracy assessment is then performed through the post-classification confusion matrix, and the corresponding value of the kappa coefficient is obtained. We conclude that MDC is best in terms of overall accuracy, with 82.3634%, and that MLC is best in terms of the kappa coefficient, with a value of 0.7591. Finally, a linear relationship is developed between classification accuracy and the kappa coefficient.


INTRODUCTION
Classifying an image into several categories or classes and then assessing the classification accuracy is a complex and interesting area of research in the field of remote sensing (RS) [1]. RS data is classified to identify a particular area of interest or class, and the accuracy of the classification is then estimated [1]. Researchers are continuously working to develop quick and highly accurate algorithms for image classification. Image classification techniques fall into two categories, supervised and unsupervised. In supervised classification, ground truth observations regarding the class of the data are already known to the user [2]. In unsupervised algorithms, the user is not aware of the initial conditions of the data, field or area of observation. Supervised classification techniques include parallelepiped [3], minimum distance [4], Mahalanobis distance [5], maximum likelihood [6], spectral angle mapper [7], spectral information divergence [8], binary encoding [9], artificial neural network [10] and support vector machine [11]. Unsupervised classification techniques include ISODATA [12] and the k-means algorithm [13].

H. Zhuang [14] developed an approach based on the combination of SAM and change vector analysis (CVA) for the unsupervised classification of remote sensing satellite multispectral data. E. Hasan [15] developed a model based on the combination of SAM and the surface structure of the Central Eastern Egyptian desert to map gold mine sites. D. Renza [16] developed a new unsupervised change detection technique based upon the differencing of multi-temporal multispectral images; the comparison of the images is obtained by applying SAM between each multi-temporal image and the reference spectrum. E. Zhang [17] improved the classification performance of hyperspectral images using a sparse representation classifier based on SID. M.
Khaleghi [18] used SID, SAM and principal component analysis (PCA) change detection techniques to enhance the alteration of the Sarduiyeh area of Kerman province of Iran. M. Janati [19] used the fusion of SID, SAM and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data for the geographical mapping of the basement domain in arid regions. H. Jiao [20] used a combination of the SID and SAM supervised classification techniques to develop a novel algorithm for computer-based spectral encoding. M. Musci [21] used the binary encoding (BE) and SAM techniques to compute the local binary pattern (LBP), variance, LBP + variance and LBP/variance for the texture classification of remotely sensed satellite images. G. Sheng [22] used the BE and SAM techniques for feature extraction in coding-based high-resolution satellite scene classification. Y. Shao [23] compared SVM and the Classification And Regression Trees (CART) algorithm on a MODIS time series dataset; the classification accuracy obtained in this case is 91%, with a kappa value of 0.79. D. T. Bui [24] compared SVM, neural network, kernel logistic regression and logistic model tree for the spatial model prediction of landslide areas. M. B. Kia [25] used a neural network model with a Geographical Information System (GIS) for flood detection across the Johor river basin, Malaysia. V. F. R. Galiano [26] used the SVM and neural network classification techniques to evaluate the effectiveness of the random forest classifier for land cover classification. U.
Maulik [27] developed a technique for classifying land use/land cover remote sensing images through SVM. H. Hong et al. [28] investigated 445 landslide regions; they combined SVM and the random subspace model to create a new model named 'random subspace-based support vector machines (RSSVM)' for investigating landslide activity. X. Wang [29] used the SVM technique to develop a novel WSL model that adds human gaze annotation for the classification of satellite images.

In this research, we investigate the relation between the kappa coefficient and the classification accuracy when classification is performed through various supervised classification techniques. The comparison is made between the PC, MDC, MaDC, MLC, SAM, SVM and SID techniques. For the estimation of accuracy, the post-classification confusion matrix and the kappa coefficient of the classified data are computed. The confusion matrix yields the user accuracy [30], producer accuracy [30], omission error [30] and commission error [30], which are computed for each individual class of the classified image; finally, the overall accuracy of the classified data is computed. The kappa coefficient [31], which measures the degree of agreement among raters, is calculated for each classified image.

With reference to the image matrix shown in fig. 1, overall accuracy is defined as the ratio of the total number of correctly classified pixels to the total number of pixels.
Omission error refers to the pixels that are unintentionally omitted from the correct class.
Commission error refers to the pixels of another class that are wrongly included in the class under consideration.
User accuracy is the accuracy from the point of view of the 'map user' rather than the 'map maker'; it is defined as 100% minus the commission error. Finally, the kappa value is generated to rate the accuracy of the classification. The kappa coefficient provides information regarding how well the classification of a dataset is performed [32]. The statistical range of the kappa coefficient is [-1, 1]. The mathematical expression for the kappa coefficient is

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

Here $p_o$ is the relative observed agreement among the raters (this is identical to the accuracy) and $p_e$ is the hypothetical probability of chance agreement. For $k$ categories, $N$ elements, and $n_{ki}$ the number of times rater $i$ predicted category $k$, the hypothetical probability of chance agreement is defined as

$$p_e = \frac{1}{N^2} \sum_{k} n_{k1}\, n_{k2}$$

The following conclusions can be drawn from the kappa coefficient.
If the value of the kappa coefficient lies close to -1, the classification is poor (worse than chance).
If the value of the kappa coefficient lies close to zero, the classification is no better than a random classification. Finally, if the kappa coefficient lies close to 1, the classification is significantly better than chance and close to accurate.
The kappa coefficients obtained are related to the accuracy of the various supervised techniques to establish a relationship pattern between classification accuracy and the kappa coefficient.
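The accuracy measures and the kappa coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function name `accuracy_report`, the dictionary keys, and the matrix layout (rows are reference classes, columns are classified classes) are assumptions introduced here, and the example matrix in the usage note is made up.

```python
import numpy as np

def accuracy_report(cm):
    """Accuracy measures from a square confusion matrix.

    Assumed layout: cm[i, j] = number of pixels whose reference
    (ground truth) class is i and whose classified label is j.
    Every class is assumed to have at least one reference pixel and
    at least one classified pixel (otherwise a division by zero occurs).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()                       # total number of pixels
    correct = np.diag(cm)              # correctly classified pixels per class
    ref_totals = cm.sum(axis=1)        # pixels per reference class
    map_totals = cm.sum(axis=0)        # pixels per classified class

    overall = correct.sum() / n        # p_o, the observed agreement
    producer = correct / ref_totals    # map maker's point of view
    user = correct / map_totals        # map user's point of view
    p_e = (ref_totals * map_totals).sum() / n**2   # chance agreement
    kappa = (overall - p_e) / (1.0 - p_e)

    return {
        "overall_accuracy": overall,
        "producer_accuracy": producer,
        "user_accuracy": user,
        "omission_error": 1.0 - producer,
        "commission_error": 1.0 - user,
        "kappa": kappa,
    }
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, the sketch gives an overall accuracy of 0.85, a chance agreement of 0.5 and therefore a kappa of 0.7, which shows how kappa discounts the agreement expected by chance.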

BACKGROUND OF THE ALOS PALSAR PROGRAM
The Advanced Land Observing Satellite (ALOS) was launched on 24th January 2006. The main objective of this satellite was to monitor changes in land cover and the environment using high-resolution satellite images. It carried three sensors: two high-resolution optical sensors and an active microwave (L-band synthetic aperture radar) sensor. ALOS had four mission objectives, i.e. disaster observation, resource exploration, regional observation and cartography, but its main objective was monitoring water content, carbon and the changing global climate [33]. The three sensors are the Advanced Visible and Near Infrared Radiometer 2 (AVNIR-2) [34], the Panchromatic Remote Sensing Instrument for Stereo Mapping (PRISM) [35] and the Phased Array L-band Synthetic Aperture Radar (PALSAR) [36], [37].

PALSAR is an L-band synthetic aperture radar designed to work under all sorts of weather conditions. It is designed to provide data in four polarization combinations (HH, VV, VH and HV), where the first H or V represents the transmit polarization and the second H or V represents the receive polarization. Applications of PALSAR data include sea surface ice monitoring, land use/land cover classification, tree height estimation and interferometry. PALSAR is designed to operate in three different modes, i.e. fine, ScanSAR and polarimetric. Its center frequency of operation is 1270 MHz, the chirp bandwidth is 14 or 28 MHz, the range resolution is 7 to 100 m, the incidence angle can vary from 8 to 60 degrees, the observational swath varies from 40 to 350 km, the bit length varies from 3 to 5 bits, and the data rate varies from 120 Mbps to 240 Mbps. The radiometric accuracy of PALSAR images is 1 dB within a scene and 1.5 dB over an orbit.

SUPERVISED CLASSIFICATION TECHNIQUES
Supervised classification is the technique most commonly used for classifying remote sensing image data. It relies on a suitable algorithm and procedure to classify or label the pixels into a particular class of interest. In this technique, prototype pixels of each desired class are designated as 'training data'; on the basis of the training data the image is classified into several classes, and the classified image is then checked against 'testing data' [38]. Testing data contains information based on ground reality, GPS coordinates, topographic maps, etc. Finally, an accuracy test of the classification is performed to obtain the percentage of correct classification.
Where $p(x \mid c)$ is the conditional probability of observing $x$ given class $c$, and $x$ is the image data vector over the $n$ bands.
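To make the role of the class-conditional probability concrete, the following is a minimal sketch of a maximum likelihood classifier. It assumes that each class is modelled as a multivariate Gaussian with equal prior probabilities, which is a common choice but is not stated in the text above; the function names and the dictionary-based interface are hypothetical.

```python
import numpy as np

def fit_gaussian_classes(samples_by_class):
    """Estimate a mean vector and covariance matrix per class.

    samples_by_class: dict mapping a class name to an array of shape
    (n_samples, n_bands) with n_bands >= 2 (assumed training pixels).
    """
    stats = {}
    for name, x in samples_by_class.items():
        x = np.asarray(x, dtype=float)
        stats[name] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats

def log_likelihood(pixels, mean, cov):
    """Log of the Gaussian density p(x | c) for each pixel row."""
    d = pixels - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", d, inv, d)   # squared Mahalanobis distance
    return -0.5 * (maha + logdet + mean.size * np.log(2.0 * np.pi))

def classify_ml(pixels, stats):
    """Assign each pixel to the class with the highest likelihood."""
    pixels = np.asarray(pixels, dtype=float)
    names = list(stats)
    scores = np.stack(
        [log_likelihood(pixels, *stats[n]) for n in names], axis=1
    )
    return [names[i] for i in scores.argmax(axis=1)]
```

Working with log-likelihoods rather than raw probabilities avoids numerical underflow when the number of bands grows; the class with the largest log-likelihood is the same as the class with the largest likelihood.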

METHODOLOGY
In this experiment we perform a comparative accuracy assessment by comparing the classification accuracy of the PC, MDC, MaDC, MLC, SAM, SVM and SID supervised classification techniques. We obtain the classification accuracy and estimate the kappa coefficient for each technique, and then develop a relationship between overall accuracy and the kappa coefficient.

Geographical location of the Study Area
In this experiment, we have taken PALSAR data of the Roorkee region of Uttarakhand, India. Roorkee is located at 29.87°N and 77.88°E. It has an elevation of 268 meters, i.e. 879 feet above sea level, and is spread over an area of 129.88 sq. km [39]. From the ArcGIS image we obtained initial information about the Roorkee region, such as water bodies in the form of the Solani river, vegetation areas, urban settlements, areas where no vegetation is present and bare soil surfaces. Here we have classified the PALSAR data into four classes, i.e. bare soil, urban, vegetation and water. For training, we manually selected regions of interest (ROIs) for the four classes with the assistance of ArcGIS Earth, and for computing the accuracy of our classification we used ground truth coordinates obtained through field visits to the geographic locations. The class at each location was recorded through its latitude and longitude coordinates with the assistance of a GPS device. The purpose of this investigation is to identify the technique that yields the maximum classification accuracy and to relate this to the kappa coefficient. We compared seven classification schemes on the same set of ROIs to obtain the maximum accuracy produced by each individual classification scheme.

First, we classified the PALSAR data through the PC technique and derived the classification statistics. After this classification we obtained no useful information, as the majority of the classified image is in green and black, i.e. vegetation and bare soil respectively. The initial observation is that the regions belonging to the water and urban classes remain unclassified. Thus this technique does not prove to be an effective technique for classifying this image.

In the second step we performed the classification of the PALSAR data through the MDC technique. The statistics derived from this classification are shown in table 3. The MDC scheme proves to be much better than the PC technique: the initial assessment suggests that all four classes are classified, and the water and urban classes, which were unclassified in the PC classification, also appear in this scheme.

In the third step, we used the MaDC technique for the classification of the PALSAR data. The statistics obtained after the classification are shown in table 4. The MaDC scheme appears to be intermediate between the PC and MDC schemes: all four classes are classified and present in the MaDC image, but when it is compared with the MDC classified image, the urban class does not seem to be completely classified.

SAM proved to be a reasonably good classification technique, but it is not the best one. In this scheme we observed very little classified urban area, and the bare soil class is also not classified effectively. The image mostly appears bluish and greenish in colour, which are the colours of the water and vegetation classes respectively. This scheme is less accurate than the PC classification scheme.

In the sixth classification scheme, we used SVM. This scheme does not provide effective classification information, and the classified image is similar to the PC classified image. The SVM classified image is shown in fig. 13.
Table 7 Statistics derived through Support vector machine classification

The SVM scheme is also a less accurate classification scheme, as the classified image makes clear that not all four classes are completely classified. The vegetation and bare soil classes are the most prominently classified, the water class is classified only slightly, and the urban class is completely unclassified.

In the seventh and final classification scheme, SID, the classification results are very poor. All the classes are completely unclassified and are not matched at all.

Table 8 Statistics derived through Spectral information divergence classification

The SID scheme proves to be the worst classification scheme for the PALSAR data, as no effective information can be obtained through it. The classified image only suggests a guess of the land cover, which appears only because a supervised classification technique was used. This scheme is the least accurate and reliable in terms of accuracy and kappa coefficient.

To identify the best classification technique, we compared the accuracy obtained from the several classification techniques. The overall accuracy of the classification is obtained, which is the ratio of correctly classified pixels to the total number of pixels under investigation [40]. The kappa coefficient is also computed for each classification technique, and on the basis of both parameters the best classification technique is identified.

Spectral information divergence (SID): kappa coefficient 0.1011, overall accuracy 28.87%
Table 9 Comparative analysis of kappa coefficient and overall accuracy

It is clear from table 9 that the kappa coefficient is maximum for MLC, followed by MDC, MaDC, SVM, PC and SAM, and minimum for SID. The overall accuracy is best for the MDC scheme, followed by MLC, MaDC, SVM, PC and SAM, and minimum for the SID scheme. We also found that the kappa coefficient and overall accuracy of maximum likelihood and minimum distance classification are approximately equal to each other. Fig. 15 shows a comparative bar plot of the kappa values obtained from the various classification techniques. The difference in kappa coefficient between MLC and MDC is approximately 0.2239%, which indicates that either scheme can be used to obtain the maximum kappa coefficient and overall accuracy. Finally, we developed a relation between the obtained overall accuracy and the obtained kappa coefficient for the various classification schemes. This relationship is linear in nature: as the value of the kappa coefficient increases, the classification accuracy also increases.

$$\kappa \propto \text{overall accuracy}$$

i.e. the kappa coefficient is directly proportional to the overall accuracy. We plotted the kappa values on the vertical axis and the classification accuracy on the horizontal axis; the plot shows a linear relationship between the kappa coefficient and the overall classification accuracy.
Fig. 17 Linear relationship between kappa coefficient and accuracy
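The linear trend between kappa coefficient and overall accuracy can be checked with an ordinary least-squares fit. The sketch below is an illustration under stated assumptions: the function name `fit_kappa_vs_accuracy` is hypothetical, and the per-technique values from table 9 have to be supplied by the reader, since only part of that table is reproduced in this text.

```python
import numpy as np

def fit_kappa_vs_accuracy(accuracy_pct, kappa):
    """Fit kappa = slope * accuracy + intercept by least squares.

    accuracy_pct: overall accuracies in percent (horizontal axis).
    kappa: matching kappa coefficients (vertical axis).
    Returns (slope, intercept, pearson_r).
    """
    accuracy_pct = np.asarray(accuracy_pct, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    slope, intercept = np.polyfit(accuracy_pct, kappa, 1)
    r = np.corrcoef(accuracy_pct, kappa)[0, 1]
    return slope, intercept, r
```

A correlation coefficient close to 1 would support the linear relationship. Such a trend is plausible from the definition of kappa itself: since $\kappa = (p_o - p_e)/(1 - p_e)$, kappa is a linear function of the observed accuracy $p_o$ whenever the chance agreement $p_e$ is roughly the same for all classifiers. A non-zero fitted intercept would mean the relationship is linear rather than strictly proportional.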

CONCLUSION
Classification techniques prove to be a very effective and useful tool for matching training and testing data. From this research work we conclude that, among the seven supervised classification techniques, MDC is best in terms of accuracy and MLC is best in terms of kappa coefficient; furthermore, both MLC and MDC provide good accuracy and kappa values. We also conclude that classes like bare soil and vegetation are clearly visible in every classification scheme, whereas the classification of the urban and water classes differs between schemes. Why water and urban areas do not appear clearly in several classification schemes can be an area of further research. One recommendation is that a user who needs the data classified only to a minimum threshold can choose SVM, PC or SAM over MLC, MDC and MaDC, as these techniques also provide a fair classification of the PALSAR data.

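Producer accuracy is the accuracy from the point of view of the map maker; it indicates how often features of a class on the ground are correctly shown on the classified map.

$$\text{Producer accuracy} = 100\% - \text{omission error} = 100 - 27.27 = 72.73\% \quad (13)$$

$$\text{User accuracy} = 100\% - \text{commission error} = 100 - 25.58 = 74.42\% \quad (14)$$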

Fig. 2 Supervised classification diagram

The various supervised classification techniques are discussed below.

Parallelepiped classification: In this technique a decision rule is used to classify the data. An n-dimensional decision boundary (a parallelepiped) is created in the image data space. Its dimensions are defined by a standard deviation threshold around the mean of each selected class. A pixel is assigned to a class if its value lies above the low threshold and below the high threshold in all n bands. If a pixel value falls within several classes, the classifier assigns that pixel to the first matched class, whereas pixels that fall within no class are left unclassified.

Minimum distance classification: In this scheme the classifier computes the Euclidean distance (ED) between the mean value of each class and the pixel under consideration, and the pixel is allocated to the class at the minimum ED.

Mahalanobis distance classification: This is a direction-sensitive distance classifier that uses statistics derived from each class. It assumes that all classes have equal covariance.
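The three decision rules described above can be sketched as follows. This is a simplified illustration and not the implementation used in the study: the function names, the dictionary interface, the default threshold of two standard deviations, and the use of a pooled covariance to express the equal-covariance assumption are all choices made here for the example.

```python
import numpy as np

def class_means(samples_by_class):
    """Mean vector of the training pixels of each class."""
    return {n: np.asarray(x, dtype=float).mean(axis=0)
            for n, x in samples_by_class.items()}

def pooled_covariance(samples_by_class):
    """One covariance matrix shared by all classes (equal-covariance assumption)."""
    centered = []
    for x in samples_by_class.values():
        x = np.asarray(x, dtype=float)
        centered.append(x - x.mean(axis=0))
    stacked = np.vstack(centered)
    dof = stacked.shape[0] - len(centered)
    return stacked.T @ stacked / dof

def classify_min_distance(pixels, means, cov=None):
    """Nearest class mean: Euclidean if cov is None, Mahalanobis otherwise."""
    pixels = np.asarray(pixels, dtype=float)
    names = list(means)
    inv = None if cov is None else np.linalg.inv(cov)
    dists = []
    for n in names:
        d = pixels - means[n]
        if inv is None:
            dists.append(np.einsum("ij,ij->i", d, d))          # squared ED
        else:
            dists.append(np.einsum("ij,jk,ik->i", d, inv, d))  # squared Mahalanobis
    nearest = np.argmin(np.stack(dists, axis=1), axis=1)
    return [names[i] for i in nearest]

def classify_parallelepiped(pixels, samples_by_class, n_std=2.0):
    """Box rule: mean +/- n_std standard deviations in every band."""
    pixels = np.asarray(pixels, dtype=float)
    labels = ["unclassified"] * len(pixels)
    # Visit classes in reverse order so the first listed class wins on overlap.
    for name in reversed(list(samples_by_class)):
        x = np.asarray(samples_by_class[name], dtype=float)
        low = x.mean(axis=0) - n_std * x.std(axis=0)
        high = x.mean(axis=0) + n_std * x.std(axis=0)
        inside = np.all((pixels > low) & (pixels < high), axis=1)
        for i in np.flatnonzero(inside):
            labels[i] = name
    return labels
```

Note the structural difference between the rules: the two distance classifiers always assign every pixel to some class, whereas the parallelepiped rule can leave pixels unclassified when they fall outside every box.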

Fig. 4 (figure labels: Separating Hyperplane, Support Vector 1, Support Vector 2)
Fig. 3 Supervised spectral angle mapper

This similarity is evaluated by treating each spectrum as a vector starting from the origin. The length of a vector represents the reflection intensity, and the difference between two spectra is described by the spectral angle. By evaluating the difference between each pixel and the reference spectra, an image can be classified into any number of classes. The angle α (alpha) formed between the two vectors is the arc cosine of their normalized dot product, calculated over all bands from the pixel spectrum and the reference spectrum.
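The spectral angle computation can be sketched as below, using the standard form $\alpha = \arccos\left(\frac{\sum_i t_i r_i}{\sqrt{\sum_i t_i^2}\,\sqrt{\sum_i r_i^2}}\right)$, where $t$ is the pixel spectrum and $r$ the reference spectrum. The symbol names, the function names and the optional `max_angle` threshold are assumptions made for this illustration.

```python
import numpy as np

def spectral_angle(pixels, reference):
    """Angle in radians between each pixel spectrum and one reference spectrum.

    pixels: array of shape (n_pixels, n_bands); reference: shape (n_bands,).
    Spectra are assumed to be non-zero vectors.
    """
    pixels = np.asarray(pixels, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dot = pixels @ reference
    norms = np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

def classify_sam(pixels, references, max_angle=None):
    """Assign each pixel to the reference spectrum with the smallest angle."""
    names = list(references)
    angles = np.stack(
        [spectral_angle(pixels, references[n]) for n in names], axis=1
    )
    labels = [names[i] for i in angles.argmin(axis=1)]
    if max_angle is not None:
        smallest = angles.min(axis=1)
        labels = ["unclassified" if a > max_angle else lab
                  for lab, a in zip(labels, smallest)]
    return labels
```

Because the angle depends only on the direction of the vectors, scaling a spectrum by a constant does not change its class; this is why the vector length (intensity) plays no part in the decision.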

Fig. 5 Location of Roorkee in Uttarakhand, India (Bharat)

The quad-pol data provides information about four different polarizations: HH (horizontal-horizontal), VV (vertical-vertical), HV (horizontal-vertical) and VH (vertical-horizontal); these represent linear polarizations. The PALSAR data therefore has four different bands, characterized as Band 1, Band 2, Band 3 and Band 4, and all these bands are fused together to obtain a quad-polarized PALSAR image.

Fig. 6 Different band representation of the PALSAR data

All four bands are fused together to obtain a single quad-pol image on which the classification schemes are applied. The map information of these bands is projection: geographic latitude/longitude, pixel size: 0.00005 degree, and datum: WGS-84. The fused bands are displayed together as the PALSAR image shown in figure 7.
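The band fusion and the use of GPS coordinates for testing can be illustrated with a short sketch. It assumes that the four bands are co-registered 2-D arrays and that the image is a north-up grid in geographic latitude/longitude with the 0.00005 degree pixel size quoted above; the function names and the `top_lat` and `left_lon` corner coordinates are hypothetical, since the image origin is not given in the text.

```python
import numpy as np

def stack_bands(band1, band2, band3, band4):
    """Fuse four co-registered 2-D bands into one (rows, cols, 4) array."""
    return np.stack([band1, band2, band3, band4], axis=-1)

def latlon_to_pixel(lat, lon, top_lat, left_lon, pixel_size_deg=0.00005):
    """Row and column of a GPS point in a north-up latitude/longitude grid.

    top_lat and left_lon are the coordinates of the upper-left image
    corner (assumed known from the image header).
    """
    row = int(np.floor((top_lat - lat) / pixel_size_deg))
    col = int(np.floor((lon - left_lon) / pixel_size_deg))
    return row, col
```

With the fused array, the four-band vector of a surveyed point is simply `image[row, col]`, which can then be passed to any of the classifiers and compared against the class recorded in the field.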

Fig. 10 Mahalanobis distance classified PALSAR data

The MLC technique proves to be an effective image classification technique, as it provides a better result in terms of visualization of the classes. This technique certainly provides a better classification result than the PC and MaDC techniques. In this classification all four classes are clearly visible, but the water stream and wetlands are not as clearly visible as in the classified image obtained from MDC.

Fig. 11 Maximum likelihood classified PALSAR data

In the fifth classification step we used the SAM technique to classify the PALSAR image. The SAM classified PALSAR image is shown in fig. 12.

Fig. 15 Comparative barplot of kappa coefficient

Fig. 16 shows a comparative plot of the overall accuracy. On comparing the accuracy obtained from the various classification techniques, we conclude that the MDC technique obtained the maximum accuracy, which is 0.2352% more than that of the MLC scheme.

Fig. 16 Comparative barplot of the overall accuracy

The procedure to calculate user accuracy, producer accuracy, omission error, commission error and overall accuracy is shown.

Table 1 Ground truth survey points

The PALSAR data is important relative to other datasets because it is quad-polarized.

Table 3 Statistics derived through Minimum distance classification

Table 4 Statistics derived through Mahalanobis distance classification

Table 5 Statistics derived through Maximum likelihood classification

Table 6 Statistics derived through Spectral angle mapper classification