COMBINATION OF GENETIC ALGORITHM AND DEMPSTER-SHAFER THEORY OF EVIDENCE FOR LAND COVER CLASSIFICATION USING INTEGRATION OF SAR AND OPTICAL SATELLITE IMAGERY

The integration of different kinds of remotely sensed data, in particular Synthetic Aperture Radar (SAR) and optical satellite imagery, is considered a promising approach for land cover classification because of the complementary properties of each data source. However, the challenges are how to fully exploit the capabilities of these multiple data sources, which combined datasets should be used, and which data processing and classification techniques are most appropriate for achieving the best results. In this paper, an approach that makes synergistic use of feature selection (FS) with a Genetic Algorithm (GA) and multiple classifier combination based on the Dempster-Shafer Theory of Evidence is proposed and evaluated for classifying land cover features in New South Wales, Australia. Multi-date SAR data, including ALOS/PALSAR and ENVISAT/ASAR, and optical (Landsat 5 TM+) images were used for this study. Textural information was also derived and integrated with the original images. Various combined datasets were generated for classification. Three classifiers, namely Artificial Neural Network (ANN), Support Vector Machine (SVM) and Self-Organizing Map (SOM), were employed. Firstly, feature selection using the GA was applied for each classifier and dataset to determine the optimal input features and parameters. Then the results of the three classifiers on particular datasets were combined using the Dempster-Shafer Theory of Evidence. The results of this study demonstrate the advantages of the proposed method for land cover mapping using complex datasets. The use of the GA in conjunction with the Dempster-Shafer Theory of Evidence can significantly improve classification accuracy. Furthermore, the integration of SAR and optical data often outperforms single-type datasets.


INTRODUCTION

Integration of optical and SAR
The synergistic use of different kinds of remote sensing data, particularly multispectral and SAR imagery, for land cover classification has become an attractive research area, since the advantages of each data source can be integrated to enhance classification performance. Many studies based on this combination approach, using different datasets and classification techniques, have been conducted (e.g. Erasmi and Twele 2009, Sheoran et al. 2009, Chu and Ge 2010, Ruiz et al. 2010). Most authors reported that the integration of multiple types of data led to improvements in classification performance.
Although the use of multiple types of remote sensing data has high potential to increase classification accuracy, it also makes the data volume grow rapidly, with a large number of highly correlated features and much redundant information. Unfortunately, employing a large data volume does not always result in an increase in classification accuracy. On the contrary, it can increase uncertainty within the dataset and could reduce classification accuracy significantly. According to Kavzoglu and Mather (2003), a large number of data inputs decreases the generalisation capabilities of the classifiers and produces more redundant and irrelevant data. Lu and Weng (2007) also pointed out that utilisation of too much input data may not improve (and can actually decrease) the classification accuracy, and that it is important to select only input variables that are useful for discriminating land cover classes. Hence, the challenging task is how to select the optimal combined datasets that give the best classification.
Feature Selection (FS) techniques are often employed to search for optimal or nearly optimal input datasets. Many FS methods have been used in remote sensing, such as exhaustive search, forward and backward sequential feature selection, simulated annealing and the Genetic Algorithm (GA). Numerous studies have shown that the GA technique is very efficient in dealing with large datasets and has a better chance of avoiding a locally optimal solution than other methods (Huang et al. 2006, Zhou et al. 2010). Another advantage of the GA technique is its capability to search for the input features and the parameters of the classifier simultaneously.

Classification techniques
Applying appropriate classification algorithms is also very important for land cover classification. Traditional parametric classification algorithms, such as the Minimum Distance and Maximum Likelihood (ML) classifiers, have been used widely to classify remote sensing images. These classifiers can produce relatively good classification results in a rather short time.
However, the major limitation of these classifiers lies in their statistical assumptions, which do not sufficiently model remote sensing data (Waske and Benediktsson, 2007). This makes it remarkably difficult for parametric classifiers to incorporate different kinds of data for classification. Unlike conventional classifiers, non-parametric classification algorithms such as the Artificial Neural Network (ANN) or Support Vector Machine (SVM) are not constrained by the assumption of a normal distribution and are therefore considered more appropriate for handling complex datasets. Another technique that has been applied successfully for classifier combination is the Dempster-Shafer (DS) theory (Du et al. 2009, Trinder and Salah 2010).

Although both FS and multiple classifier system (MCS) techniques have been used widely for classifying remote sensing data, this study is probably the first effort to integrate these techniques for classifying multisource satellite imagery.
In this study, an approach that makes synergistic use of FS with a Genetic Algorithm (GA) and multiple classifier combination based on the Dempster-Shafer Theory of Evidence is proposed and evaluated for classifying land cover features in New South Wales, Australia. We call this approach the FS-GA-DS model.

STUDY AREA AND USED DATA
The study area was located in Appin, New South Wales, Australia, centred around the coordinates 150°44'30" E, 34°12'30" S. The site is characterised by a diversity of land cover features such as native dense forest, grazing land, urban and rural residential areas, facilities and water surfaces.

Textural data, which provide information on the spatial patterns and variation of surface features, play an important role in image classification (Sheoran et al. 2009).

Classification
Three non-parametric classifiers are employed for classification processes, including Artificial Neural Network (ANN) with a Back Propagation algorithm (BP), Kohonen Self-Organizing Map (SOM) and the Support Vector Machine (SVM).

Artificial neural network (ANN)
Artificial Neural Networks are a non-parametric method that has been used extensively in remote sensing, particularly for classification (Tso and Mather 2009, Bruzzone et al. 2004).
The Multi-Layer Perceptron (MLP) model using the Back Propagation (BP) algorithm is the best-known and most commonly used ANN classifier. The ANN classifier often provides higher classification accuracy than the traditional parametric classifiers (Dixon and Candade 2008, Kavzoglu and Mather 2003). In this study, we used the MLP-BP model with three layers: an input, a hidden and an output layer. The number of neurones in the input layer is equal to the number of input features, and the number of neurones in the output layer is the number of land cover classes to be classified. The optimal number of input neurones and the number of neurones in the hidden layer were searched for by the GA. We used the sigmoid function as the transfer function. The other important parameters were set as follows: maximum number of iterations: 1000; learning rate: 0.01-0.1; training momentum: 0.9. The classification was run using the Matlab 2010b ANN toolbox.

Self-Organizing Map Classifier (SOM)
The Self-Organizing Map (SOM), which was developed by Teuvo Kohonen in 1982, is another popular neural network classifier. The SOM network has the unique property that it can automatically detect (self-organize) the relationships within the set of input patterns without using any predefined data models (Salah et al. 2009, Tso and Mather 2009). Previous studies revealed that the SOM is an effective method for classifying remotely sensed data (Salah et al. 2009, Lee and Lathrop 2007).
In this work, the size of the input layer depends on the input dataset. The output layer of the SOM was a two-dimensional array of 15x15 neurones (225 neurones in total). The neurones in the input layer and output layer are connected by synaptic weights, which are randomly assigned within a range of 0 to 1.
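The SOM configuration described above can be illustrated with a minimal sketch in plain Python. It builds a 15x15 output grid with weights drawn at random from the range 0 to 1 and performs one training update. The helper names (`make_som`, `sq_dist`, `best_matching_unit`, `train_step`), the Gaussian neighbourhood and the default learning rate and radius are assumptions made for illustration; the text does not state which neighbourhood function or schedule was used, and the step of labelling neurones with land cover classes is not shown.

```python
import math
import random


def make_som(rows, cols, n_inputs, seed=0):
    """Create a rows x cols grid of weight vectors with random values in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(n_inputs)] for _ in range(cols)]
            for _ in range(rows)]


def sq_dist(w, x):
    """Squared Euclidean distance between a weight vector and an input vector."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))


def best_matching_unit(som, x):
    """Return the (row, col) of the neurone whose weights are closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            d = sq_dist(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_step(som, x, learning_rate=0.5, radius=3.0):
    """Pull the best matching unit and its grid neighbours towards x (in place)."""
    br, bc = best_matching_unit(som, x)
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            # Assumed Gaussian neighbourhood: 1 at the winner, decaying with grid distance.
            h = math.exp(-grid_d2 / (2 * radius ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * h * (x[i] - w[i])
    return br, bc


# Hypothetical 4-feature input pixel, scaled to the 0-1 range.
som = make_som(15, 15, n_inputs=4)
winner = train_step(som, [0.2, 0.9, 0.4, 0.7])
```

In a full training run the learning rate and radius would normally shrink over many passes through the training pixels; the single step here only shows the mechanics.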

Support Vector Machines (SVM)
The SVM is also a popular non-parametric classifier. It is a relatively recently developed technique and is considered robust and reliable in the field of machine learning and pattern recognition (Waske and Benediktsson, 2007, Kavzoglu and Colkesen 2009). The SVM separates two classes by determining an optimal hyperplane that maximises the margin between these classes in a multi-dimensional feature space (Kavzoglu and Colkesen 2009).
Only the nearest training samples, namely the 'support vectors' in the training datasets, are used to determine the optimal hyperplane. As the algorithm only considers samples close to the class boundary, it works well with small training sets, even when high-dimensional datasets are being classified. SVMs have been applied successfully in many studies using remotely sensed imagery. In these studies, the SVMs often provided better accuracy than other classifiers, or at least the same level of accuracy (Waske and Benediktsson, 2007). In this work, the SVM classifier with a Gaussian radial basis function (RBF) kernel has been used because of its highly effective and robust capabilities for handling remote sensing data (Kavzoglu and Colkesen 2009).
Two parameters need to be optimally specified in order to ensure the best accuracy: the penalty parameter C and the width of the kernel function γ. These values were determined by the GA while searching for the optimal combined datasets.
In the other cases, a grid search algorithm with multi-fold cross-validation was used.

GA techniques
The concept of the GA method is based on the natural selection process, which has its roots in biological evolution. At the beginning, sets of features are generated randomly to form a population. In the following stages, individuals are selected as 'parents' at random from the current population and produce 'children' for the next generation. The GA gradually modifies the population toward an optimal solution based on the fitness function and on operations such as selection, crossover and mutation. The application of a GA model involves designing the chromosomes, the fitness function and the architecture of the system. The chromosome is usually an array of bits which represents an individual in the population. The objective function plays an important role in the GA method; it is designed and used to evaluate the quality of candidate subsets.
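The loop of selection, crossover and mutation over bit-string chromosomes can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the study: the function name `run_ga`, the tournament selection, the single-point crossover and the toy fitness function are all assumptions, and the default parameter values are only illustrative.

```python
import random


def run_ga(n_features, fitness, pop_size=20, generations=50,
           crossover_rate=0.8, mutation_rate=0.05, elite_count=3, seed=0):
    """Search for a bit-string feature mask that maximises `fitness`."""
    rng = random.Random(seed)
    # Each chromosome is a list of bits: 1 keeps a feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elitism: the best individuals are copied unchanged.
        next_gen = [chrom[:] for chrom in ranked[:elite_count]]
        while len(next_gen) < pop_size:
            # Assumed tournament selection: the fitter of two random individuals.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to every gene.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Toy fitness (hypothetical): features 0, 2 and 5 are "useful", every kept feature costs 0.2.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.2 * sum(mask)


best_mask = run_ga(n_features=8, fitness=toy_fitness)
```

In a real feature selection run the fitness call would train and validate a classifier on the features switched on in the mask, which is by far the most expensive part of each generation.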
In this paper, we propose a fitness function which makes use of the classification accuracy, the number of selected features and the average correlation within the selected features. Firstly, the GA was implemented for each combined dataset using the SVM, ANN and SOM classifiers. These processes give the classification results of each classifier with the corresponding optimal datasets and parameters. After that, the classification results were combined using the Dempster-Shafer theory. The commonly used Majority Voting (MV) algorithm was also implemented for comparison.

Six land cover classes, namely Native Forest (NF), Natural Pastures (NP), Sown Pastures (SP), Urban Areas (UB), Rural Residential (RU) and Water Surfaces (WS), were identified for classification. The data used for training and validation were derived from visual interpretation and an old land use map, with the help of Google Earth images. The training and test data were selected randomly and independently.
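The combination step can be illustrated for a single pixel with the sketch below, which applies Dempster's rule of combination to three classifier outputs and compares the decision with majority voting. The way each classifier's output is turned into a mass function is an assumption: here the class probabilities are scaled by a hypothetical reliability value and the remainder is assigned to the whole frame (complete uncertainty). The text does not state how the masses were derived, and all numbers are invented for illustration.

```python
from collections import Counter
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so that the remaining masses sum to 1.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


def discounted_masses(probs, reliability):
    """Assumed mass assignment: scale class probabilities, put the rest on the frame."""
    masses = {frozenset([c]): reliability * p for c, p in probs.items() if p > 0}
    masses[frozenset(probs)] = 1.0 - reliability
    return masses


def best_class(masses):
    """Pick the single class with the largest combined mass."""
    singletons = {next(iter(h)): w for h, w in masses.items() if len(h) == 1}
    return max(singletons, key=singletons.get)


def majority_vote(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs for one pixel: (class probabilities, reliability).
outputs = {
    "SVM": ({"NF": 0.6, "NP": 0.3, "UB": 0.1}, 0.85),
    "ANN": ({"NF": 0.5, "NP": 0.4, "UB": 0.1}, 0.80),
    "SOM": ({"NF": 0.3, "NP": 0.6, "UB": 0.1}, 0.75),
}

fused = None
for probs, reliability in outputs.values():
    m = discounted_masses(probs, reliability)
    fused = m if fused is None else dempster_combine(fused, m)

ds_label = best_class(fused)
mv_label = majority_vote([max(p, key=p.get) for p, _ in outputs.values()])
```

Unlike majority voting, which only counts the winning label of each classifier, the evidence combination also uses how strongly each classifier supports every class, so the two rules can disagree when the votes are close.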

RESULTS AND DISCUSSIONS
The overall classification accuracy of the SVM, ANN and SOM classifiers over the different datasets, using the feature selection and non-feature selection approaches, is summarised in Table 3. The classification results illustrate the effectiveness of the synergistic use of multi-date optical and SAR images. Both the non-FS and the FS with GA (FS-GA) methods gave significant increases in classification accuracy when the combined datasets (3rd and 4th) were applied.
For the non-FS approach, the combined multi-date Landsat 5 TM+ and SAR data increased overall accuracy by 2.46% and 22.1% for SVM, 5.5% and 21.6% for ANN, and 0.06% and 23.97% for SOM, compared to the cases in which only multi-date Landsat 5 TM+ or only SAR images were used. These improvements were even more significant when the FS method was applied. Textural information and NDVI are valuable data for land cover classification. In most cases, the integration of these data enhanced the classification results noticeably. For instance, with the FS-GA approach, the classification of a combination of the original optical and SAR images with their textural and NDVI data (4th dataset) increased overall accuracy by 2.85%, 1.49% and 0.80% for the SVM, ANN and SOM classifiers, respectively.
It is clear that the FS-GA approach performed better than the traditional non-FS approach. For all of the datasets and classifiers that have been evaluated, the FS-GA approach improved the classification accuracy. The increases in overall classification accuracy ranged from 0.29% (ANN classifier with the 3rd dataset) to 2.70% (SOM classifier with the 4th dataset). The highest accuracy of 85.22% was achieved by the integration of the FS method with the SVM classifier for the 4th dataset. It is worth mentioning that the FS-GA approach used far fewer input features than the traditional method. For instance, in the case of the SVM classifier and the 4th dataset, only 68 out of 173 features were selected.

As mentioned earlier in this paper, an increase in data volume does not necessarily increase the classification accuracy. With the non-FS method, the accuracy of classification using the SOM algorithm for the 4th dataset was actually reduced by 1.19% compared to the 3rd dataset. However, this problem did not occur when applying the FS-GA technique. In this case, the accuracy of the SOM algorithm increased slightly, by 0.80%. Figure 4 shows the results of classification using the FS-GA technique with the SVM classifier, which gave the best accuracy among the single classifiers for the 4th dataset.

Although the FS-GA model already produced a significant increase in the classification accuracy of the evaluated multisource remote sensing datasets, combining multiple classifiers with the FS-GA method improved the classification performance even further. The FS-GA-DS algorithm gave better accuracy than the best single classifier in all cases. The improvement in classification accuracy ranged from 1.24% for the 2nd dataset to 3.07% for the 4th dataset, compared to the FS-GA model. The increases are even larger when compared to the traditional non-FS method. The highest classification accuracy obtained by the FS-GA-DS model was 88.29%, with the largest combined dataset. The comparison of improvements in classification performance between FS-GA and non-FS, and between FS-GA-DS and FS-GA, is given in Figure 7. One probable reason behind the success of the FS-GA-DS model is its capability to integrate the various optimal (or nearly optimal) solutions given by the GA method for specific classifiers such as SVM, ANN or SOM, which enhances the generality of the final solution.
One of the recent technical developments for mapping land cover features is classifier combination, or the Multiple Classifier System (MCS). Each kind of classification algorithm has its own merits and limitations. Classifier combination techniques can take advantage of the strengths of each classifier and improve the overall accuracy. The application of the MCS in remote sensing has been discussed in Benediktsson et al. (2007). There are many methods for combining classifiers, such as Majority Voting, Weighted Sum, Bagging and Boosting. Du et al. (2009) used different combination approaches, including parallel and hierarchical classifier systems and training sample manipulation with Bagging and Boosting techniques, for classifying hyperspectral data. Foody et al. (2007) integrated five classifiers based on the majority voting rule for mapping fenland in East Anglia, UK. Salah et al. (2010) employed the Fuzzy Majority Voting technique to combine the classification results of three classifiers over four different study areas using lidar and aerial images.
Remote sensing data used for this study include:
Optical: three Landsat 5 TM+ images acquired on 25/03/2010, 10/09/2010 and 31/12/2010, with 7 spectral bands and a spatial resolution of 30 m. In this study, the 6 spectral bands other than the thermal band were used.
Synthetic Aperture Radar (SAR): 6 ENVISAT/ASAR VV polarization and 6 ALOS/PALSAR HH polarization images acquired in 2010 (

Figure 1. ENVISAT/ASAR VV polarized image acquired on 25/09/2010

The terms of the fitness function are defined as follows:
OA = overall classification accuracy (%)
W_OA = weight for the classification accuracy
W_fs = weight for the number of selected features
N_S = number of selected features
N = total number of input features
Cor = average correlation coefficient of the selected bands

The values of W_OA and W_fs were set within 0.65-0.8 and 0.2-0.35, respectively. The other parameters for the GA were: population size = 20-40; number of generations = 200; crossover rate = 0.8; elite count = 3-6; mutation rate = 0.05.
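The exact fitness formula is not reproduced in this text, so the sketch below only illustrates how the defined terms could be combined into a single score. The functional form is an assumption and should not be read as the formula used in the study: accuracy is rewarded with weight W_OA, and a compactness term (fewer selected features), scaled down by the average correlation, is rewarded with weight W_fs. The function name `fitness_score` and the correlation value in the example are hypothetical.

```python
def fitness_score(oa, n_selected, n_total, cor, w_oa=0.7, w_fs=0.3):
    """Hypothetical fitness: higher accuracy, fewer features and lower correlation score better.

    oa         -- overall classification accuracy in percent (OA)
    n_selected -- number of selected features (N_S)
    n_total    -- total number of input features (N)
    cor        -- average correlation coefficient of the selected bands, in [0, 1]
    w_oa, w_fs -- weights, chosen here inside the ranges quoted in the text
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be between 1 and n_total")
    if not 0.0 <= cor <= 1.0:
        raise ValueError("cor must lie in [0, 1]")
    compactness = 1.0 - n_selected / n_total   # larger when fewer features are kept
    independence = 1.0 - cor                   # larger when features are less correlated
    # Assumed combination; the original formula may weight these terms differently.
    return w_oa * (oa / 100.0) + w_fs * compactness * independence


# Example using the reported SVM result for the 4th dataset (68 of 173 features, OA 85.22%);
# the correlation value 0.4 is invented for illustration.
score_selected = fitness_score(85.22, 68, 173, cor=0.4)
score_all = fitness_score(85.22, 173, 173, cor=0.4)
```

Under this assumed form, two feature subsets with equal accuracy are ranked by how compact and how weakly correlated they are, which matches the stated intent of penalising redundant inputs.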

Figure 4. Results of classification using FS-GA with the SVM classifier.

The improvements in classification accuracy obtained by using the FS-GA technique for the different classifiers and combined datasets are shown in Figure 5.

Figure 5. Improvements in accuracy obtained by applying the FS-GA technique for the SVM, ANN and SOM classifiers.

The comparison of classification results between the best classifier in the non-FS and FS-GA approaches and the multiple classifier combination using the Dempster-Shafer theory (FS-GA-DS) is given in Table 4 and Figure 6.

Figure 6. Comparison of the best classification results of the non-FS, FS-GA and FS-GA-DS methods on different datasets.

Figure 7. Improvements in overall classification accuracy achieved by using the FS-GA and FS-GA-DS models.

The Majority Voting (MV) algorithm is also very effective for combining classification results. However, it gave slightly lower accuracy than the DS algorithm. The results of the MV and DS algorithms are shown in Table 5.

Table 4. Comparison of the best classification results using the non-FS, FS-GA and FS-GA-DS classifier combination methods.

Table 5. Classification accuracy obtained by applying the DS and MV algorithms for classifier combination.