HYBRID OPTIMIZATION OF OBJECT-BASED CLASSIFICATION IN HIGH-RESOLUTION IMAGES USING CONTINUOUS ANT COLONY ALGORITHM WITH EMPHASIS ON BUILDING DETECTION

Automatic building detection from High Spatial Resolution (HSR) images is one of the most important issues in Remote Sensing (RS). Because HSR images have a limited number of spectral bands, using additional features can improve accuracy. Adding features, however, increases the probability that dependent features are present, which reduces accuracy. In addition, Support Vector Machine (SVM) classification has parameters that must be determined. It is therefore necessary to determine the classification parameters and select independent features simultaneously, according to the image type. An optimization algorithm is an efficient way to solve this problem. Pixel-based classification, moreover, faces several challenges, such as salt-and-pepper results and high computational time on high-dimensional data. Hence, this paper proposes a novel method that optimizes object-based SVM classification with the continuous Ant Colony Optimization (ACO) algorithm. The advantages of the proposed method are a relatively high level of automation, independence from image scene and type, reduced post-processing for building edge reconstruction, and improved accuracy. The proposed method was compared with pixel-based SVM and Random Forest (RF) classification in terms of accuracy. Compared with optimized pixel-based SVM classification, the proposed method improved the quality factor and overall accuracy by 17% and 10%, respectively. It also improved the Kappa coefficient by 6% relative to RF classification. The processing time of the proposed method was relatively low because the unit of image analysis is the image object. These results show the superiority of the proposed method in terms of time and accuracy.


INTRODUCTION
Building detection by classifying Remote Sensing (RS) images has attracted the attention of many researchers because of its extensive applications, e.g. automation of information extraction (Lari and Ebadi 2007, Hermosilla, Ruiz et al. 2011, Li, Wu et al. 2013), updating Geographic Information System (GIS) databases (Gharibi, Arefi et al. 2016), and change detection and urban management (Bouziani, Goïta et al. 2010, Huang, Zhang et al. 2014). In recent years, the development of RS sensors has made high spatial resolution (HSR) images available. Many features can be detected in these images, so the information extracted from them is useful in urban management. Despite this advantage, pixel-based classification of HSR images faces many limitations, e.g. salt-and-pepper results due to high spectral diversity, long processing time and an inability to interpret the image because of the weakness of pixel information (Chen, Hay et al. 2012). With the development of object-based techniques, the unit of image analysis changed from pixel to object (Blaschke 2010), and the limitations of pixel-based techniques have been addressed (Hall and Hay 2003, Laliberte, Rango et al. 2004, Blaschke 2005, Desclée, Bogaert et al. 2006, Bontemps, Bogaert et al. 2008, Gamanya, De Maeyer et al. 2009). In object-based techniques, homogeneous image objects have a higher signal-to-noise ratio than image pixels, so the results obtained are more accurate (Benz, Hofmann et al. 2004, Wang, Sousa et al. 2004, Ronczyk 2011, Petropoulos, Kalaitzidis et al. 2012). Since the input image is segmented into homogeneous areas, the earth's features are extracted in a way that matches reality. In other words, the main purpose of image processing is to extract features that match reality (Castilla and Hay 2008).

Classification methods can be divided according to the availability of training samples (supervised and unsupervised) (Drăguţ, Csillik et al. 2014, Chutia, Bhattacharyya et al. 2015, Egorov, Hansen et al. 2015) and according to the image analysis unit (pixel-based and object-based) (Lillesand, Kiefer et al. 2014). Among the different classification methods, many researchers have suggested Support Vector Machine (SVM) as the superior method (Xuegong 2000, Camps-Valls, Gómez-Chova et al. 2004, Wang 2005, Khatami, Mountrakis et al. 2016). The advantages of SVM are simple training, good generalization ability (Li and Yin 2013, Bin, Jian et al. 2014, Cheng and Bao 2014, Ghamisi and Benediktsson 2015, Chen and Tian 2016) and efficiency in high-dimensional and nonlinear problems with small samples (Gao, Mandal et al. 2010). Like other methods, SVM classification has some parameters (e.g. the penalty term and the kernel parameters). These parameters are involved in the training process and affect classification performance, so their values must be determined before training. The process of finding appropriate parameter values is known as model selection (MS) or parameter optimization (Cheng and Bao 2014). There are many MS techniques, such as grid search, gradient descent and trial and error. The limitations of these approaches are long processing time and a low level of automation.

Building detection from HSR images also faces limitations caused by the low number of spectral bands. For this reason, using other features according to the unit of image analysis (e.g. spectral, textural and geometrical features) (Ji 2000, Shackelford and Davis 2003, Coburn and Roberts 2004) or other data sources (e.g. digital elevation data) has been suggested to improve accuracy (Geerling, Labrador-Garcia et al. 2007). However, adding features increases the probability that dependent features are present, which may reduce accuracy. It is therefore essential to detect and remove dependent features in order to improve accuracy. This task is known as Feature Selection (FS) (Wan, Wang et al. 2016). Selecting appropriate features by trial and error relies on expert knowledge and depends on image type and scene. To address these limitations of MS and FS, the continuous Ant Colony Optimization (ACOR) algorithm is applied in this paper. ACOR is easy to understand and implement, supports parallel processing and is suitable for continuous domains (Socha and Dorigo 2008, Zhang, Chen et al. 2010).

Many studies have addressed the detection of urban features based on image objects. For example, a comprehensive review of object-based feature detection was presented by (Cheng and Han 2016). The limitations of these studies are a low level of automation, determination of the SVM parameters by trial and error (Petropoulos, Kalaitzidis et al. 2012, Liu and Bo 2015) and selection of independent features according to expert knowledge (Liu and Bo 2015). For example, Bouziani et al. used only features selected by expert knowledge for building change detection. The limitations of that study were the low level of automation in the FS procedure and the fact that expert knowledge is limited and does not cover all cases (Bouziani, Goïta et al. 2010). Samadzadegan et al., on the other hand, optimized pixel-based SVM classification with binary ACO for FS and MS in hyperspectral imagery (Samadzadegan, Hasani et al. 2012, Samadzadegan, Hasani et al. 2012). MS of SVM is a continuous problem and FS is a discrete problem, so the simultaneous optimization of SVM must handle a continuous domain. In the mentioned studies, the continuous problem was transformed into a discrete one by discretizing the continuous domain. In other words, the continuous range of each variable to be optimized was converted into a finite set. This is not always appropriate, especially if the initial range of the variable is wide (Socha and Dorigo 2008). In addition, the mentioned studies were pixel-based. As noted above, pixel-based processing leads to longer optimization times because of the unit of image analysis (the image pixel) and produces salt-and-pepper results.

To overcome the above limitations, and given the importance of selecting independent features in object-based classification, this paper proposes a novel method for building detection based on hybrid optimization of object-based SVM classification of an HSR image and elevation data. ACOR was applied because it is appropriate for continuous optimization problems. The advantages of the proposed method include independence from the image scene, a relatively high level of automation, relatively fast processing because of the unit of image analysis (the image object), and a lower probability that post-processing is needed for building edge reconstruction. The results of the proposed method were evaluated against pixel-based SVM classification in terms of time, against Ground Truth (GT) in terms of accuracy and precision, and against Random Forest (RF) classification in terms of the ability to choose independent features. The rest of this paper is organized as follows: the data are described in Section 2. The methods and the proposed method are presented in Sections 3 and 4, respectively. The results are given in Section 5, together with the discussion. Conclusions are drawn in Section 6.

DATA DESCRIPTION
The proposed method was applied to a subset of the digital aerial image and Digital Surface Model (DSM) of Bandar Anzali, Gilan Province (north of Iran). The digital aerial image was acquired by UltraCam D with a spatial resolution of 0.08 m and three multispectral bands (Red, Green and Blue) (Figure 1). A look at Figure 1 [...] 1998).

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W4, 2017 Tehran's Joint ISPRS Conferences of GI Research, SMPR and EOEC 2017, 7-10 October 2017, Tehran, Iran

SVM finds an optimal separating hyperplane that maximizes the margin between two classes. In nonlinear problems, the data are not separable in the original feature space. In these cases, SVM uses a kernel function (e.g. linear, sigmoid, polynomial or Radial Basis Function (RBF)) to map the data into a higher-dimensional space in which they become separable (Figure 2). In this paper, the RBF kernel function is used because it achieves good results in terms of accuracy and processing time (Luo, Zhou et al. 2002, Pal 2002, Zhang, Chen et al. 2010) (Equation 4). The SVM finds a hyperplane in a high-dimensional space that separates the data with a maximum margin (Equation 1):

\[ f(x) = w^{T}\phi(x) + b \tag{1} \]

In the above equation, w is a weight vector orthogonal to the hyperplane, b is an offset term and \phi is a mapping function.
Maximizing the margin is equivalent to minimizing the norm of w. Thus, SVM is trained to solve the following minimization problem:

\[ \min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0 \tag{3} \]

where x_i is a training sample with class label y_i \in \{-1, +1\}, N is the number of training samples, C is a regularization parameter that imposes a trade-off between the number of misclassifications in the training data and the maximization of the margin, and \xi_i are slack variables. The decision function obtained by solving the minimization problem (Equation 3) is given by

\[ f(x) = \operatorname{sign}\Big(\sum_{i \in SV} \alpha_i\, y_i\, K(x_i, x) + b\Big) \]

where the constants \alpha_i are called Lagrange multipliers and are determined in the minimization process; SV is the set of support vectors, i.e. the training samples whose Lagrange multipliers are larger than zero. The kernel function K computes dot products between any pair of samples in the feature space. For the RBF kernel with parameter \gamma,

\[ K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j\rVert^{2}\right) \tag{4} \]

Random Forest (RF) is an ensemble classifier that applies a bootstrap aggregated sampling technique ("bagging") to build many individual decision trees, from which a final class assignment is determined (Breiman 2001). Observations in the original training data that do not occur in a bootstrap sample are called Out-Of-Bag (OOB) observations. A random subset of predictor variables is used to split the training data into homogeneous subsets (Mellor, Haywood et al. 2013). The node-splitting variable that allows for the greatest variance is selected. This allows the overall model to increase its generalization capacity before and after the split (Walton 2008). The OOB samples are used to evaluate performance by computing the accuracy and error rates averaged over all of the predictions (Breiman 2001) and to estimate the importance of each variable in the classification. In RF classification, independent features can therefore be selected by variable importance.
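As an illustration of the SVM decision function with the RBF kernel described in this section, the following minimal Python sketch evaluates a trained two-class SVM on one sample. The function names and the toy support vectors are hypothetical and serve only to show how the kernel parameter γ and the Lagrange multipliers enter the computation; it is not the implementation used in the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Sign of the kernel expansion over the support vectors.

    labels are +1 / -1, alphas are the (positive) Lagrange multipliers.
    """
    score = b
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if score >= 0 else -1


# Toy example (assumed values): one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
print(svm_decision([1.9, 2.1], svs, ys, alphas, b=0.0, gamma=0.5))
```

A larger γ makes each support vector influence only its immediate neighbourhood, which is why γ and C need to be tuned together.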

ACOR Optimization Algorithm
ACO was introduced as a novel nature-inspired method by (Dorigo, Maniezzo et al. 1996, Dorigo and Stützle 2004). It is based on the foraging behaviour of real ants, which find the shortest path between their nest and food sources (an optimization problem). Real ants initially explore the area surrounding their nest at random when searching for food. As soon as an ant finds a food source, it carries some food back to the nest and deposits a pheromone trail on the ground during the return trip. The amount of deposited pheromone indicates the quality of the path between the food source and the nest and guides other ants to the food source. After a while, the path with the most pheromone is the shortest path between the nest and the food source, and the colony adopts it. This indirect communication via pheromone trails is what enables ants to find the shortest path. This capability of real ant colonies has inspired the definition of artificial ant colonies that can find approximate solutions to optimization problems (Socha and Dorigo 2008).

Continuous optimization is hardly a new research field. ACO was originally suited to discrete domains and was later extended to continuous domains without any major conceptual change to its structure. ACOR extends the population-based ACO by using Gaussian Probability Density Functions (PDFs) for the pheromone update and adopts a rank-based selection mechanism (Xiao and Li 2011). An advantage of ACOR is that it can solve problems in which some variables are discrete and others are continuous (Socha and Dorigo 2008). ACOR starts from an initial population and builds Gaussian PDFs around the individuals of this population, favouring the best ones. New individuals are sampled from these Gaussian PDFs and merged with the current population. The merged population is then sorted by the objective function, and a predefined number of the best individuals is kept. This procedure continues until the stopping criterion is met, and the best solution is the individual with the best objective function value at that point. The general flowchart of ACOR is shown in Figure 3. Further details about ACOR can be found in (Socha and Dorigo 2008).
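The ACOR loop outlined above (initial population, Gaussian sampling around ranked individuals, merge, sort and truncate) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the MATLAB implementation used in this paper: the function name `acor_minimize`, the parameter names `q` and `xi` and their default values are chosen here for illustration, following the general scheme of (Socha and Dorigo 2008).

```python
import math
import random


def acor_minimize(objective, bounds, archive_size=10, n_ants=4,
                  q=0.5, xi=0.85, iterations=300, seed=0):
    """Minimize objective(x) over box bounds with a basic ACOR loop."""
    rng = random.Random(seed)
    k = archive_size

    # Initial population: random individuals, each stored with its cost.
    archive = []
    for _ in range(k):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        archive.append((objective(x), x))
    archive.sort(key=lambda item: item[0])

    # Rank-based weights: better-ranked individuals guide sampling more often.
    weights = [math.exp(-(rank ** 2) / (2.0 * (q * k) ** 2))
               for rank in range(k)]

    for _ in range(iterations):
        candidates = []
        for _ in range(n_ants):
            guide = rng.choices(range(k), weights=weights)[0]
            x = []
            for d, (lo, hi) in enumerate(bounds):
                mu = archive[guide][1][d]
                # Spread of the Gaussian PDF: scaled mean distance from the
                # guide to the other individuals in this dimension.
                sigma = xi * sum(abs(ind[1][d] - mu) for ind in archive) / (k - 1)
                x.append(min(hi, max(lo, rng.gauss(mu, sigma))))
            candidates.append((objective(x), x))
        # Merge, sort by cost and keep the best k individuals.
        archive = sorted(archive + candidates, key=lambda item: item[0])[:k]

    return archive[0]  # (best cost, best individual)


# Usage on a simple test function (sum of squares) in three dimensions.
sphere = lambda x: sum(v * v for v in x)
cost, best = acor_minimize(sphere, [(-5.0, 5.0)] * 3)
print(cost, best)
```

Because the sampling spread shrinks as the population contracts, the search moves from exploration to refinement without a separately tuned step size.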

Image Pre-processing
Image pre-processing (e.g. radiometric and geometric corrections) is the first step of most RS processing. The studied data sources were georeferenced. Histogram Equalization (HE), a simple and effective radiometric correction method that increases image contrast, was applied to the image. In a low-contrast image, the difference between the minimum and maximum brightness intensity (luminance) is small, and HE makes it possible to increase the contrast of the input image. More details can be found in (Wang, Chen et al. 1999). To identify the bare earth surface and generate a Digital Terrain Model (DTM), many filters are available that separate ground points from non-ground points. In this paper, Adaptive Triangulated Irregular Network Modelling (ATINM), which is capable of identifying bare-earth points in complex urban areas, was applied to the DSM. More details about ATINM can be found in (Axelsson 2000). The DTM was then produced by gridding the ATINM output with the nearest neighbor method. Finally, the normalized DSM (nDSM) was computed from Equation (5):

\[ \mathrm{nDSM} = \mathrm{DSM} - \mathrm{DTM} \tag{5} \]
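The two pre-processing operations described above can be illustrated with a short, self-contained Python sketch. It assumes a single-band image stored as a list of rows of integer grey levels and a DSM and DTM on the same grid; the function names are hypothetical, and the HE variant shown is the standard cumulative-histogram mapping rather than the exact procedure of the software used in this paper.

```python
def equalize_histogram(image, levels=256):
    """Histogram equalization of an integer image with values in [0, levels-1]."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1

    # Cumulative histogram.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)

    n = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Look-up table that stretches the cumulative histogram to the full range.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
           for c in cdf]
    return [[lut[v] for v in row] for row in image]


def normalized_dsm(dsm, dtm):
    """nDSM = DSM - DTM, cell by cell (height above ground)."""
    return [[s - t for s, t in zip(srow, trow)]
            for srow, trow in zip(dsm, dtm)]


low_contrast = [[52, 55], [61, 59]]
print(equalize_histogram(low_contrast))
print(normalized_dsm([[12.5, 10.0]], [[10.0, 10.0]]))
```

In the nDSM, ground cells are close to zero and elevated objects such as buildings keep their height, which is what makes it a useful feature for building detection.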

Image Segmentation
Image segmentation is an important step in object-based analysis because its quality directly affects the success or failure of all subsequent object-based processing (Witharana and Civco 2014). It is the process of labelling the pixels of an image such that all pixels located in a homogeneous area share the same label (Haris, Efstratiadis et al. 1998). Among the proposed segmentation algorithms, multi-resolution segmentation was used in this paper. The advantages of this algorithm include ease of understanding and implementation and results that match reality (Baatz and Schäpe 2000, Lefebvre, Corpetti et al. 2008). It is a bottom-up segmentation based on a pairwise region-merging technique that consecutively merges pixels or existing image objects. Multi-resolution segmentation is an optimization procedure which, for a given number of image objects, minimizes the average heterogeneity and maximizes their respective homogeneity. More details about this method can be found in (Baatz and Schäpe 2000, Lefebvre, Corpetti et al. 2008). By applying multi-resolution segmentation, the input image was segmented into homogeneous areas. Since the proposed method was applied to over-segmented objects (over-segments), it was not necessary to determine accurate segmentation parameters that produce segments matching reality. The segmentation parameters (i.e. scale, shape and compactness) were therefore determined by trial and error.

Feature Extraction and Preparation
Generally, there are three kinds of features: spectral, textural and structural. In this paper, only spectral and textural features were extracted to separate the classes, because the proposed method was applied to over-segments whose segmentation parameters were determined by trial and error. According to (Barzegar, Ebadi et al. 2015), the spectral features used in this paper include the Excessive Green Index (EGI), Difference Green Ratio (DGR), Normalized Difference Index (NDI), Brightness Index (BI), Saturation Index (SaI), Hue Index (HI), Correlation Index (CI), Redness Index (RI), Shadow Index (SI) and the LAB color space (Table 1).

[Table 1: columns Spectral Index and Formula; first entry EGI (Woebbecke, Meyer et al. 1995)]

The second type of features used in this paper were textural features extracted from the Gray-Level Co-occurrence Matrix (GLCM). The use of the GLCM was suggested by Haralick in the 1970s (Haralick, Shanmugam et al. 1973). Various textural features can be extracted from the GLCM (e.g. mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation). These features were extracted with five kernel sizes (3×3, 5×5, 7×7, 9×9 and 11×11) and in four directions (0°, 45°, 90° and 135°). To remove the effect of direction on the textural features, the average of the textural features was computed for each segment. All of the extracted features were then integrated with the spectral bands, and the resulting image was normalized so that all pixel values lie in the range [0, 1]. After that, each feature was computed per segment by averaging the values of all pixels located in that segment. This image was the input for the next step.
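The normalization to [0, 1] and the per-segment averaging described above can be sketched as follows. The sketch assumes one feature band stored as a list of rows and a label image of the same size holding a segment identifier per pixel; the function names and the toy values are hypothetical.

```python
def minmax_normalize(band):
    """Rescale a feature band so that all values lie in [0, 1]."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant band: map everything to 0
        return [[0.0 for _ in row] for row in band]
    return [[(v - lo) / (hi - lo) for v in row] for row in band]


def segment_means(band, labels):
    """Mean feature value of all pixels in each segment."""
    sums, counts = {}, {}
    for brow, lrow in zip(band, labels):
        for value, seg in zip(brow, lrow):
            sums[seg] = sums.get(seg, 0.0) + value
            counts[seg] = counts.get(seg, 0) + 1
    return {seg: sums[seg] / counts[seg] for seg in sums}


band = [[0, 10], [20, 40]]      # one feature band (toy values)
labels = [[1, 1], [2, 2]]       # segment id of each pixel
normalized = minmax_normalize(band)
print(segment_means(normalized, labels))
```

After this step each segment is represented by a single feature vector, so the classifier processes one sample per segment instead of one per pixel.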
Optimized Object-Based Classification

Hybrid Optimization
Because various features were used and the SVM parameters had to be determined, hybrid optimization was applied to select independent features and to determine the RBF kernel parameter (γ) and the penalty term (C) with a relatively high level of automation. To this end, training and test sample segments had to be collected in order to train the SVM and run the hybrid optimization. These samples were selected by an expert in two classes (building and non-building) with respect to the GT data. In this paper, the ACOR algorithm was used for the reasons discussed in Sections 1 and 3.
In the initialization phase, ACOR starts with an initial population of N individuals located in a D-dimensional search space. For this optimization problem, each individual consisted of three parts: a mask of the input features, C and γ. Thus, if the input image had n features, each candidate solution had length n+2. The first n positions indicate whether each feature is selected, and the (n+1)th and (n+2)th positions hold C and γ. After initialization, each individual was evaluated by an objective function; in this paper, the classification error was used. The best member of the population was the individual with the lowest classification error, so the objective of the optimization was to minimize this cost function. Finally, when the stopping criterion (i.e. the maximum number of iterations) was reached, the individual with the lowest cost was taken as the optimum solution, which comprises the optimum features, C and γ.
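The encoding of an individual as a feature mask followed by C and γ can be made concrete with a small Python sketch. Several details are assumptions made here for illustration and are not stated in the paper: every position is taken to lie in [0, 1], a mask position selects its feature when it is at least 0.5, and C and γ are rescaled linearly to the hypothetical ranges `c_range` and `gamma_range`. The classifier itself is passed in as a callable (`train_and_predict`, also hypothetical), so any SVM implementation could be plugged in.

```python
def decode_individual(individual, n_features,
                      c_range=(0.01, 1000.0), gamma_range=(0.0001, 10.0)):
    """Split an individual of length n_features + 2 into (mask, C, gamma)."""
    if len(individual) != n_features + 2:
        raise ValueError("individual must have n_features + 2 positions")
    # Assumed rule: a feature is selected when its position is >= 0.5.
    mask = [v >= 0.5 for v in individual[:n_features]]
    # Assumed linear rescaling of the last two positions to C and gamma.
    c = c_range[0] + individual[n_features] * (c_range[1] - c_range[0])
    gamma = gamma_range[0] + individual[n_features + 1] * (
        gamma_range[1] - gamma_range[0])
    return mask, c, gamma


def classification_error(predicted, actual):
    """Fraction of test samples that were assigned the wrong class."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def objective(individual, n_features, train_and_predict, test_labels):
    """Cost of one individual: classification error on the test segments."""
    mask, c, gamma = decode_individual(individual, n_features)
    if not any(mask):  # no feature selected: worst possible cost
        return 1.0
    predicted = train_and_predict(mask, c, gamma)
    return classification_error(predicted, test_labels)


mask, c, gamma = decode_individual([0.9, 0.1, 0.6, 0.0, 1.0], n_features=3)
print(mask, c, gamma)
```

An optimizer such as ACOR then only needs this `objective` function and the bounds [0, 1] for each of the n+2 positions.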

Object-Based SVM Classification
After hybrid optimization, the object-based SVM classification was carried out with the optimum features and parameters. The classified image was then evaluated against the GT data using five accuracy assessment criteria: Overall Accuracy (OA), Kappa Coefficient (KC), Quality Percentage (QP), Producer Accuracy (PA) and User Accuracy (UA). The results of the proposed method were also compared with pixel-based SVM and RF classification in terms of classification performance.
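For a two-class problem (building and non-building), the five criteria named above can be computed from the counts of a confusion matrix, as in the following sketch. The paper does not give the formulas, so the definitions used here are the ones commonly used in building detection and are an assumption; in particular, QP is taken as TP / (TP + FP + FN), and PA and UA are reported for the building class.

```python
def binary_accuracy_metrics(tp, fp, fn, tn):
    """Accuracy criteria for the building class from confusion-matrix counts.

    tp: building correctly detected, fp: non-building labelled as building,
    fn: building missed, tn: non-building correctly labelled.
    """
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    # Agreement expected by chance, used by the Kappa coefficient.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "OA": oa,
        "KC": (oa - pe) / (1.0 - pe),
        "QP": tp / (tp + fp + fn),
        "PA": tp / (tp + fn),
        "UA": tp / (tp + fp),
    }


metrics = binary_accuracy_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

QP penalizes both missed buildings and false detections, which is why it is usually lower than OA for the same classification.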

RESULTS AND DISCUSSION
The pre-processing procedures of the proposed method and the comparative methods were implemented using RS software (ALDPAT, Global Mapper, eCognition and ENVI), and the proposed method itself was implemented in MATLAB R2015a. To separate ground points from non-ground points, ATINM was applied to the DSM in ALDPAT (Figure 5.a). The cell size of this filter was equal to the resolution of the data sources (0.08 m). The DTM was then produced in Global Mapper by gridding the ATINM output with the nearest neighbor method. According to Equation 5, the nDSM was computed (Figure 5.b). As mentioned, the HE method was applied to the original image (Figure 6). After that, the spectral and textural features were extracted from the HSR image and DSM in ENVI. Next, multi-resolution segmentation was applied to the image in eCognition in order to create homogeneous areas. The parameters of this segmentation are scale, shape and compactness. For exact feature extraction, these parameters must be set accurately. For over-segmentation, however, they were set to default values chosen according to image size, area type (e.g. urban or non-urban) and the feature class of interest (e.g. line features such as roads or regular features such as buildings). Accordingly, the scale, shape and compactness parameters were set to 60, 0.8 and 0.3, respectively. The segmented image is shown in Figure 7. All of the extracted features were then computed for each over-segment by averaging all of the pixels located in it, which prepared the segmented image and its features for the hybrid optimization step.

The improvement of KC, OA and QP for the different classification methods is shown in Figure 8. According to Figure 8, object-based SVM classification using 15 randomly selected features (Object-SVM-15R) improved QP, OA and KC by 9%, 2% and 30%, respectively, compared with using all of the input features (Object-SVM-176). It should be noted that the SVM parameters were determined by trial and error in this step. This indicated that dependent features were present among the input features and affected classification performance. The accuracy criteria improved with 15 randomly selected features, but the improvement was not sufficient. For this reason, it is essential to apply FS and MS methods in a high-dimensional space. Compared with Object-SVM-15R, the proposed method (Object-SVM-ACOR) improved QP, OA and KC by 20%, 12% and 30%, respectively. These results showed the efficiency of hybrid optimization in the FS and MS of SVM. The optimized pixel-based SVM (Pixel-SVM-ACOR) reduced QP, OA and KC by 17%, 10% and 24%, respectively, in comparison with Object-SVM-ACOR. These results showed the limitation of pixel-based classification in complex urban areas. Object-SVM-ACOR improved the KC by 6% relative to RF classification. Given the good performance of RF in various RS studies, this relative improvement implied the superiority of the proposed method over RF in HSR image classification and indicated the efficiency of ACOR, which is based on swarm intelligence, relative to other FS methods. [...] 3 for all of the methods. It should be noted that the accuracy assessment was done with GT data.

The processing time of the proposed method was relatively low in comparison with pixel-based methods. The reason is the unit of image analysis: image objects reduce the computational complexity compared with pixels, whereas in pixel-based techniques every pixel is a separate unit, which results in long processing times. Object-SVM-ACOR also provided more accurate results than the pixel-based techniques. These findings showed the superiority of the proposed method over pixel-based techniques in terms of accuracy and processing time.
The classified images obtained from the different methods are shown in Figure 9. Figure 9 reveals that the proposed method performed better than pixel-based classification in building detection and did not produce the salt-and-pepper results seen in the RF classification. According to Figure 9, some over-segments of shadow on building roofs were assigned to the building class by the proposed method, whereas they were not assigned to the building class in pixel-based classification. This showed the limitation and weakness of pixel-based techniques. As can be seen, some segments and some pixels were misclassified as building although they belong to the non-building class (vegetation). This was due to the unavailability of a Near Infrared (NIR) band to separate the building class from the non-building class (vegetation). To address this problem, the NIR band, spectral features based on it (e.g. the normalized difference vegetation index), structural features or features extracted from LiDAR data (e.g. Laplacian, slope) can be used. Because the proposed method was applied to over-segments to keep the level of automation relatively high, it was not possible to use structural features for each segment. It is suggested to apply an optimization method to determine the segmentation parameters, which would make it possible to extract structural features for the classification process. In Figure 9, the building edges were identified more clearly by Object-SVM-ACOR than by RF, which reduces the post-processing needed for building edge reconstruction.

CONCLUSIONS
In nonlinear problems, it is essential to determine the SVM parameters (MS) because they directly affect the results. Because of the limited number of spectral bands in HSR images, digital aerial images cannot discriminate classes with similar spectral behavior. Spectral and textural features or other data sources can be used to overcome this limitation. However, adding features increases the probability of dependent features, which may reduce accuracy. For these reasons, the ant colony optimization algorithm was applied for the hybrid optimization (MS and FS) of object-based SVM classification. The advantages of the proposed method are relatively low processing time because of the unit of image analysis, a relatively high level of automation, independence from image scene and type, reduced post-processing for building edge reconstruction, hybrid optimization, and improved accuracy in comparison with pixel-based classification. The results showed that hybrid optimization of object-based SVM classification improved the overall accuracy by 17% compared with object-based SVM using all of the input features. This showed the effect of hybrid optimization on classification performance and the presence of dependent features. Moreover, compared with hybrid optimization of pixel-based SVM classification, the proposed method improved the overall accuracy and Kappa coefficient by 10% and 24%, respectively. This showed the superiority of optimized object-based SVM classification over optimized pixel-based SVM classification. Because over-segments were used to maintain the level of automation, there was no need to fine-tune the segmentation parameters to match reality. As a result, structural features were not available, and some segments were misclassified (vegetation assigned to the building class). Therefore, a method for optimizing the segmentation parameters is recommended, so that structural features can be extracted for each segment in accordance with reality and used in classification to better separate the classes.
Figure 1. The study area: a. Digital aerial image, b. DSM

Figure 2. Mapping data from a nonlinear space into a higher-dimensional space in SVM

Random Forest Classification
RF classification was proposed by Breiman in 2001.

Figure 4. The flowchart of the proposed method

Figure 5. The result of a. ATINM, b. nDSM

Figure 6. The image equalized by the HE method

Figure 8. Chart of accuracy criteria in different methods

Table 2. The characteristics of the input features
Table 2 shows the characteristics of these features.