A SATELLITE IMAGE CLASSIFICATION APPROACH BY USING ONE DIMENSIONAL DISCRIMINANT ANALYSIS

Classification is an important challenge in the image processing field: image pixels are assigned to predefined classes according to their features. Applied to satellite images, this process provides meaningful knowledge about an area. Satellite images are digital images acquired by a satellite that scans the areas of interest with specific sensors. These sensors provide radiometric and spatial information about the surfaces of objects. This information allows researchers to obtain reliable classification results for real-life problems such as object extraction, mapping, recognition, navigation and disaster management. Linear Discriminant Analysis (LDA) is a supervised method that reduces the dimensionality of data while maximizing the discrimination between classes. Using manually labelled training data, it transfers the data to a new coordinate space in which the discriminant features of the classes are highest. In this work, we treat a satellite image as if it had two classes: a foreground and a background. The true classes, such as roofs, roads, buildings, spaces and trees, are treated as the foreground one at a time, and the area outside the foreground class is treated as the background. Each pixel's feature vector is reduced to one dimension once per class, using the binary discrimination of that class, and these one-dimensional values are used to compute the pixel's membership values for the classes. In this way, each pixel has a membership value for each class. Finally, the pixels are classified according to these membership values. We used the images of the ISPRS WG III/4 2D Semantic Labeling Benchmark (Vaihingen), which include ground truths, and report the accuracy for each class.


INTRODUCTION
Classification of image pixels, the smallest elements of images, is an important problem in satellite image processing, as in most computer vision fields, because obtaining meaningful information from satellite images is an essential process for many remote sensing applications. In pixel classification, the pixels are assigned to previously defined classes (Lu and Weng, 2007; Vailaya et al., 1998). In image classification methods, the pixels are classified according to features such as colour (radiometric), geometric, and pattern features (Wang et al., 2016).
The remote sensing community also refers to assigning satellite image pixels to land-cover classes as "semantic segmentation" ('2D Semantic Labeling Contest', n.d.; Marmanis et al., 2018; Volpi and Tuia, 2017). Semantic segmentation is an essential issue in satellite image processing because it provides solutions for many ecological and socioeconomic problems (Vailaya et al., 1998). These include landslide monitoring, inferring geographical information, guidance for intelligent military systems, infrastructure design, and disaster management (Montoya, 2003; Paisitkriangkrai et al., 2015).
In remote sensing semantic segmentation, the pixels of satellite images taken over an urban area are generally labelled as road, building, tree, and vegetation. This task is difficult because pixels belonging to different classes may be similar to each other, while pixels belonging to the same class may differ from each other (Paisitkriangkrai et al., 2015). The main reasons are that one class may contain many different objects and that the classes may contain many redundant objects (Wang et al., 2017). It follows that the classification improves as the features of pixels from different classes become more distinguishing.
The literature contains feature extraction methods that transform the data into another feature space, giving more distinctive features to data belonging to different classes. Linear Discriminant Analysis (LDA) is one of them (Fisher, 1936). LDA is a supervised method that extracts more distinctive features from the available features. In image processing, this method distinguishes pixels belonging to two different classes more accurately. Saglam and Baykan (2017) used LDA to detect the roof pixels in satellite images taken over an urban area. In that study, the urban images are considered to consist of two classes: the roofs are handled as the foreground and the rest of the area as the background. They used only spectral (radiometric) features, not spatial or surface features. LDA transforms the multiple radiometric values into one distinctive feature for the two classes.
In this work, LDA is used to derive multiple new distinctive features for multi-class discrimination from the available pixel features, including radiometric and surface values. The satellite images consist of multiple classes such as roads, buildings, vegetation and trees, and the method presented in this paper applies LDA with each class treated separately as the foreground. In this way, it extracts a new feature for each class, so more than one feature (multiple LDA values) is obtained. Performing the LDA computation for each class generates one single-dimensional data space per class. Each LDA value represents a point in a single-dimensional space. Finally, each data element is assigned to a class according to its LDA values. In this work, we also developed a membership function that generates membership values for each data element from its LDA values, in order to assign the elements of the data to classes.
In the experiments, we used the Vaihingen dataset ('2D Semantic Labeling Contest', n.d.; Labeling and Vaihingen, 2016) to evaluate our method. The implementation of the method uses only the spectral features of the image pixels, not spatial features. The numerical and visual results presented in this paper show that our method achieves a respectable classification, considering that it is simple and uses only the spectral properties of the images.

Linear Discriminant Analysis (LDA)
LDA, also known as Fisher's Linear Discriminant Analysis, is a feature extraction and dimension reduction method (Fisher, 1936). LDA seeks a projection that maximizes the inter-class variance relative to the intra-class variance. This operation generates new attributes for the data, and these attributes provide the most distinctive features for the data to be classified (Duda et al., 2000; Martis et al., 2013). LDA is a supervised method because it needs a training data set with ground-truth labels.
According to the method, a coefficient vector is generated from the ground-truth data, and the vector has as many values as the data have features (Lu and Weng, 2007). To obtain this vector, the intra-class scatter (covariance) matrices are first computed for each class and then added together, as in Eq. (1), where $S_W$ denotes the sum of the intra-class matrices of the classes.

$$ S_W = \sum_{i=1}^{m} \sum_{x \in c_i} (x - \mu_i)(x - \mu_i)^T \tag{1} $$

After $S_W$ is obtained, the matrix of the inter-class variances $S_B$ is obtained as in Eq. (2).

$$ S_B = \sum_{i=1}^{m} (\mu_i - \mu)(\mu_i - \mu)^T \tag{2} $$

where $m$ is the number of classes, $c_i$ is the $i$th class, $\mu_i$ is the mean vector of the feature vectors in $c_i$, and $\mu$ is the mean vector of the feature vectors of all the data. The next step is the maximization in Eq. (3), which seeks the vector $w$ that makes $J(w)$ highest.

$$ J(w) = \frac{w^T S_B w}{w^T S_W w} \tag{3} $$
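As a minimal sketch of Eqs. (1) to (3), the Python below computes the within-class scatter $S_W$, the between-class scatter $S_B$, and the Fisher criterion $J(w)$ for classes given as lists of feature vectors. It uses only the standard library, so neither MATLAB nor NumPy is assumed. It follows the unweighted between-class form $\sum_i (\mu_i - \mu)(\mu_i - \mu)^T$. Some texts weight each term by the class size, and the hypothetical `weighted` flag switches that on. All function names are illustrative, not the authors' MATLAB code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def add_outer(acc: Matrix, u: Sequence[float], scale: float = 1.0) -> None:
    """acc += scale * u u^T (in place)."""
    for i, ui in enumerate(u):
        for j, uj in enumerate(u):
            acc[i][j] += scale * ui * uj


def scatter_matrices(classes: Sequence[Sequence[Sequence[float]]],
                     weighted: bool = False) -> Tuple[Matrix, Matrix]:
    """Return (S_W, S_B) for classes given as lists of feature vectors."""
    d = len(classes[0][0])
    s_w = [[0.0] * d for _ in range(d)]
    s_b = [[0.0] * d for _ in range(d)]
    mu = mean_vector([x for cls in classes for x in cls])  # mean of all data
    for cls in classes:
        mu_i = mean_vector(cls)
        for x in cls:  # Eq. (1): scatter of each class about its own mean
            add_outer(s_w, [a - b for a, b in zip(x, mu_i)])
        # Eq. (2): scatter of the class means about the overall mean
        # (weighted=True is the assumed class-size-weighted variant)
        add_outer(s_b, [a - b for a, b in zip(mu_i, mu)],
                  scale=len(cls) if weighted else 1.0)
    return s_w, s_b


def quadratic_form(w: Sequence[float], m: Matrix) -> float:
    """w^T M w."""
    n = len(w)
    return sum(w[i] * m[i][j] * w[j] for i in range(n) for j in range(n))


def fisher_criterion(w: Sequence[float], s_w: Matrix, s_b: Matrix) -> float:
    """Eq. (3): J(w) = (w^T S_B w) / (w^T S_W w)."""
    return quadratic_form(w, s_b) / quadratic_form(w, s_w)
```

In this example, two toy classes are centred at (3, 0) and (-3, 0). $J(w)$ is 4.5 along the first axis and 0 along the second, so the discriminant direction is the first axis, as expected.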
In our method, LDA is calculated separately for each class as the foreground. The other class in each LDA calculation is the background, which refers to all regions except the foreground. In other words, each LDA calculation uses two classes, background and foreground, so one LDA calculation is needed for each class. In this study, four classes ("road", "building", "vegetation" and "tree") are taken as the foreground in turn. When only two classes are considered, the vector $w$ that maximizes Eq. (3) lies in the direction of $S_W^{-1}(\mu_1 - \mu_2)$ (Duda et al., 2000). For this reason, $w$ can be calculated directly with Eq. (4) instead of maximizing Eq. (3).

$$ w = S_W^{-1}(\mu_1 - \mu_2) \tag{4} $$

where $\mu_1$ and $\mu_2$ are the mean vectors of the two classes $c_1$ and $c_2$.
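The closed-form two-class direction of Eq. (4) can be sketched in plain Python, with one weight vector computed per class in a one-versus-rest fashion. The sketch solves $S_W w = \mu_1 - \mu_2$ by Gaussian elimination instead of inverting $S_W$ explicitly, and takes $c_1$ to be the foreground. The function names and the string class labels are hypothetical.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(fg: Sequence[Sequence[float]],
                     bg: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (4): w = S_W^{-1} (mu_1 - mu_2), with c_1 = foreground."""
    d = len(fg[0])
    s_w = [[0.0] * d for _ in range(d)]
    means = []
    for cls in (fg, bg):
        mu = [sum(col) / len(cls) for col in zip(*cls)]
        means.append(mu)
        for x in cls:  # accumulate the within-class scatter of Eq. (1)
            diff = [a - b for a, b in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    s_w[i][j] += diff[i] * diff[j]
    # Solve S_W w = (mu_1 - mu_2) rather than forming the inverse.
    return solve(s_w, [a - b for a, b in zip(means[0], means[1])])


def one_vs_rest_weights(samples: Sequence[Sequence[float]],
                        labels: Sequence[str],
                        class_names: Sequence[str]) -> Dict[str, List[float]]:
    """One weight vector per class, with that class as the foreground."""
    weights = {}
    for k in class_names:
        fg = [x for x, y in zip(samples, labels) if y == k]
        bg = [x for x, y in zip(samples, labels) if y != k]
        weights[k] = fisher_direction(fg, bg)
    return weights
```

With the five per-pixel features used later in the paper (NIR, R, G, DSM, nDSM), each linear system is only 5×5, so this direct solve is cheap.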
After the weight vector $w$ is obtained, each data vector $x_i$ is projected onto $w$ as in Eq. (5).

$$ x_i^{LDA} = w^T x_i \tag{5} $$

As a result, each element of the data has a single value $x_i^{LDA}$ instead of a vector of values, so a dimension reduction is also performed. In this study, however, each data element ends up with multiple values, because the LDA calculation is performed once for each class as the foreground. Hence, a set of weight vectors $\{w_1, w_2, \ldots, w_m\}$ is obtained.
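Applying Eq. (5) with every class-specific weight vector gives each pixel one LDA value per class. A minimal sketch follows. The function names and class keys are hypothetical.

```python
from typing import Dict, List, Sequence


def project(x: Sequence[float], w: Sequence[float]) -> float:
    """Eq. (5): x_i^LDA = w^T x_i."""
    return sum(wj * xj for wj, xj in zip(w, x))


def lda_features(pixels: Sequence[Sequence[float]],
                 weights: Dict[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Map every pixel to one LDA value per class-specific weight vector w_k."""
    return [{k: project(x, w) for k, w in weights.items()} for x in pixels]
```

For example, with `weights = {"road": [1.0, 0.0, 0.0], "tree": [0.0, 1.0, -1.0]}`, the pixel `[2.0, 3.0, 1.0]` is mapped to `{"road": 2.0, "tree": 2.0}`.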
The method used in this paper needs to know which class (foreground or background) takes the minimum LDA values and which takes the maximum. Therefore, the proposed method must use Eq. (4) instead of Eq. (3), because $J(w) = J(-w)$ and Eq. (3) determines $w$ only up to its sign. In Eq. (4), if $\mu_1$ is the mean of the foreground, the foreground elements take the highest LDA values; otherwise, the background elements do. This gives the direction of the distribution between the two classes (foreground and background).
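The claim that Eq. (4) fixes which side of the axis the foreground falls on can be checked in one line. The derivation assumes that $\mu_1 \neq \mu_2$ and that $S_W$ is non-singular. $S_W$ is a sum of outer products and therefore positive semidefinite, so under this assumption it is positive definite. $S_W^{-1}$ is then also symmetric positive definite, which orders the projected class means:

```latex
w^T\mu_1 - w^T\mu_2
  = \left(S_W^{-1}(\mu_1 - \mu_2)\right)^T (\mu_1 - \mu_2)
  = (\mu_1 - \mu_2)^T S_W^{-1} (\mu_1 - \mu_2) > 0
```

The mean of $c_1$ therefore always projects above the mean of $c_2$. Individual pixels may still fall on the other side, which is why a threshold is needed in each single-dimensional space.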

Threshold value calculation for two-class labelling
After the LDA calculations for each class, each data element has as many feature values as there are classes. Each LDA feature value represents a point in a different single-dimensional space. This is because each LDA value of an element comes from the binary discrimination between the related class as the foreground ("road", "building", "vegetation" or "tree" in this study) and the remaining data as the background. In each space, all the data are distributed as foreground or background, with one of the four classes as the foreground and the others as the background. Namely, there are four single-dimensional spaces when the four classes are handled as the foreground one by one. Therefore, if a threshold can be determined in a single-dimensional space, the data can be separated into two classes. In the LDA calculations, if the foreground is specified as the second class ($c_2$ = foreground), that is, if $\mu_2$ refers to the mean of the foreground in Eq. (4), the data belonging to the foreground have smaller values in the space. Thus, the data in each space can be separated into two classes with a threshold value $t$, as in Eq. (6) and Eq. (7).

$$ label_{i,k} = \begin{cases} 2, & x_{i,k}^{LDA} \ge t_k \\ 1, & \text{otherwise} \end{cases} \qquad \text{if } \mu_1 = \text{mean foreground in Eq. (4)} \tag{6} $$

$$ label_{i,k} = \begin{cases} 2, & x_{i,k}^{LDA} \le t_k \\ 1, & \text{otherwise} \end{cases} \qquad \text{if } \mu_2 = \text{mean foreground in Eq. (4)} \tag{7} $$

In Eqs. (6) and (7), $label_{i,k}$ is the label assigned to the $i$th data element, for example "1" for the background and "2" for the foreground. If $\mu_1$ in Eq. (4) refers to the mean of the foreground, the highest LDA values belong to the foreground; if $\mu_2$ refers to the mean of the foreground, the smallest LDA values belong to the foreground. The index $k$ identifies the single-dimensional space. This space represents the distribution of a real class, in which the data are separated into "foreground" and "background" for the $k$th real class, such as "road", "building", "vegetation", or "tree".
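Eqs. (6) and (7) amount to a one-sided comparison whose direction depends on which mean in Eq. (4) belongs to the foreground. The sketch below assumes that a value exactly at the threshold goes to the foreground, because the text does not state how ties are handled. The function name is hypothetical.

```python
from typing import List, Sequence

FOREGROUND, BACKGROUND = 2, 1  # label values used in the text


def binary_labels(lda_values: Sequence[float], t: float,
                  fg_is_mu1: bool) -> List[int]:
    """Threshold one single-dimensional LDA space.

    fg_is_mu1=True  -> foreground takes the high side (x >= t).
    fg_is_mu1=False -> foreground takes the low side (x <= t).
    Ties at x == t go to the foreground (an assumption).
    """
    if fg_is_mu1:
        return [FOREGROUND if x >= t else BACKGROUND for x in lda_values]
    return [FOREGROUND if x <= t else BACKGROUND for x in lda_values]
```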
This raises the problem of specifying the threshold value for each space. In this study, we first normalized the LDA values to integers in a fixed range (0 to 255). We then tested every integer in the range as a threshold on the training data and selected the value giving the best classification score as the threshold for that single-dimensional space. The score for each candidate integer is the F1-score (8), which is also used to evaluate the final classification results in this paper.

$$ F1 = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{8} $$

$$ precision = \frac{tp}{tp + fp} \tag{9} $$

$$ recall = \frac{tp}{tp + fn} \tag{10} $$

In Eqs. (9) and (10), $tp$ is the number of data elements correctly detected as the class (true positives), and $fp$ is the number of elements wrongly detected as the class (false positives). $fn$ is the number of elements of the class that were not detected (false negatives). In other words, $tp + fp$ is the number of elements detected as the class, and $tp + fn$ is the number of elements that truly belong to the class (Wang et al., 2017).
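The exhaustive threshold search can be sketched as follows. The text states only that the LDA values are normalized to integers in the range 0 to 255 and that each integer is tested. The min-max rescaling with rounding and the tie-breaking rule (the first threshold reaching the best F1-score wins) are assumptions. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_uint8_range(values: Sequence[float]) -> List[int]:
    """Min-max rescale LDA values to integers 0..255 (assumed rounding)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [round(255 * (v - lo) / span) for v in values]


def f1_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Eqs. (8)-(10) from counts of true/false positives and false negatives."""
    tp = sum(1 for p, g in zip(pred, truth) if p and g)
    fp = sum(1 for p, g in zip(pred, truth) if p and not g)
    fn = sum(1 for p, g in zip(pred, truth) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(levels: Sequence[int], is_fg: Sequence[bool],
                   fg_is_mu1: bool) -> Tuple[int, float]:
    """Test every integer 0..255 and keep the one with the highest F1."""
    best_t, best_f1 = 0, -1.0
    for t in range(256):
        pred = [(v >= t) if fg_is_mu1 else (v <= t) for v in levels]
        score = f1_score(pred, is_fg)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```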
The threshold $t$ is obtained for each class separately, which yields a threshold set $\{t_1, t_2, \ldots, t_m\}$. With the LDA values and the threshold of a class, the data can be separated into foreground and background for that class. However, the goal of the study is to separate all the classes from each other. In this paper, we classified all the data simultaneously using the LDA values and the threshold values.

Min-Max selection function:
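The data elements can be assigned to classes using only the obtained LDA values. First, the values of each class are normalized to a common range (e.g. 0 to 1). Then, for each element, the index of the highest LDA value is selected if $\mu_1$ = mean foreground in Eq. (4), as in Eq. (11), or the index of the smallest value if $\mu_2$ = mean foreground in Eq. (4), as in Eq. (12).

$$ label_i = \mathrm{ind}\left(\max\left(x_{i,1}^{LDA}, x_{i,2}^{LDA}, \ldots, x_{i,m}^{LDA}\right)\right) \qquad \text{if } \mu_1 = \text{mean foreground in Eq. (4)} \tag{11} $$

$$ label_i = \mathrm{ind}\left(\min\left(x_{i,1}^{LDA}, x_{i,2}^{LDA}, \ldots, x_{i,m}^{LDA}\right)\right) \qquad \text{if } \mu_2 = \text{mean foreground in Eq. (4)} \tag{12} $$

This approach is called the "Min-Max function" in this paper. Each value in the LDA vector of an element was obtained for a different class (one of the four classes in this study) in the previous LDA calculations. In other words, the LDA value $x_{i,k}^{LDA}$ was obtained with the $k$th class as the foreground.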

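A sketch of the Min-Max function follows. It assumes that the per-class normalization is a min-max rescaling to [0, 1] over all pixels and that all classes share the same foreground side. It follows the reasoning around Eq. (4): when $\mu_1$ is the foreground mean, the foreground takes the high side, so the highest normalized value is selected. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def normalize_per_class(rows: Sequence[Dict[str, float]],
                        class_names: Sequence[str]) -> List[Dict[str, float]]:
    """Min-max rescale each class's LDA values to [0, 1] over all pixels."""
    lo = {k: min(r[k] for r in rows) for k in class_names}
    hi = {k: max(r[k] for r in rows) for k in class_names}
    return [{k: (r[k] - lo[k]) / ((hi[k] - lo[k]) or 1.0) for k in class_names}
            for r in rows]


def min_max_labels(rows: Sequence[Dict[str, float]],
                   class_names: Sequence[str],
                   fg_is_mu1: bool = True) -> List[str]:
    """Pick, per pixel, the class with the highest normalized LDA value
    (foreground on the high side) or the smallest (foreground on the low side)."""
    pick = max if fg_is_mu1 else min
    return [pick(class_names, key=lambda k: r[k])
            for r in normalize_per_class(rows, class_names)]
```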
Max-membership selection function:
The other way of selecting the class is to apply a membership function. In this study, each pixel is assigned a membership value for each class. A membership value is the degree to which an element of the data belongs to a class, and it denotes the proximity of the element to that class. A membership function transforms the current values into membership values. The membership function for the class "road" (as an example) in Eqs. (13) and (14) was developed in this work. Eq. (13) defines the membership value $mem_{i,\text{road}}$ of the $i$th element for the class "road" when the foreground was taken as $c_1$ in the previous LDA calculation ($\mu_1$ = mean foreground in Eq. (4)). Eq. (14) defines it when the foreground was taken as $c_2$ ($\mu_2$ = mean foreground in Eq. (4)). In the equations, $t_{\text{road}}$ is the threshold value for the class "road" determined in the previous steps.
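The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W4, 2018 GeoInformation For Disaster Management (Gi4DM), 18-21 March 2018, Istanbul, Turkey

In Eqs. (13) and (14), the LDA value $x_{i,\text{road}}^{LDA}$ and the threshold $t_{\text{road}}$ are normalized to the range [0, 1].

$$ mem_{i,\text{road}} = \begin{cases} \dfrac{x_{i,\text{road}}^{LDA} - t_{\text{road}}}{1 - t_{\text{road}}}, & x_{i,\text{road}}^{LDA} \ge t_{\text{road}} \\[2ex] \dfrac{x_{i,\text{road}}^{LDA} - t_{\text{road}}}{t_{\text{road}}}, & x_{i,\text{road}}^{LDA} < t_{\text{road}} \end{cases} \qquad \text{if } \mu_1 = \text{mean foreground in Eq. (4)} \tag{13} $$

$$ mem_{i,\text{road}} = \begin{cases} \dfrac{t_{\text{road}} - x_{i,\text{road}}^{LDA}}{t_{\text{road}}}, & x_{i,\text{road}}^{LDA} \le t_{\text{road}} \\[2ex] \dfrac{t_{\text{road}} - x_{i,\text{road}}^{LDA}}{1 - t_{\text{road}}}, & x_{i,\text{road}}^{LDA} > t_{\text{road}} \end{cases} \qquad \text{if } \mu_2 = \text{mean foreground in Eq. (4)} \tag{14} $$

The membership function is applied to each class using the related LDA value. As a result, each element has membership values instead of LDA values. Each membership value lies in the range [-1, 1]; the higher the membership value, the greater the degree of membership in the related class. Finally, the index of the highest membership value of an element is selected, and the element is assigned to the class with this index (15). This approach is called the "Max-membership function" in this paper.

$$ label_i = \mathrm{ind}\left(\max\left(mem_{i,\text{road}}, mem_{i,\text{build.}}, mem_{i,\text{veg.}}, mem_{i,\text{tree}}\right)\right) \tag{15} $$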
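The Max-membership function can be sketched as below. The exact membership formula is an assumption: a piecewise-linear map with LDA values and thresholds normalized to [0, 1]. It returns 0 at the class threshold, +1 at the foreground end of the axis and -1 at the background end, which matches the stated [-1, 1] range. The function names are hypothetical.

```python
from typing import Dict


def membership(x: float, t: float, fg_is_mu1: bool = True) -> float:
    """Piecewise-linear membership in [-1, 1] for x, t normalized to [0, 1].

    Assumed form: 0 at the threshold, +1 at the foreground end of the
    axis and -1 at the background end.
    """
    if fg_is_mu1:  # foreground on the high side
        if x >= t:
            return (x - t) / (1.0 - t) if t < 1.0 else 0.0
        return (x - t) / t  # here x < t, so t > 0
    if x <= t:  # foreground on the low side
        return (t - x) / t if t > 0.0 else 0.0
    return (t - x) / (1.0 - t)  # here x > t, so t < 1


def max_membership_label(lda_row: Dict[str, float],
                         thresholds: Dict[str, float],
                         fg_is_mu1: bool = True) -> str:
    """Pick the class whose membership value is highest."""
    return max(lda_row,
               key=lambda k: membership(lda_row[k], thresholds[k], fg_is_mu1))
```

In the last test below, both classes have the same normalized LDA value 0.6, so comparing raw values alone cannot separate them. The membership values (about 0.2 for "road" and 0.5 for "tree") favour the class whose threshold the pixel exceeds by the larger relative margin.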

EXPERIMENTAL RESULTS
We applied the developed method to the Vaihingen dataset provided by ISPRS Commission III ('2D Semantic Labeling Contest', n.d.; Karakoyun et al., 2017; Labeling and Vaihingen, 2016). The dataset includes 33 images, each about 2500×2000 pixels in size. The ground truths of 16 of them are shared with researchers for training and validation, and those of the other 17 are withheld by the commission for centralized benchmarking.
In this paper, we used the 16 images whose ground truths are shared. We used 12 of them to obtain the weight vector set and the threshold set. The remaining 4 (area5, area7, area23, and area30) (Wang et al., 2017) served as validation data for reporting the accuracy of the method.
The images in the dataset consist of NIR (near infrared), R (red), and G (green) channels. The commission also provides DSMs (Digital Surface Models) and nDSMs (normalized DSMs). Fig. 1 shows the image "area30" of the dataset and its nDSM projection as an example. MATLAB was used in this study to implement the method and to present the images. We use the NIR, R, G, DSM and nDSM values as the feature vectors of the data.
The ground truth includes six classes: "road", "building", "vegetation", "tree", "car", and "clutter". The classes "car" and "clutter" occupy a small area in the dataset. Their radiometric features are also similar to those of the larger classes, which would misdirect the general classification, so we ignored these two classes.
We first calculated the LDA values and obtained the threshold values for each class on the training data. Computing the LDA values and determining the thresholds requires a binary ground truth for each class. For example, the binary ground truth of the class "road" labels "road" as the foreground and "not road" as the background. In the dataset, however, the ground truths of all the classes are stored together. Therefore, we first extracted the ground truth of each class separately for the LDA calculation and threshold specification. Table 1 gives the threshold values obtained and their F1-scores for the related classes. In Fig. 2 … The "Min-Max function" uses the LDA values, and the "Max-membership function" uses the membership values to assign the pixels to the classes. Table 2 compares the two labelling methods on the validation data. Fig. 3 also presents the classification results of the validation images visually.
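Extracting the per-class binary ground truths from the multi-class label map can be sketched as below. The text says only that "car" and "clutter" were ignored. Dropping their pixels, rather than keeping them as background, is an assumption, and the class strings and function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: pixels of the ignored classes are excluded from training.
IGNORED = {"car", "clutter"}


def binary_ground_truths(labels: Sequence[str],
                         class_names: Sequence[str]) -> Dict[str, List[Tuple[int, bool]]]:
    """For each class k, list (pixel index, is_foreground) pairs: k vs. not k."""
    kept = [i for i, y in enumerate(labels) if y not in IGNORED]
    return {k: [(i, labels[i] == k) for i in kept] for k in class_names}
```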

CONCLUSION
In this paper, some satellite images from a benchmark dataset are classified using one-dimensional LDAs. For this purpose, the binary distributions of LDA are used. A binary classification is applied with each class in turn as the foreground, and a threshold value is obtained for each class. Two labelling functions that assign the pixels to the classes are presented. One uses the LDA values, and the other uses membership values generated from the LDA values and the threshold values. The two functions are compared with each other, and the results show that the function using membership values performs better. The Max-membership method is practical for the supervised classification of large images. The approach offered in this paper uses the spectral features of the images, not the spatial features. In further studies, the method can be extended by incorporating spatial features.


Figure 1. The sample image "area30" in the dataset (a) and the projection of its normalized DSM (b)

Table 2. F1-score validation results of the methods with the "Min-Max" and "Max-membership" functions