PERFORMANCE EVALUATION OF ELM WITH A-OPTIMIZED DESIGN REGULARIZATION FOR REMOTE SENSING IMAGERY CLASSIFICATION

Automatic classification of remote sensing images is a key technology for extracting the rich geo-information they contain and for monitoring dynamic changes in land use and the ecological environment. Remote sensing images carry a large amount of information across many dimensions, so how to classify and extract that information has become a crucial issue in remote sensing science. With the development of neural network theory, many scholars have studied the application of neural network models to remote sensing image classification. However, artificial neural network methods still have unsolved problems. In this study, addressing large-scale land classification for medium-resolution multi-spectral remote sensing imagery, an improved machine learning algorithm based on the extreme learning machine (ELM) has been developed via regularization theory. The improved algorithm is better suited to post-classification change monitoring of features in large-scale imagery. Our main task is to evaluate the performance of ELM with A-optimal design regularization (hereafter A-optimal RELM). The accuracy and efficiency of the A-optimal RELM algorithm for remote sensing imagery classification are therefore compared with those of the support vector machine (SVM) and the original ELM. The experimental results show that A-optimal RELM performs best on both test images, with overall accuracies of 97.27% and 95.03% respectively. In addition, A-optimal RELM is better at distinguishing similar, easily confused terrestrial object pixels.


INTRODUCTION
Remote sensing technology has been applied in many fields, such as environmental and urban monitoring. Multi-spectral remote sensing imagery contains an enormous amount of information about many types of objects on the earth, and extracting that information from RS imagery is the basic requirement for subsequent application and analysis. Automatic classification of remote sensing images is the key technology for extracting this geo-information and for monitoring dynamic changes in land use and the ecological environment. How to make full use of image information to identify and classify surface features efficiently and accurately is therefore an enduring theme in remote sensing. For large-scale land cover change detection based on post-classification comparison, traditional machine learning methods are more appropriate. Over the years, with the development of machine learning theory and technology, a variety of neural network methods have been proposed and used to solve image classification problems, including the Radial Basis Function Neural Network (RBFNN), the Multi-layer Perceptron (MLP) (Iversen et al., 2005), Self-organizing Map networks (SOM) (Giacco et al., 2010), Wavelet Neural Networks (WNN) (Angrisani et al., 2001) and other neural network classifiers. Various optimized algorithms have been developed from these methods. Wavelet transform and WNN methods were used to detect and classify transient signals simultaneously and automatically, with wavelet nodes replacing the nodes in the first layer of the network; the experimental results are comparable to those obtained by existing methods. MLP classifiers and auto-associative neural networks (AANN) were combined to improve discrimination between classes whose data overlap in feature space. Gao et al.
combined the SMOTE algorithm and Particle Swarm Optimization (PSO) to optimize the RBF model, and proposed a classification method that can effectively handle the two-class imbalance problem (Gao et al., 2011). Feng et al. used a dynamic BP algorithm to train the MLP learning model in a mail classification study and showed a significant improvement in learning efficiency and classification accuracy over the traditional MLP model (Feng and Daqi, 2013). Neural network algorithms of this kind have evolved from simple to complex, from specific to general, and from single methods to combinations of methods. However, artificial neural networks still have unsolved problems. When dealing with a large amount of data in a high-dimensional feature space, nonlinear optimization easily falls into local minima, training efficiency is low, many parameters must be set manually, and the activation function must be differentiable. In particular, the speed of ANN training is severely affected if the traditional combination of forward propagation and backpropagation is used to estimate the weight matrix by stochastic gradient descent (Abuelgasim et al., 1996); training an artificial neural network often takes days or more. In addition, gradient descent-based algorithms converge easily to local minima, resulting in relatively low prediction accuracy. Simultaneous iterative adjustment of the parameters introduces dependencies between them, and the algorithm requires many iterations to reach good generalization performance. In 2004, Huang G.B. proposed the extreme learning machine (ELM) (Huang et al., 2004).
The improved algorithm based on the ELM method also performs well on large-scale remote sensing classification and can be applied to subsequent change monitoring experiments and analysis (Lin et al., 2018). In this study, an ELM algorithm with A-optimal design regularization is proposed and applied to real remote sensing classification experiments.

METHOD
The extreme learning machine (ELM) (Huang et al., 2004) is a learning method for single-hidden-layer feed-forward networks (SLFNs) in which the input weights and hidden-node biases are assigned randomly. The algorithm can be summarized as follows.
Algorithm ELM. Given a training set $\{(\mathbf{x}_j, \mathbf{y}_j) \mid \mathbf{x}_j \in \mathbb{R}^{n},\ \mathbf{y}_j \in \mathbb{R}^{m},\ j = 1, \dots, N\}$, an activation function $g(\cdot)$, and the number of hidden nodes $L$:
Step 1. Randomly assign the input weights $\mathbf{w}_i$ and biases $b_i$, $i = 1, \dots, L$.
Step 2. Calculate the hidden layer output matrix $\mathbf{H}$, which can be expressed as
$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L}$$
Step 3. Calculate the output weight $\hat{\boldsymbol{\beta}}$:
$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{Y}$$
where $\mathbf{H}$ is the feature mapping matrix (the hidden layer output matrix of the network), $\mathbf{Y}$ is the output vector or matrix, and $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$ (Huang et al., 2011), given by
$$\mathbf{H}^{\dagger} = (\mathbf{H}^{T}\mathbf{H})^{-1}\mathbf{H}^{T}$$
In practical applications, an important unsolved problem of ELM is how to obtain the most appropriate SLFN architecture. The original ELM also suffers from overfitting, non-optimality, and questionable reasonableness of the solution. The general way to improve generalization ability and avoid overfitting is regularization, so it is important to select a proper regularization parameter for ELM. However, there is so far no general method for choosing a proper or optimal regularization parameter. Deng et al. proposed a numerical heuristic based on cross validation to determine it (Deng et al., 2009). With the regularization parameter $\lambda$, the output function of the ELM classifier becomes
$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\hat{\boldsymbol{\beta}}, \qquad \hat{\boldsymbol{\beta}} = (\mathbf{H}^{T}\mathbf{H} + \lambda\mathbf{I})^{-1}\mathbf{H}^{T}\mathbf{Y}$$
where $\mathbf{h}(\mathbf{x})$ is the hidden layer output (row vector) for input $\mathbf{x}$. In this equation the output $\mathbf{Y}$ is an $N \times m$ matrix when there are $N$ distinct training samples and $m$ output neurons; the output weight $\hat{\boldsymbol{\beta}}$ is thus an $L \times m$ matrix, where $L$ is the number of hidden neurons.
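The three steps above, together with the $\lambda$-regularized solution, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the sigmoid activation, the uniform range of the random weights, and the function names `elm_train` and `elm_predict` are all assumptions that do not come from the paper.

```python
import numpy as np

def _hidden(X, W, b):
    # Hidden layer output matrix H (N x L); sigmoid activation is an assumption.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, Y, n_hidden, lam=0.0, seed=0):
    """Train an ELM. X: (N, n) inputs, Y: (N, m) targets, n_hidden: L."""
    rng = np.random.default_rng(seed)
    # Step 1: random input weights and biases (range [-1, 1] is an assumption).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix.
    H = _hidden(X, W, b)
    # Step 3: output weights, with or without regularization.
    if lam > 0:
        # beta = (H^T H + lam I)^{-1} H^T Y
        beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)
    else:
        # beta = H^dagger Y (Moore-Penrose pseudo-inverse)
        beta = np.linalg.pinv(H) @ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return the (N, m) output scores; argmax over columns gives the class."""
    return _hidden(X, W, b) @ beta
```

For classification, `Y` would typically hold one-hot class labels, so that `elm_predict(...).argmax(axis=1)` returns the predicted class index for each pixel.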
To improve the generalization performance and stability of ELM, regularization is introduced in some applications to penalize the weight matrix $\hat{\boldsymbol{\beta}}$. How to obtain an appropriate regularization parameter, however, remains a crucial open issue. In this study, we apply the A-optimal design regularization proposed by Cai (2004) to determine the parameter: $\lambda$ is chosen by minimizing the trace of the mean square error (MSE) matrix of $\hat{\boldsymbol{\beta}}$:
$$\hat{\lambda} = \arg\min_{\lambda}\ \operatorname{tr}\!\left(\mathrm{MSE}(\hat{\boldsymbol{\beta}})\right), \qquad \mathrm{MSE}(\hat{\boldsymbol{\beta}}) = E\!\left[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^{T}\right]$$
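One way to make the trace-of-MSE criterion concrete is sketched below. It assumes a ridge-type estimator $(\mathbf{H}^{T}\mathbf{H} + \lambda\mathbf{I})^{-1}\mathbf{H}^{T}\mathbf{Y}$ with independent, equal-variance noise, for which the textbook expression is $\operatorname{tr}(\mathrm{MSE}) = \sigma^{2}\sum_i d_i/(d_i+\lambda)^{2} + \lambda^{2}\sum_i \alpha_i^{2}/(d_i+\lambda)^{2}$, where $d_i$ are the eigenvalues of $\mathbf{H}^{T}\mathbf{H}$ and $\alpha_i$ are the true weights expressed in the eigenvector basis. The unknown $\sigma^{2}$ and $\alpha_i$ are replaced by least-squares plug-in estimates and $\lambda$ is found by grid search. These choices, and the name `a_optimal_lambda`, are illustrative assumptions; the text does not confirm that they match the procedure of Cai (2004) in detail.

```python
import numpy as np

def a_optimal_lambda(H, Y, lambdas):
    """Grid-search the lambda that minimizes a plug-in estimate of tr(MSE).

    H: (N, L) hidden layer matrix, Y: (N,) or (N, m) targets,
    lambdas: strictly positive candidate values (hypothetical grid).
    """
    lambdas = np.asarray(lambdas, dtype=float)
    N, L = H.shape
    Y = np.asarray(Y, dtype=float).reshape(N, -1)
    m = Y.shape[1]

    # Eigen-decomposition of H^T H; clip tiny negative round-off values.
    d, V = np.linalg.eigh(H.T @ H)
    d = np.clip(d, 0.0, None)

    # Pilot least-squares estimate, used for the plug-in quantities.
    beta_ls = np.linalg.pinv(H) @ Y
    resid = Y - H @ beta_ls
    sigma2 = (resid ** 2).sum() / (max(N - L, 1) * m)   # noise variance estimate
    alpha2 = ((V.T @ beta_ls) ** 2).sum(axis=1)         # squared weights per eigen-direction

    def trace_mse(lam):
        variance = m * sigma2 * np.sum(d / (d + lam) ** 2)
        bias2 = lam ** 2 * np.sum(alpha2 / (d + lam) ** 2)
        return variance + bias2

    scores = np.array([trace_mse(lam) for lam in lambdas])
    return float(lambdas[int(np.argmin(scores))]), scores
```

The returned value can then be used as the regularization parameter when solving for the output weights; a logarithmic grid such as `np.logspace(-4, 2, 25)` is a common but arbitrary choice of candidates.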

EXPERIMENT
To examine the practicability of the method, two accuracy comparison experiments were conducted on real Landsat 8 OLI images covering regions with different terrain features and extents. The two study areas, shown in Figure 1, are Wuhan East Lake in central China and the city of Hamburg in northern Germany. In the Wuhan East Lake area, the landscape mainly consists of buildings, bare land, forest land, paddy fields, and lakes of different shapes and sizes. In the Hamburg area, the land cover is divided into bare land, building, paddy, forest, cement land, and water. Pre-processing includes radiometric calibration, atmospheric correction, and feature space construction. The A-optimal RELM method proposed in this study is compared with two other methods: the support vector machine (SVM) and the standard ELM.
In the pre-processing steps, radiometric calibration and atmospheric correction were applied to all RS images in ENVI 5.1 to remove atmospheric effects and obtain true surface reflectance. For feature space construction, the seven spectral bands were used directly as the feature space for the Wuhan imagery, while for the Hamburg imagery principal component analysis (PCA) bands and several index bands were used. For the sample dataset, Google Earth high-resolution imagery was taken as the reference for selecting samples, and the training and testing samples used as input to the experiments were drawn randomly from this dataset. The main steps of this study are shown in Figure 2.
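The feature space construction and random sampling described above could look roughly like the sketch below. The paper does not say which index bands or how many principal components were used, so the NDVI index, the number of components, the 50/50 split, and the names `build_features` and `random_split` are assumptions made only for illustration.

```python
import numpy as np

def build_features(bands, red_idx, nir_idx, n_pc=1):
    """Stack PCA scores and one index band (NDVI, an assumed example).

    bands: (n_pixels, n_bands) surface reflectance values.
    red_idx, nir_idx: column positions of the red and near-infrared bands.
    """
    centered = bands - bands.mean(axis=0)
    # PCA via SVD of the centered data; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ vt[:n_pc].T
    red, nir = bands[:, red_idx], bands[:, nir_idx]
    ndvi = (nir - red) / (nir + red + 1e-12)   # small constant avoids 0/0
    return np.column_stack([pcs, ndvi])

def random_split(X, y, train_fraction=0.5, seed=0):
    """Randomly divide labelled samples into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train, test = order[:n_train], order[n_train:]
    return X[train], y[train], X[test], y[test]
```

For a seven-band Landsat 8 OLI stack ordered as bands 1 to 7, the red and near-infrared bands are bands 4 and 5, so `red_idx=3` and `nir_idx=4` with zero-based indexing.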

RESULTS
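To examine the accuracy of the A-optimal RELM method, a comparison with two other methods, SVM and the original ELM, was carried out on two Landsat 8 OLI images of Wuhan East Lake and Hamburg. The classification results for the Wuhan East Lake and Hamburg images are presented in Figure 3 and Figure 4 respectively. To show the differences more directly, the confusion matrices of SVM and A-optimal RELM are also calculated. The kappa coefficient is calculated by the following equations:
$$p_o = \frac{1}{N}\sum_{i} x_{ii}, \qquad p_e = \frac{1}{N^{2}}\sum_{i} x_{i+}\,x_{+i}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e}$$
where $x_{ii}$ is the number of correctly classified pixels of class $i$ (the diagonal of the confusion matrix), $x_{i+}$ and $x_{+i}$ are the totals of row $i$ and column $i$, $p_o$ is the overall accuracy, and $N$ is the number of sample pixels in each dataset.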
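As a worked check of the overall accuracy and kappa coefficient, the short sketch below applies the standard definitions to the Table 9 counts reported for A-optimal RELM in the Hamburg experiment. The function name `accuracy_and_kappa` is an illustrative choice; the counts are copied from the paper.

```python
import numpy as np

def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()                                    # number of sample pixels N
    p_o = np.trace(cm) / n                          # observed agreement (overall accuracy)
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2   # chance agreement
    return p_o, (p_o - p_e) / (1.0 - p_e)

# Table 9 counts (Bareland, Building, Forest, Paddy, Cement, Water).
table9 = [
    [531,   3,   1,   0,   0,   0],
    [ 19, 581,   1,   0,  43,  11],
    [  1,   0, 419,  11,   6,   0],
    [  1,   0,   0, 324,   0,   0],
    [  3,  14,  11,  15, 456,   1],
    [  0,   0,   2,   0,   1, 440],
]
oa, kappa = accuracy_and_kappa(table9)
print(f"{oa:.4f} {kappa:.4f}")   # prints 0.9503 0.9399
```

These values agree with the overall accuracy of 95.03% and kappa coefficient of 0.9399 reported for the Hamburg image.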

Classification comparison of Wuhan East Lake
In the classification result of the standard ELM (Figure 3 (b)), most paddy pixels are misclassified as forest, and the boundary between water and land is not clear enough, especially in areas with complex terrain. The accuracy statistics of the three methods are shown in Table 2. For the Wuhan image, the A-optimal RELM method achieves the best result, with an overall accuracy of 97.27% and a kappa coefficient of 0.9504, while the standard ELM has the lowest overall accuracy, 84.11%. The confusion matrices of the three methods are given in Tables 3-5 for direct comparison. In the confusion matrix of SVM, a large number of water pixels are misclassified as building because of their similar reflectance. In the result of A-optimal RELM, this misclassification is effectively corrected.

Accuracy Comparison
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition)
Table 5. Confusion matrix of A-optimal RELM on Wuhan image experiment

Classification comparison of Hamburg
According to the accuracy statistics for the Hamburg image in Table 6, the A-optimal RELM method again performs best, with an overall accuracy of 95.03% and a kappa coefficient of 0.9399, while SVM has the lowest overall accuracy, 90.51%. The confusion matrices of the three methods are given in Tables 7-9. In the confusion matrix of SVM, misclassification occurs in several land cover types. For example, some bare land pixels are misclassified as building, and cement floor and building pixels are confused with each other because of their similar spectral features. In the result of A-optimal RELM, the number of misclassified pixels is significantly reduced; in particular, the number of bare land pixels misclassified as building drops from 114 to 19. In the Hamburg classification results, a fraction of the water pixels is clearly misclassified as cement land by ELM in Figure 5, while the classification results for the other land cover types are similar.

            Bareland  Building  Forest  Paddy  Cement  Water
Bareland         531         3       1      0       0      0
Building          19       581       1      0      43     11
Forest             1         0     419     11       6      0
Paddy              1         0       0    324       0      0
Cement             3        14      11     15     456      1
Water              0         0       2      0       1    440
Table 9. Confusion matrix of A-optimal RELM on Hamburg image experiment

The experimental results show that A-optimal RELM performs best on the two different images, with overall accuracies of 97.27% and 95.03% respectively. Overall, compared with SVM and ELM, A-optimal RELM reaches the highest accuracy and is appropriate for multi-spectral RS imagery classification. In addition, A-optimal RELM is better at distinguishing similar, easily confused terrestrial object pixels. This indicates that the accurate classification results of the A-optimal RELM algorithm can support convincing follow-up analysis.

CONCLUSIONS
In this research, two comparison experiments demonstrate that A-optimal RELM achieves higher classification accuracy than SVM and the original ELM, and that it is more stable and effective than the standard ELM across different Landsat remote sensing images.
Additionally, based on the classification results, a simple spatial-temporal analysis of land use and cover change (LUCC) in the Chaohu Lake basin is carried out, which directly shows the change status of the region in space and time.
Overall, A-optimal RELM is appropriate for multi-spectral RS imagery classification and is better at distinguishing similar, easily confused terrestrial object pixels, so convincing follow-up analysis can be built on its classification results. In future work, its performance should be evaluated on other remote sensing imagery with different feature space constructions, such as high-resolution images from the GF satellites, WorldView-2, or QuickBird.