OBJECT-BASED RANDOM FOREST CLASSIFICATION OF LAND COVER FROM REMOTELY SENSED IMAGERY FOR INDUSTRIAL AND MINING RECLAMATION

The RF method with grid-search parameter optimization achieved a classification accuracy of 88.16% when classifying images with multiple feature variables. This accuracy was higher than that of SVM and ANN under the same feature variables. In terms of efficiency, the RF classification method also performed better than SVM and ANN, and it was more capable of handling multidimensional feature variables. Combining the RF method with an object-based analysis approach improved the classification accuracy further. Multiresolution segmentation with ESP scale parameter optimization was used to obtain six scales for image segmentation; when the segmentation scale was 49, the classification accuracy reached its highest value of 89.58%. The accuracy of object-based RF classification was thus 1.42 percentage points higher than that of pixel-based classification (88.16%). Therefore, the RF classification method combined with an object-based analysis approach can achieve relatively high accuracy in the classification and extraction of land use information for industrial and mining reclamation areas. Moreover, the interpretation of remotely sensed imagery using the proposed method can provide technical support and a theoretical reference for remote sensing monitoring of land reclamation.


INTRODUCTION
China has a rather large mining industry. The exploitation of mineral resources has contributed greatly to China's social and economic development, yet it has also caused serious negative impacts on land resources and the ecological environment in some regions. Therefore, land reclamation and ecological reconstruction have become important measures to coordinate the development of mineral resources and the protection of land resources, as well as to promote the construction of an ecological civilization. The extraction and classification of land cover and land use information in reclamation areas using remote sensing technology has become one of the important approaches for verifying and assessing the effectiveness of land reclamation. It is also critical for continuously tracking the management and maintenance of the reclaimed land in the later stages.
With a single pixel as the unit of analysis, traditional pixel-based classification algorithms for remotely sensed imagery cannot fully take into account the spatial relationship among neighboring pixels. This isolates the recognition results and increases the likelihood of the salt-and-pepper phenomenon. Object-oriented classification makes up for these deficiencies of traditional pixel-based classification methods. Among the current machine-learning methods for classifying remotely sensed imagery, the random forest (RF) classification method features relatively high accuracy, fast processing of large data sets, resistance to overfitting, a strong ability to process multidimensional variables and the capability of estimating the importance of variables. Therefore, this method is widely used in multidimensional data classification and regression, and has achieved relatively good results (Yue Ma et al, 2016).
In several countries, early studies have been carried out on the application of the RF classification method to remotely sensed imagery classification. For example, in studies by Pal and by Gislason et al., the method was used to classify land cover and was compared with methods such as iterative, support vector machine (SVM) and decision tree methods in terms of accuracy and efficiency, to verify the superiority of the RF classification method (M. Pal, 2005; Gislason, P. O., 2006). In recent years, the RF classification method has also been applied to the classification of remotely sensed imagery in domestic studies. For instance, in studies conducted by Ma Yue and Guo Yubao, this method was used in the classification and extraction of land use information in both farming and urban areas, and achieved comparatively high accuracy in each case (Yubao Guo, 2016). However, current studies and applications of the RF classification method in China are insufficiently comprehensive. For example, the RF classification method itself has certain defects: it is excessively encapsulated, the operation process is uncontrollable, and the model can only be optimized through parameter adjustment. Moreover, most studies focus on medium-resolution images and plain areas, while few studies focus on high-resolution images and mountainous areas.
Industrial and mining reclamation areas are mostly located in mountainous and hilly regions where the terrain is undulating, the distribution of surface features is fragmented, the project area is usually small and the layout is often scattered. The accuracy of extracting information on surface features from medium-resolution images cannot meet the demands of land reclamation management. Therefore, it is necessary to carry out studies on land use classification with high-resolution images in industrial and mining reclamation areas. In addition, the object-oriented RF classification method can make up for the shortcomings of the pixel-based classification method and help improve the classification accuracy.
In this study, the object-oriented RF classification method was applied to classify land use information in industrial and mining reclamation areas using high-resolution images. Compared with support vector machine (SVM) and artificial neural network (ANN), the object-oriented RF classification method was better in terms of performance and applicability in land use classification of industrial and mining reclamation areas, which provides a theoretical reference and technical support for the monitoring of land reclamation.

OVERVIEW OF THE STUDY AREA
The study area is located in Shiping Village, Gulin County, Luzhou City, Sichuan Province, with geographic coordinates of 28°0′55′′~28°3′26′′N, 105°59′32′′~106°2′13′′E. With an altitude of 410~1025 m, the area lies within the middle subtropical zone and has an average annual temperature of 17.1~18.5℃ and an average precipitation of 748.4~1184.2 mm. Because there are a number of sulfur plants and factories in the study area, piles of waste sulfur pollute the surrounding land (Yufang Zhang et al, 2014). According to the Current Land Use Classification (GB/T 21010-2017), the types of land use in the study area include arbor forest, shrubbery land, farmland, industrial and mining land, rural housing land, transportation land (highways and country roads) and river ponds. Figure 1 shows the geographical location, distribution of sampling points, and remotely sensed imagery data of the study area.

DATA COLLECTION AND PREPROCESSING
The main data used in this study were GF-1 satellite remotely sensed imagery. The auxiliary data included UAV aerial imagery, DEM data, ground measurement data and Google Earth data. The GF-1 imagery was used for the classification and extraction of land use information in the reclamation areas. The DEM data were used as auxiliary data to extract the slope and aspect information required for image classification and for improving its accuracy. The UAV aerial imagery and Google Earth data were used for sample collection and accuracy evaluation.
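As an illustration of how slope can be derived from a gridded DEM, the following is a minimal NumPy sketch. It is not the procedure used in this study; the function name `slope_degrees` and the synthetic ramp DEM are assumptions made for the example, and finite differences from `np.gradient` stand in for whatever terrain operator the GIS software applies.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Return the slope (in degrees) of a DEM stored as a 2-D array."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical 2 m DEM: elevation rises 2 m per column, i.e. a uniform ramp.
dem = np.tile(np.arange(5) * 2.0, (4, 1))
slope = slope_degrees(dem, cell_size=2.0)
print(slope[0, 0])
```

Aspect can be obtained from the same two partial derivatives, but its sign and zero-direction conventions differ between software packages, so it is left out of this sketch.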

GF-1 Satellite Remotely Sensed Imagery
Launched in 2013, the GF-1 satellite carries two 2 m-resolution panchromatic/8 m-resolution multispectral cameras (PMS) and four 16 m-resolution wide-field-of-view cameras (WFV) (Limin Wang et al, 2015). The 2 m panchromatic/8 m multispectral image pair collected by the PMS camera of GF-1 was selected as the remote sensing data for this study. There were 5 bands, namely B, G, R, NIR and PAN. The data were collected on October 9, 2016, when there was no cloud cover. Image preprocessing was performed on the ENVI 5.3 software platform. Preprocessing of the multispectral data included radiometric calibration, atmospheric correction with the FLAASH module and orthorectification, after which radiometric calibration and orthorectification of the panchromatic data were conducted. Fusion of the panchromatic and multispectral data was then performed using the Gram-Schmidt method. Finally, the imagery was clipped to generate the image data of the study area (Yuqiu Jia, 2015).

Auxiliary Data
As for the auxiliary data, the aerial images were taken in November 2016 with a Pentax-645D camera mounted on a UV-II UAV, with a spatial resolution of 0.2 m. After distortion correction and free-network aerial triangulation densification of the aerial images, the DEM data were produced with a spatial resolution of 2 m.

Ground Measurement and Sample Data
The ground measurement data were collected at the same time as the UAV took the aerial images. A Trimble hand-held GPS receiver (Trimble GeoExplorer 2008 Series GeoXH, Trimble Navigation Limited, USA), which has a horizontal accuracy of better than 1 m, was used as the ground measurement device. In addition, Google Earth images were used to assist in the selection of training and test samples. Specifically, the training samples comprised 48,279 pixels (20%), and the test samples comprised 209,691 pixels (80%).

Technical Methods
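The workflow of the technical methods used in this study is as follows: (1) Preprocess and sharpen the panchromatic and multispectral images, register and resize the aerial images, DEM images and satellite images, and resample the aerial images to a spatial resolution of 1 m using the Nearest Neighbor method to increase the image processing speed; (2) Calculate and select feature variables based on the spectral, terrain, texture and spatial information of the data; (3) Establish four different feature variable combination models, i.e., Model 1 (SPE), Model 2 (SPE+DEM), Model 3 (SPE+DEM+TXT) and Model 4 (SPE+DEM+TXT+SPA), perform pixel-based RF classification with each, and assess the classification accuracy of each model; (4) Select the model with the highest classification accuracy for multiresolution segmentation, conduct object-oriented RF classification at different scales, and assess the accuracy of the results (Maxwell, A. E., 2015; Chaofan Wu et al, 2016); (5) Compare the object-oriented RF classification method with the SVM and ANN classification methods, and assess the performance of the object-oriented RF classification method. Figure 2 shows the workflow.

The calculated characteristic variables of the images included the normalized difference vegetation index (NDVI), which is suitable for extracting vegetation information, and the biophysical composition index (BCI), which is suitable for extracting impervious surfaces (Hanqiu Xu et al, 2016), both calculated from spectral information; the slope, aspect and curvature calculated from topographic data; the mean, variance, homogeneity, entropy and second moment calculated from texture information; and Local Moran's I and Local Getis-Ord Gi (Yu Zhao et al, 2016), which reflect spatial information.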
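For readers unfamiliar with NDVI, a minimal sketch of its standard definition, NDVI = (NIR − R) / (NIR + R), is given here. The function name `ndvi` and the small reflectance arrays are hypothetical; pixels where both bands are zero are set to 0 as a convenience choice, not as part of the index definition.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, computed pixel by pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    # Skip pixels whose denominator is zero; they keep the fill value 0.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectance values for a 2 x 2 image.
nir = np.array([[0.50, 0.30], [0.10, 0.0]])
red = np.array([[0.10, 0.30], [0.30, 0.0]])
values = ndvi(nir, red)
print(values)
```

Dense vegetation gives values close to 1, while bare or built-up surfaces give values near or below 0.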

Calculation and Selection of Feature Variables
The images were acquired in October, when the vegetation coverage of some dry lands is relatively low and their visual characteristics are quite similar to those of industrial and mining land. Most spectral indices are designed to highlight only one land cover type, and confusion between other land cover types, in particular impervious surfaces and bare soil, has not been successfully addressed (Hanqiu Xu et al, 2010). Therefore, the BCI index was constructed to enhance the ability of the classification algorithm to identify areas with low vegetation coverage and bare soil as well as industrial and mining areas. The calculation of BCI is as follows (Horne, J. H., 2003):

$$\mathrm{BCI} = \frac{(H + L)/2 - V}{(H + L)/2 + V}$$

$$H = \frac{TC_1 - TC_{1,\min}}{TC_{1,\max} - TC_{1,\min}}, \quad V = \frac{TC_2 - TC_{2,\min}}{TC_{2,\max} - TC_{2,\min}}, \quad L = \frac{TC_3 - TC_{3,\min}}{TC_{3,\max} - TC_{3,\min}}$$

where TC1, TC2 and TC3 are the first three components after the tasseled cap transformation of the blue (B), green (G), red (R) and near-infrared (NIR) bands, and H, V and L are the normalized TC1, TC2 and TC3.
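A minimal sketch of the BCI computation, assuming the usual form BCI = ((H + L)/2 − V) / ((H + L)/2 + V) with H, V and L obtained by min-max normalization of TC1, TC2 and TC3. The tasseled cap components are taken as given inputs here, since the transformation coefficients are sensor-specific; the function names and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(tc):
    """Rescale a tasseled cap component to the range [0, 1]."""
    tc = np.asarray(tc, dtype=float)
    return (tc - tc.min()) / (tc.max() - tc.min())

def bci(tc1, tc2, tc3):
    """Biophysical composition index from the first three TC components."""
    h = min_max_normalize(tc1)   # normalized TC1
    v = min_max_normalize(tc2)   # normalized TC2
    l = min_max_normalize(tc3)   # normalized TC3
    mean_hl = (h + l) / 2.0
    denom = mean_hl + v
    out = np.zeros_like(denom)
    # Pixels with a zero denominator keep the fill value 0.
    np.divide(mean_hl - v, denom, out=out, where=denom != 0)
    return out

# Hypothetical TC values for three pixels.
tc1 = [0.2, 0.6, 1.0]
tc2 = [0.9, 0.5, 0.1]
tc3 = [0.1, 0.3, 0.5]
index = bci(tc1, tc2, tc3)
print(index)
```

The index ranges from −1 to 1; in this toy input the first pixel has a high normalized TC2 and a low BCI, and the last pixel shows the opposite pattern.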

RF Classification Method
As a machine learning algorithm composed of a combination of decision trees, the RF classification method runs relatively quickly and is suitable for processing high-dimensional data (Zhen Lei, 2012). Its implementation process is as follows. Firstly, the Bootstrap method is used to draw N training sets from the original data by sampling with replacement; such a process is called Bagging. Secondly, N decision trees are constructed using the N training sets. During the growth of each tree, m (m ≤ M) feature variables are randomly selected from the M feature variables for internal node partitioning. Finally, the class of a new sample is determined by voting based on the predictions of the N decision trees. In the process of drawing each training set, about 1/3 of the data is not drawn. Known as the out-of-bag (OOB) data, such data can be used to assess the misclassification error and to estimate variable importance. The Gini coefficient is used in the variable selection process to measure the impurity of the candidate splits. Usually, the default number of trees (ntree) is 100, and the default number of variables (mtry) is the square root of the total number of image bands (Xingling Wang et al, 2005).
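The "about 1/3" out-of-bag share follows from sampling with replacement: the probability that a given sample is never drawn in n draws is (1 − 1/n)^n, which tends to e^(−1) ≈ 0.368 for large n. The short simulation below is only a sketch to illustrate this; the function name `oob_fraction` and the sample size are arbitrary choices.

```python
import random

def oob_fraction(n_samples, seed=0):
    """Fraction of samples left out of one bootstrap draw of size n_samples."""
    rng = random.Random(seed)
    # Sampling with replacement: collect the distinct indices that were drawn.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples

n = 10_000
simulated = oob_fraction(n)
theoretical = (1 - 1 / n) ** n   # close to exp(-1)
print(simulated, theoretical)
```

Because every tree has its own out-of-bag set, each training sample can be evaluated by the trees that did not see it, which is what makes the OOB error usable without a separate validation set.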
To further improve the classification accuracy, the grid-search method based on the OOB error was used in this study to optimize ntree and mtry.
In the grid-search method, given M candidate values of ntree and N candidate values of mtry (M and N here denote the numbers of candidate values), RF classifiers were trained for all M×N (ntree, mtry) combinations. The learning accuracy of each RF classifier was then estimated from its OOB error, and the combination with the highest learning accuracy was taken as the optimal parameters. The advantage of this method is that the solution obtained is guaranteed to be the global optimum within the grid, which avoids major errors (Jiantao Liu et al, 2016).
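The grid search over (ntree, mtry) based on the OOB error can be sketched as follows. This is an illustration only: it uses scikit-learn's `RandomForestClassifier` rather than the IDL implementation of this study, with `n_estimators` standing in for ntree and `max_features` for mtry, and it runs on synthetic data generated by `make_classification`. The helper name `grid_search_oob` is an assumption of the example.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def grid_search_oob(X, y, ntree_grid, mtry_grid, seed=0):
    """Train one forest per (ntree, mtry) pair and return the pair with the lowest OOB error."""
    oob_errors = {}
    for ntree, mtry in product(ntree_grid, mtry_grid):
        rf = RandomForestClassifier(
            n_estimators=ntree,   # ntree
            max_features=mtry,    # mtry
            bootstrap=True,
            oob_score=True,       # evaluate each sample on trees that did not see it
            random_state=seed,
        )
        rf.fit(X, y)
        oob_errors[(ntree, mtry)] = 1.0 - rf.oob_score_
    best = min(oob_errors, key=oob_errors.get)
    return best, oob_errors

# Synthetic stand-in for a 7-variable feature stack with 3 classes.
X, y = make_classification(n_samples=400, n_features=7, n_informative=5,
                           n_redundant=1, n_classes=3, random_state=0)
best_params, oob_errors = grid_search_oob(X, y, (25, 50, 75, 100), (2, 3, 4, 5))
print(best_params, round(oob_errors[best_params], 4))
```

The search is exhaustive, so its cost grows with the product of the two grid sizes; the selected pair is optimal only among the grid points that were tried.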
Both the RF classification algorithm and the grid-search algorithm were implemented in the IDL 8.3 language platform.

Multiresolution Segmentation
As the key part of the object-based classification method, image segmentation divides images into several homogeneous object units. Among the many image segmentation methods, multiresolution segmentation is the most widely used. It is a region-merging segmentation method based on minimum heterogeneity. To conduct multiresolution segmentation, it is necessary to set the segmentation parameters in advance, including the weight of each band, the spectral factor, the shape factor and the segmentation scale. Since the selection of the scale parameter directly determines the quality and accuracy of object-oriented image analysis, it is the most important parameter. The multiresolution segmentation was performed on the eCognition 9.0 platform. To further optimize the segmentation results, the estimation of scale parameter (ESP) tool was used to evaluate the scale parameters. ESP produced curves of the local variance (LV) of object heterogeneity and its rate of change (ROC) at different scales, and the potential optimal scale parameters were identified from the ROC-LV curve.
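The idea behind the ROC-LV curve can be sketched in a few lines, assuming the commonly used definition ROC = 100 × (LV_l − LV_(l−1)) / LV_(l−1) and treating local peaks of ROC as candidate scales. The local variance values below are invented for illustration and are not the values obtained in this study; the function names are likewise hypothetical.

```python
def rate_of_change(lv):
    """Percentage change of local variance between consecutive scale levels."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(lv, lv[1:])]

def candidate_scales(scales, lv):
    """Return the scales at which ROC-LV has a local peak."""
    roc = rate_of_change(lv)          # roc[i] belongs to scales[i + 1]
    peaks = []
    for i in range(1, len(roc) - 1):  # end points cannot be local peaks
        if roc[i] > roc[i - 1] and roc[i] > roc[i + 1]:
            peaks.append(scales[i + 1])
    return peaks

# Hypothetical scale levels and local variance values.
scales = [10, 20, 30, 40, 50, 60, 70]
lv = [4.0, 5.0, 5.5, 6.6, 6.9, 7.0, 7.7]
print(candidate_scales(scales, lv))
```

A peak means that local variance grows faster at that scale than at its neighbours, which is read as the segments matching the size of real-world objects at that level.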

Training and Test Samples
The training and test sample pixels were selected based on the UAV aerial imagery, the ground measurement data and high-resolution Google Earth imagery. Of these data, 48,279 pixels were training samples (20%), and 209,691 pixels were verification samples (80%).

Results of Pixel-based Classification
The grid-search method was used to optimize the parameters of the RF algorithm for each of the 4 models. The mtry optimization range is (2, 3, 4, 5) for Model 1, (3, 5, 7, 9) for Model 2, (5, 10, 15) for Model 3 and (6, 12, 18) for Model 4. The optimization range of the ntree parameter for all 4 models is (25, 50, 75, 100). The optimal (mtry, ntree) of the 4 models are (4, 100), (7, 100), (10, 100) and (12, 100), respectively. These parameters were used to run the RF classification algorithm and obtain the classification results. The overall classification accuracies of Models 1~4 are 82.79%, 84.91%, 86.75% and 88.16%, respectively. Adding the topographic features as variables gave the largest improvement in classification accuracy, 2.12 percentage points. Adding the texture and spatial features as variables increased the classification accuracy further.
Among the 4 models, Model 4 has the highest classification accuracy of 88.16%. According to the confusion matrix in Table 3, the classification accuracies of shrub and villages are below 80%, at 76.75% and 68.61%, respectively. Although these accuracies are relatively low, they are 19.86 and 14.3 percentage points higher than the 56.89% and 54.31% of Model 1. This indicates that the classification accuracy can be significantly improved by using multiple characteristic variables. Figure 3 shows the commission and omission errors of the various types of surface features. After topographic information was added as a variable, the commission and omission errors of each ground type were reduced to varying degrees, especially for shrub, villages and roads. It can be seen that topographic data is relatively effective for extracting construction land information. With texture and spatial information included, the commission and omission errors of the various types of surface features generally showed a downward trend. Although the commission error of some surface features, e.g., road, increased, the omission error decreased. Therefore, the classification accuracy improved overall.
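For reference, the overall accuracy and the commission and omission errors discussed above can all be derived from a confusion matrix. The sketch below assumes that rows hold the classified (map) classes and columns hold the reference classes; this orientation, the function name `accuracy_report` and the 2-class matrix are assumptions of the example rather than details taken from Table 3.

```python
import numpy as np

def accuracy_report(cm):
    """cm[i, j] = number of pixels classified as class i whose reference class is j."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    overall = correct.sum() / cm.sum()
    # Commission: pixels assigned to a class that do not belong to it (row-wise).
    commission = 1.0 - correct / cm.sum(axis=1)
    # Omission: reference pixels of a class that were missed (column-wise).
    omission = 1.0 - correct / cm.sum(axis=0)
    return overall, commission, omission

# Hypothetical 2-class confusion matrix.
cm = [[40, 10],
      [5, 45]]
overall, commission, omission = accuracy_report(cm)
print(overall, commission, omission)
```

Commission error is the complement of user's accuracy and omission error is the complement of producer's accuracy, so a class can improve on one measure while getting worse on the other.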

Comparison of Different Methods
Using the same 33 feature variables as Model 4, the images were classified using the SVM (denoted as SVM_Model 4) and ANN (denoted as ANN_Model 4) methods, and the classification results were compared with the RF classification results of Model 4 (RF_Model 4). The algorithms were compared in terms of execution time, classification accuracy and Kappa coefficient to analyze the applicability of each classification method to the classification and extraction of land use information in reclamation areas. The comparison results are shown in Table 4. They show that the overall accuracy of SVM_Model 4 and ANN_Model 4 is 81.14% and 83.92%, respectively. This is 3.27% and 4.24% lower than that of RF_Model 4. With regard to execution time, the classification of RF_Model 4 took 26 minutes, which is much shorter than the 57 minutes and 63 minutes of SVM_Model 4 and ANN_Model 4, respectively, thus greatly improving the classification efficiency. Table 5 shows the inter-class accuracy assessment. The mean accuracy of RF_Model 4 is 85.94% with a standard deviation of 0.113; the mean accuracy of SVM_Model 4 is 81.71% with a standard deviation of 0.137; and the mean accuracy of ANN_Model 4 is 74.9% with a standard deviation of 0.186. From the perspective of descriptive statistics, RF_Model 4 has the highest overall accuracy, the highest mean accuracy and the smallest dispersion, i.e., the most even accuracy across classes. This means that this method can achieve a better effect in the extraction of several surface features. In summary, the RF method has obvious advantages in the classification and information extraction of industrial and mining land. In contrast, the ANN method is poorly applicable to the extraction of such information.

Table 5. Accuracy assessment of object types for the different methods (%)
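The Kappa coefficient and the per-class descriptive statistics used in this comparison can be computed as in the following sketch. The confusion matrix and the per-class accuracies are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which form was used.

```python
import numpy as np

def kappa_coefficient(cm):
    """Cohen's Kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n                                   # overall agreement
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (observed - expected) / (1.0 - expected)

def class_accuracy_summary(class_accuracies):
    """Mean and sample standard deviation of per-class accuracies."""
    acc = np.asarray(class_accuracies, dtype=float)
    return acc.mean(), acc.std(ddof=1)

# Hypothetical inputs.
cm = [[40, 10],
      [5, 45]]
kappa = kappa_coefficient(cm)
mean_acc, std_acc = class_accuracy_summary([0.9, 0.8, 0.7])
print(kappa, mean_acc, std_acc)
```

A lower standard deviation of the per-class accuracies indicates that a classifier performs more evenly across land use types, which is the sense in which dispersion is compared above.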

Segmentation and Object-based Classification Results
Among the 4 per-pixel RF classification models, Model 4 has the highest classification accuracy, 88.16%. Therefore, segmentation was conducted on the 33-band feature variable image of Model 4. The shape factor was set to 0.2, and the compactness factor was set to 0.5. Six scale parameters were selected through ESP, i.e., 88, 71, 64, 49, 28 and 10, to perform multiresolution segmentation, and local segmentation results are shown in Figure 4. It can be seen that the segmentation results of scales 88, 71 and 64 are not obviously different from each other. The segmentation results become finer from scale 49, whereas the segmentation results of scales 28 and 10 are too fine and contain excessive objects. The numbers of segmentation objects corresponding to the 6 scales are 704, 1077, 1336, 2352, 7363 and 49071, respectively. The number of objects segmented at scales 28 and 10 is much larger than that of the first 4 scales, indicating a certain degree of over-segmentation.
RF classification was performed based on the segmentation results of the six scales. The overall accuracy was 87.71%, 86.89%, 85.12%, 89.58%, 84.55% and 80.54%, respectively. The local effect of the classification is shown in Figure 5. When the segmentation scale is 49, the classification accuracy reaches its highest value of 89.58%; the corresponding confusion matrix is shown in Table 6. The accuracy of object-based classification is 1.42 percentage points higher than that of pixel-based classification (88.16%). The classification results of the two are compared locally in Figure 6. As can be seen from Figure 6, the results obtained by object-based classification are relatively compact, the degree of fragmentation is significantly reduced, the categories and shapes are highly consistent, the boundaries among different geographical categories are relatively clear, the distinction among different categories is comparatively obvious, and the noise is effectively reduced. Therefore, object-based classification can effectively address the "salt-and-pepper phenomenon" encountered by the traditional pixel-based classification method.
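One simple way to move from pixels to objects, sketched below, is to average each feature band over the pixels of every segment, classify the resulting object-level feature vectors, and write the predicted label back to all pixels of the object. This is a generic illustration, not the eCognition workflow used in this study; the function names are hypothetical, and segment ids are assumed to be consecutive integers starting at 0.

```python
import numpy as np

def object_mean_features(features, segments):
    """Average each feature band over the pixels of every segment.

    features: array of shape (rows, cols, n_features)
    segments: integer array of shape (rows, cols) with ids 0..K-1
    """
    ids = segments.ravel()
    flat = features.reshape(-1, features.shape[-1]).astype(float)
    counts = np.bincount(ids)
    sums = np.stack(
        [np.bincount(ids, weights=flat[:, k]) for k in range(flat.shape[1])],
        axis=1,
    )
    return sums / counts[:, None]

def labels_to_raster(object_labels, segments):
    """Assign each pixel the label predicted for its object."""
    return np.asarray(object_labels)[segments]

# Hypothetical 2 x 2 image with 2 feature bands and 2 segments.
features = np.array([[[1, 10], [3, 30]],
                     [[5, 50], [7, 70]]])
segments = np.array([[0, 0],
                     [1, 1]])
means = object_mean_features(features, segments)
raster = labels_to_raster([3, 8], segments)
print(means)
print(raster)
```

Because all pixels of a segment receive the same label, isolated misclassified pixels inside an otherwise homogeneous object disappear, which is why the object-based map shows less salt-and-pepper noise.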

CONCLUSIONS
Land reclamation and ecological reconstruction have become important measures to coordinate the development of mineral resources and the protection of land resources, as well as to promote the construction of an ecological civilization. However, industrial and mining reclamation areas are mostly located in mountainous and hilly regions where the terrain is undulating, the distribution of surface features is fragmented, the project area is usually small and the layout is often scattered. The accuracy of extracting information on surface features from medium-resolution images cannot meet the demands of land reclamation management. Therefore, it is necessary to carry out studies on land use classification with high-resolution images in industrial and mining reclamation areas. In addition, the object-oriented RF classification method can make up for the shortcomings of the pixel-based classification method and help improve the classification accuracy.
In this study, the object-oriented RF classification method was applied to classify land use information in industrial and mining reclamation areas using GF-1 remotely sensed images. The RF method with grid-search parameter optimization achieved a classification accuracy of 88.16% when classifying images with 33 characteristic variables; this accuracy was higher than that of SVM and ANN under the same characteristic variables. In terms of efficiency, the RF classification method also performed better than SVM and ANN, and it was more capable of handling multidimensional feature variables. Therefore, compared with SVM and ANN, the RF classification method was better in terms of performance and applicability in land use classification of industrial and mining reclamation areas. Combining the RF method with an object-based analysis approach improved the classification accuracy further. Multiresolution segmentation with ESP scale parameter optimization was used to obtain six scales, i.e., 88, 71, 64, 49, 28 and 10, for image segmentation; when the segmentation scale was 49, the classification accuracy reached its highest value of 89.58%.
The accuracy of object-based RF classification is 1.42 percentage points higher than that of pixel-based classification (88.16%). Therefore, the object-oriented RF classification method was better in terms of performance and applicability in land use classification of industrial and mining reclamation areas, which provides a theoretical reference and technical support for the monitoring of land reclamation.

Figure 1. Location of study area, training samples and 3D representation of remote sensing image

(2) Calculate and select feature variables based on spectrum, terrain, texture and spatial information of the data; (3) Establish four different feature variable combination models, i.e., Model 1 (SPE), Model 2 (SPE+DEM), Model 3 (SPE+DEM+TXT) and Model 4 (SPE+DEM+TXT+SPA), to respectively perform pixel-based RF classification and assess the classification accuracy of each model; (4) Select the model with the highest classification accuracy for multiresolution segmentation, conduct object-oriented RF classification with different scales, and assess the results for accuracy (Maxwell, A. E., 2015; Chaofan Wu et al, 2016); (5) Compare the object-oriented RF classification method with the SVM and ANN classification methods, and assess the performance of the object-oriented RF classification method. Figure 2 shows the workflow.

Figure 2. Workflow of the study

Figure 3. Commission and omission errors of the classification results of Models 1~4: a. commission errors; b. omission errors

Figure 4. Multiresolution segmentation results using six different segmentation scales: a. scale 88; b. scale 71; c. scale 64; d. scale 49; e. scale 28; f. scale 10

Figure 5. Classification results of six different segmentation scales: a. scale 88 (overall accuracy 87.71%); b. scale 71 (overall accuracy 86.89%); c. scale 64 (overall accuracy 85.12%); d. scale 49 (overall accuracy 89.58%); e. scale 28 (overall accuracy 84.55%); f. scale 10 (overall accuracy 80.54%)

Figure 6. Comparison of local classification results between the object-based and pixel-based methods: a. object-based classification; b. pixel-based classification

The topographic feature variables included DEM, and slope, aspect and curvature calculated based on DEM. For the texture feature variables, a 3×3 moving window was selected after comparative analysis over multiple experiments. The gray-level co-occurrence matrix (GLCM) was applied to compute 8 kinds of texture features (Wenjing Wang et al, 2017) for each of the 4 bands of the images, i.e., mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation (Shuzhi Wang et al, 2011), which resulted in a total of 32 variables. Because of the high correlation among these variables, they were reduced by principal component analysis (PCA) transformation. With a standard deviation of 0.3 as the threshold, a forward sorting of the variables was conducted, and the first 12 variables (PC1~PC12) were selected for the image classification. Local Moran's I and Local Getis-Ord Gi are the spatial feature variables, calculated from all the spectral, topographic and texture information. A forward sorting of these variables was carried out with a standard deviation of 0.6 as the threshold, and 10 variables were selected for the classification. All the characteristic variables are shown in Table 1. Table 2 shows the number and distribution of the different types of ground samples.

Table 1. Statistics of feature variables
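The PCA-based reduction of the correlated texture variables can be sketched as follows. The reading adopted here, keeping the principal components whose standard deviation is at least the threshold, is one plausible interpretation of the selection rule described above and is an assumption, as are the function name `pca_by_std` and the synthetic data; whether the variables were standardized beforehand is not stated, so the sketch only centers them.

```python
import numpy as np

def pca_by_std(X, std_threshold):
    """Project X onto the principal components whose standard deviation
    is at least std_threshold. X has shape (n_samples, n_variables)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    comp_std = s / np.sqrt(X.shape[0] - 1)
    keep = comp_std >= std_threshold
    scores = centered @ vt[keep].T
    return scores, comp_std

# Synthetic data: 500 samples, 3 variables with spreads of about 2.0, 0.5 and 0.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([2.0, 0.5, 0.1])
scores, comp_std = pca_by_std(X, std_threshold=0.3)
print(scores.shape, comp_std)
```

With the threshold at 0.3, only the two components with the larger spread survive in this toy example; the same rule applied to the 32 texture variables would return however many components exceed the chosen threshold.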

Table 4. Overall accuracy assessment of the different methods

Table 6. Confusion matrix and accuracy assessment of object-based RF classification