AUTOMATIC IDENTIFICATION METHOD OF CONSTRUCTION AND DEMOLITION WASTE BASED ON DEEP LEARNING AND GAOFEN-2 DATA

Due to the relatively complex spectral and textural characteristics of construction and demolition waste (C&DW), it is difficult to identify C&DW simply by constructing a remote sensing index. Therefore, this study proposes an automatic identification method for C&DW based on deep learning and Gaofen-2 (GF-2) data. Pingdingshan City and Jining City in China were selected as the study areas. The dataset used for deep learning training and testing was built from GF-2 imagery of the study areas. On the basis of this dataset, the deep learning model DeepLabv3+ is used to identify C&DW. The overall accuracy of the model in identifying C&DW is 82.02%, and the mIoU is 82.39%. The model's identification of C&DW areas is further confirmed by ground verification. The results of this study support the survey and management of C&DW and benefit research on the spatial and temporal distribution of urban C&DW, its resource utilization and the reduction of environmental pollution risks.


INTRODUCTION
Construction and demolition waste (C&DW) refers to the solid waste generated, through human activity or natural causes, during the construction and demolition of buildings, including waste concrete, residue soil, waste masonry, residual mud, abandoned materials and other wastes (Zhao et al., 2019). With economic development and growing demand for urban modernization, construction and related industries have developed rapidly, and the surge in C&DW has become increasingly serious. Large amounts of C&DW cause many urban environmental problems (Zhang et al., 2020), such as wasted land, water pollution and air pollution, which have attracted widespread public attention. Because C&DW is stacked at unpredictable locations, is produced in huge quantities and has a complex composition, effectively identifying C&DW and mapping its spatial distribution has become a pressing issue.
Compared with traditional manual field surveys for locating and detecting C&DW sites, remote sensing provides efficient, rapid and high-precision imagery that is better suited to monitoring ground objects. High-resolution satellite remote sensing data are now applied in surveying and mapping, urban planning, land management, environmental monitoring, agriculture and other fields. Because C&DW is distinctive in color, shape and texture, yet closely resembles other background ground objects in satellite images, identifying C&DW from high-resolution remote sensing images has become a widely studied and challenging research topic. Research on C&DW identification from remote sensing images falls into two areas: monitoring of C&DW composition (Hang et al., 2018; Jia et al., 2021) and investigation of municipal solid waste (Zhang et al., 2013). Satellite-based studies of C&DW component identification and municipal solid waste identification have established some basis for discriminating C&DW by its spectrum and texture, but no systematic theory of remote sensing monitoring has yet been established for C&DW. Therefore, the remote sensing monitoring of C&DW remains highly uncertain.
Machine learning has been successfully used to recognize objects in images. Existing research shows that the random forest (RF) algorithm can identify trend characteristics of C&DW, and that C&DW can be analyzed and classified by combining machine learning with spectral analysis. However, machine learning algorithms can yield inconsistent recognition results for the same class of objects (Ge et al., 2020). A machine learning algorithm has also been built on the Google Earth Engine platform to identify C&DW. Because traditional machine learning algorithms cannot capture the deep features of C&DW, they struggle to distinguish it from surrounding ground objects, and pixels are misclassified. Moreover, their recognition results are scattered points rather than contiguous areas. Deep learning is therefore better suited than traditional machine learning to identifying C&DW.
In recent years, the rapid rise of deep learning has given strong impetus to remote sensing image analysis. Deep learning methods based on convolutional neural networks (CNNs) have shown great potential in land cover classification (Xu et al., 2017) and in target recognition and extraction (Zhu et al., 2018). Compared with traditional remote sensing image analysis, deep learning can abstract high-level image features and reduce dimensionality efficiently, and in existing high-resolution remote sensing studies it has shown better generalization and higher prediction accuracy. Most object identification studies based on high-resolution remote sensing images target typical urban objects, such as buildings and vehicles (Peng et al., 2021), and little research has addressed C&DW. One study retrained an Inception-V3 model through transfer learning to identify C&DW automatically. However, because C&DW has complex texture, shape and color, a simple CNN structure cannot reliably identify C&DW against complex backgrounds: it has difficulty distinguishing C&DW from buildings, bare land and other objects, which leads to omissions and misclassifications, so the accuracy of automatic remote sensing identification of C&DW remains low. Accurately extracting C&DW areas from remote sensing images with complex backgrounds requires a deeper and more specialized network architecture.
The DeepLab algorithm, a model focused on semantic segmentation, was proposed by a Google team in 2015. To improve the training performance of the whole network, DeepLab introduces multi-scale features: convolution and pooling operations with different parameters produce feature maps of different sizes, which are then fused within the network. DeepLabv3+ is one of the most popular network models for image semantic segmentation. Compared with simple CNNs and other mainstream deep learning networks such as U-Net (Ronneberger et al., 2015) and PSPNet (Zhao et al., 2017), its distinctive structure gives it leading performance on public datasets such as Cityscapes (Cordts et al., 2016) and PASCAL VOC 2012 (Everingham et al., 2015). The atrous spatial pyramid pooling (ASPP) module of DeepLabv3+ applies atrous (dilated) convolutions with multiple dilation rates to capture richer contextual semantic information and to extract multi-scale features of C&DW. This enlarges the model's receptive field, avoids the information loss caused by pooling and preserves the spatial resolution of the feature maps. DeepLabv3+ has been applied to CubeSat images to map retrogressive thaw slumps (RTSs) on the Tibetan Plateau (Huang et al., 2020), where it recognized RTSs effectively even though their colors and textures are diverse and similar to those of the surrounding environment. Recognizing RTSs in satellite images is similar to recognizing C&DW, which suggests that DeepLabv3+ can identify ground objects that lack typical textures and spectral features. In another study, horizontal and vertical (three-dimensional) urban densification in Denmark was mapped from a Landsat time series spanning 1985 to 2018 (Chen et al., 2020).
The results of that study show that deep networks and the inclusion of multiscale contextual information greatly improve classification and the model's ability to generalize across space and time.
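The multi-rate atrous convolution behind the ASPP module can be sketched in a few lines of NumPy. This is a simplified, single-channel illustration written for this explanation, not the DeepLabv3+ implementation: real ASPP branches use separate learned multi-channel kernels and add a 1×1 convolution and an image-pooling branch. The function names are hypothetical; the default rates (6, 12, 18) are the 3×3 branch rates DeepLabv3 uses at output stride 16. The sketch shows that a k×k kernel with rate r spans k + (k - 1)(r - 1) pixels, so the receptive field grows without pooling, while the output keeps the input's size.

```python
import numpy as np


def effective_kernel_size(k: int, rate: int) -> int:
    """Span (in pixels) of a k x k kernel whose taps are `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(x: np.ndarray, w: np.ndarray, rate: int) -> np.ndarray:
    """Single-channel 2D atrous (dilated) cross-correlation with zero 'same' padding.

    x: (H, W) input; w: (k, k) kernel with odd k; rate: dilation rate.
    The output has the same H x W size as the input (no down-sampling).
    """
    k = w.shape[0]
    pad = (effective_kernel_size(k, rate) - 1) // 2
    xp = np.pad(x, pad)  # zero padding on all sides
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(k):
        for j in range(k):
            # Tap (i, j) reads the padded input shifted by (i * rate, j * rate).
            out += w[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out


def aspp_like(x: np.ndarray, w: np.ndarray, rates=(6, 12, 18)) -> np.ndarray:
    """Apply one kernel at several dilation rates and stack the results.

    Hypothetical simplification of ASPP's parallel 3x3 branches: each branch
    sees a different context size, and all outputs share the input resolution.
    """
    return np.stack([atrous_conv2d(x, w, r) for r in rates])
```

With a 3×3 kernel, rates 6, 12 and 18 give effective spans of 13, 25 and 37 pixels, yet every branch output can be concatenated directly because all have the same height and width.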
Therefore, this study proposes an automatic remote sensing identification method for C&DW based on the deep learning network DeepLabv3+ and GF-2 data. A C&DW sample library is built from GF-2 data, and DeepLabv3+ is used to identify C&DW automatically in the remote sensing images, providing a new approach to real-time urban monitoring and intelligent control of C&DW.

Study Area
Pingdingshan City is located in the central and southern part of Henan Province, China, at 33°08'-34°20' N and 112°14'-113°45' E, with a total area of 7,882 square kilometers. In 2018, China launched the project "Construction Waste Precision Control Technology and Demonstration" under the "13th Five-Year" national key research and development plan, and Pingdingshan was included in this project. Jining City is located in the southwest of Shandong Province, China, at 115°54'-117°06' E and 34°25'-35°55' N, with an area of 11,000 square kilometers. Figure 1 shows remote sensing images of Pingdingshan and Jining in central and eastern China.

Data Source
GF-2 is China's first civilian optical remote sensing satellite with independent intellectual property rights and a sub-meter spatial resolution. It was launched on August 19, 2014, and has an orbit height of 631 km and an imaging swath of 45 km. GF-2 carries two cameras, a panchromatic camera and a multispectral camera, with resolutions of 1 m and 4 m, respectively; the nadir resolution of the panchromatic band reaches 0.8 m. The data comprise one panchromatic band and four multispectral bands. The satellite offers high positioning accuracy, high spatial resolution and fast attitude maneuvering, and its data quality has reached the international advanced level. The images used in this study were acquired between 2018 and 2020, with cloud cover below 5%.

METHOD
This study treats C&DW identification as a two-class semantic segmentation problem. Preprocessing of the acquired remote sensing images focuses on the texture and spectral characteristics of C&DW in GF-2 data, and the C&DW sample set is constructed through image clipping, image labeling and data augmentation. The DeepLabv3+ model with Xception_65 as the backbone network was selected as the pre-trained model for the C&DW semantic segmentation experiment. The workflow is shown in figure 2.

Image Preprocessing
The images were preprocessed with geometric correction, radiometric correction and image fusion. Image fusion was performed with the NNDiffuse Pan Sharpening tool, which combines the high-spatial-resolution panchromatic image with the lower-resolution multispectral image to generate high-resolution color images. The fused images have the same spatial resolution as the panchromatic band. Each fused image is 28864 × 27511 pixels at a resolution of 1 m.
To facilitate model training, the fused images were cut into non-overlapping tiles of 451 × 451 pixels. Where a tile extends beyond the image, the non-data region is filled with pixels of value 0. From all tiles, 1510 sample images containing C&DW were selected by visual interpretation, and 70% of these samples were randomly selected as the training set and 30% as the validation set.
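The tiling and random split described above can be sketched with NumPy as follows. The function names and the fixed random seed are assumptions made for illustration; the study itself does not say which tool performed the clipping. Note that for the 28864 × 27511 fused scenes, 28864 = 64 × 451 and 27511 = 61 × 451, so each scene divides into exactly 64 × 61 = 3,904 tiles and the zero-filling rule only matters for images whose sides are not multiples of 451.

```python
import numpy as np


def tile_image(img: np.ndarray, tile: int = 451):
    """Cut an (H, W) or (H, W, C) image into non-overlapping tile x tile patches.

    Tiles that run past the image edge are zero-filled (non-data region = 0).
    Returns a list of ((row, col), patch) pairs in row-major order.
    """
    H, W = img.shape[:2]
    n_rows = -(-H // tile)  # ceiling division
    n_cols = -(-W // tile)
    padded = np.zeros((n_rows * tile, n_cols * tile) + img.shape[2:], dtype=img.dtype)
    padded[:H, :W] = img
    tiles = []
    for r in range(n_rows):
        for c in range(n_cols):
            patch = padded[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            tiles.append(((r, c), patch))
    return tiles


def split_train_val(items, train_frac=0.7, seed=0):
    """Randomly assign items to training and validation sets (70/30 by default)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_train = int(round(train_frac * len(items)))
    train = [items[i] for i in order[:n_train]]
    val = [items[i] for i in order[n_train:]]
    return train, val
```

For the 1510 selected samples, a 70/30 split of this kind yields 1057 training and 453 validation images.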

Labelme Labeling
Only the three visible bands (RGB) were used to build the dataset.
Most C&DW dumps in the study area are formal waste dumps covered with green nets, so the ground objects in the study area are divided into two categories: C&DW and background. The background class includes buildings, roads, woodland, water, bare land, etc. The clipped image tiles were converted to JPG format, and the C&DW in them was labeled with LabelMe. C&DW pixels have a value of 128 and background pixels a value of 0, and each label sample is saved as a three-channel (RGB) 8-bit PNG image. Original images and labeling samples are shown in figure 3. For the semantic segmentation experiments, the labels must be converted into single-channel images, and the label images must be 8-bit color (palette) or 8-bit grayscale images. If OpenCV is used for data augmentation, the 8-bit color label images are read as 24-bit color images, which makes the dataset unusable for training. PIL was therefore used for data augmentation, including rotation, flipping and changes in color, contrast and brightness. Finally, 7600 training samples and 2001 validation samples were obtained.
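The label conversion and paired augmentation described above can be sketched as follows. This sketch uses NumPy instead of PIL for brevity and only rotates by multiples of 90°; the function names and the brightness and contrast factor ranges are hypothetical. The key points it illustrates are that the RGB label collapses to a single-channel 8-bit class map, that geometric transforms are applied identically to image and label so they stay aligned, and that the label is never interpolated or recolored, so it keeps its {0, 1} values and 8-bit depth.

```python
import numpy as np


def rgb_label_to_index(label_rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB label into an (H, W) uint8 class-index map.

    Background pixels are 0 in every channel, so any non-zero pixel is taken
    to be C&DW (class 1). The result is a single-channel 8-bit image.
    """
    return np.any(label_rgb != 0, axis=-1).astype(np.uint8)


def augment_pair(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Randomly augment an (image, mask) pair.

    Geometric changes (rotation, flip) are applied to both arrays so the labels
    stay aligned; photometric changes (brightness, contrast) touch the image only.
    """
    k = int(rng.integers(0, 4))                      # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                           # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]

    img = image.astype(np.float32)
    img *= rng.uniform(0.8, 1.2)                     # brightness factor (assumed range)
    mean = img.mean()
    img = (img - mean) * rng.uniform(0.8, 1.2) + mean  # contrast factor (assumed range)
    img = np.clip(np.rint(img), 0, 255).astype(np.uint8)
    return img, np.ascontiguousarray(mask)
```

Generating several augmented copies per training tile in this way is one plausible route from the original sample count to the larger augmented set reported above.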

Dataset Production
The C&DW dataset used in this study follows the PASCAL VOC 2012 format. The original images, their corresponding label files and the grayscale label images are placed in the JPEGImages, SegmentationClass and SegmentationClassRaw folders, respectively. Because the samples are clipped from remote sensing images, adjacent samples share similar features. To avoid overfitting during training, which shows up as high training accuracy but low validation accuracy, the samples are shuffled before training. The training and validation sets of the experimental dataset are randomly assigned in a 3:1 ratio. Two text files in the ImageSets folder, train.txt and val.txt, store the names of the training and validation samples, respectively.
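Shuffling the sample names and writing the two split files can be sketched with the Python standard library. The function names, the seed and the ImageSets/Segmentation subfolder are assumptions; that subfolder is where the PASCAL VOC 2012 layout keeps its segmentation split lists, and each line holds a sample name without a file extension.

```python
import random
from pathlib import Path


def split_names(names, train_ratio=0.75, seed=42):
    """Shuffle sample names reproducibly and split them train:val = 3:1."""
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


def write_split_files(root, train, val):
    """Write train.txt and val.txt (one sample name per line, no extension)."""
    out_dir = Path(root) / "ImageSets" / "Segmentation"
    out_dir.mkdir(parents=True, exist_ok=True)
    for fname, items in (("train.txt", train), ("val.txt", val)):
        (out_dir / fname).write_text("\n".join(items) + "\n")
    return out_dir
```

For example, `train, val = split_names(stems)` followed by `write_split_files("CDW_VOC", train, val)` would create the split lists under a hypothetical dataset root named CDW_VOC.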

DeepLabv3+ Model
DeepLabv3+ (Liang-Chieh Chen et al.) adopts an encoder-decoder structure. The encoder extracts the context information of the input image: atrous (dilated) convolution extracts features at several scales and can produce feature maps at an arbitrary resolution, and the atrous spatial pyramid pooling (ASPP) module then probes these features with a pyramid of dilation rates to analyze the context of the whole image. The decoder strengthens object boundary information. The main DeepLabv3+ network structure is shown in figure 4. The encoder output is up-sampled and fed into the decoder. In the decoder, the low-level features extracted by the backbone network are first compressed by a 1×1 convolution, concatenated with the up-sampled encoder output and refined by 3×3 convolutions. Finally, the prediction map is produced by up-sampling.
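The decoder data flow described above can be traced with a small shape calculator. This is a bookkeeping sketch, not a network: the function name is hypothetical, it assumes "SAME"-padded strided layers (so a feature map at stride s has ceil(size / s) pixels per side), an encoder output stride of 16 and a low-level feature stride of 4, and it uses the channel widths from the DeepLabv3+ paper (256 ASPP channels, low-level features reduced to 48 channels, 256-channel 3×3 convolutions). The backbone's own channel counts are not modeled.

```python
import math


def deeplabv3plus_shapes(size, num_classes=2, low_level_stride=4, output_stride=16,
                         aspp_channels=256, reduced_low_channels=48, decoder_channels=256):
    """Return (stage, side_length, channels) for each DeepLabv3+ decoder stage."""
    low = math.ceil(size / low_level_stride)  # side of the low-level feature map
    enc = math.ceil(size / output_stride)     # side of the encoder (ASPP) output
    return [
        ("encoder output (ASPP)", enc, aspp_channels),
        ("up-sampled encoder output", low, aspp_channels),  # resized to match low-level map
        ("1x1-reduced low-level features", low, reduced_low_channels),
        ("concatenation", low, aspp_channels + reduced_low_channels),
        ("after 3x3 convolutions", low, decoder_channels),
        ("logits", low, num_classes),
        ("final up-sampled prediction", size, num_classes),
    ]


for stage, side, channels in deeplabv3plus_shapes(451):
    print(f"{stage:32s} {side:4d} x {side:<4d} x {channels}")
```

For a 451 × 451 tile, the encoder output is 29 × 29 and the low-level map is 113 × 113, so the encoder output is resized to the low-level size (rather than up-sampled by exactly 4×) before concatenation.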

Model Training
To ensure the reliability and accuracy of the experimental results, the DeepLabv3+ model was trained repeatedly, and the network was tuned by observing changes in the loss curve during training. When training the DeepLabv3+ network, train_batch_size, learning_rate and learning_rate_decay_factor were set to 8, 0.001 and 0.1, respectively. After 100000 training iterations, the learning rate had decreased to 0, and the loss gradually stabilized at about 0.1.
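A learning rate that reaches exactly 0 at the last iteration is consistent with a polynomial ("poly") decay schedule. As an assumption, this sketch takes the schedule to be the poly policy of the TensorFlow DeepLab reference code, whose flag names match those quoted above and whose learning_rate_decay_factor only applies to its alternative "step" policy; the power of 0.9 is that code's default and is not stated in the study.

```python
def poly_learning_rate(step: int, base_lr: float = 0.001,
                       max_steps: int = 100_000, power: float = 0.9) -> float:
    """Poly schedule: lr = base_lr * (1 - step / max_steps) ** power.

    Starts at base_lr and decays smoothly to exactly 0 at step == max_steps.
    """
    step = min(step, max_steps)  # stay at 0 after the final step
    return base_lr * (1 - step / max_steps) ** power


for s in (0, 25_000, 50_000, 75_000, 100_000):
    print(s, poly_learning_rate(s))
```

Under these assumptions the rate falls from 0.001 to about 0.00054 at the halfway point and to 0 at iteration 100,000.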

Model Evaluation
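The accuracy of the model is evaluated with three metrics: mean intersection over union (mIoU), overall accuracy (OA) and recall. The intersection over union (IoU) of a class is the ratio of the intersection to the union of the ground-truth pixel set and the predicted pixel set, and mIoU is its mean over the k classes (here k = 2: C&DW and background). Recall is the fraction of true C&DW pixels that the model predicts as C&DW. In terms of the counts defined in Table 4:
$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$, $\mathrm{mIoU} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{IoU}_i$, $\mathrm{Recall} = \frac{TP}{TP + FN}$
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
Table 4. Relationship among TP, FN, FP and TN
                  Predicted value 1   Predicted value 0
True value 1      TP                  FN
True value 0      FP                  TN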
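The counts in Table 4 and the IoU, mIoU and recall definitions can be computed from a predicted mask and a ground-truth mask as sketched below (function names are hypothetical). OA is left out because its exact formula is not given here. For the background class the roles swap, so its IoU is TN / (TN + FN + FP). The sketch assumes both classes are present, otherwise a denominator can be zero.

```python
import numpy as np


def confusion_counts(pred, truth):
    """TP, FN, FP, TN for binary masks (1 = C&DW, 0 = background), as in Table 4."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))
    fn = int(np.sum(~pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fn, fp, tn


def segmentation_metrics(pred, truth):
    """Per-class IoU, mIoU over the two classes, and recall of the C&DW class."""
    tp, fn, fp, tn = confusion_counts(pred, truth)
    iou_cdw = tp / (tp + fp + fn)
    iou_bg = tn / (tn + fn + fp)  # background as the positive class: TP->TN, FP<->FN
    return {
        "IoU_CDW": iou_cdw,
        "IoU_background": iou_bg,
        "mIoU": (iou_cdw + iou_bg) / 2,
        "Recall": tp / (tp + fn),
    }
```

When evaluating a whole validation set, the counts would normally be accumulated over all tiles before the ratios are taken, rather than averaging per-tile scores.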

Results
The semantic segmentation results for C&DW are shown in figure 3. In the segmentation results, red regions are C&DW and black regions are background. To better assess the extraction of C&DW, the segmentation results are superimposed on the remote sensing images.
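Superimposing a red segmentation mask on an RGB image can be sketched as a simple alpha blend. The function name, the pure red color and the 0.5 opacity are assumptions chosen for illustration; background pixels keep their original values so the surrounding ground objects remain visible.

```python
import numpy as np


def overlay_mask(image: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend `color` into the pixels of an (H, W, 3) uint8 image where mask == 1.

    mask: (H, W) array with 1 for C&DW and 0 for background.
    Returns a new uint8 image; background pixels are unchanged.
    """
    out = image.astype(np.float32)
    sel = mask.astype(bool)
    out[sel] = (1 - alpha) * out[sel] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

A partially transparent overlay like this lets both the predicted C&DW extent and the underlying green dustproof nets be inspected at the same time.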
Most C&DW sites in the study area are covered with green dustproof nets, so the study treats the task as a supervised binary classification problem: C&DW covered with green dustproof nets versus background. Figure 3(d) overlays the predicted C&DW locations on the original image and shows that, even against the complex ground objects in the original images, the model remains highly sensitive to C&DW covered with green dustproof nets. The predictions are not confused with other ground objects, and the model still accurately delineates C&DW when the green dustproof net is damaged. Compared with Figure 3(b) and Figure 3(c), the network identifies C&DW boundaries more accurately.
Based on the validation results, the model obtained after 100,000 training iterations was selected. The validation dataset was then used for random sampling verification of C&DW across the whole study area, yielding an mIoU of 82.39%, an overall accuracy of 82.02% and a recall of 84.49%. These results show that the method based on GF-2 data and the DeepLabv3+ network can identify C&DW accurately and efficiently.

Ground Verification
To verify the credibility of the experimental results and better reflect how well the method identifies C&DW, we conducted field visits to C&DW dumps in the study area to verify the identification results on the ground. Figure 6 takes Jining City as an example; the survey was conducted in October 2021. The remote sensing images on the left record the approximate locations of C&DW dumps in Jining City (including formal disposal sites and informal C&DW areas), and the photographs on the right show the field conditions of randomly selected C&DW dumps. To prevent pollution, C&DW landfills in Jining are covered by green nets. Most of the C&DW piled at formal disposal sites is processed on schedule for disposal and reuse. However, C&DW in informal areas has been exposed to natural weathering for a long time, so its surface green nets are seriously damaged, which threatens environmental safety.
Identification Details Comparison
Comparing the visual interpretation results with the experimental results shows that combining GF-2 data with deep learning identifies C&DW areas well, as shown in figure 7. At formal C&DW dumps (Figure 7(I)), most of the C&DW is covered by green nets, and although the nets are damaged in places, the model still identifies the C&DW accurately. In contrast, Figure 7(II) shows that in informal C&DW areas the nets are seriously damaged and the C&DW is often confused with other ground objects, resulting in omissions in the model's identification results.

CONCLUSION
With the growth of the high-consumption, high-emission construction industry, the amount of C&DW has been increasing year by year, and the environmental problems caused by its accumulation have become among the most urgent worldwide. Traditional target detection algorithms are better suited to ground objects with obvious features and simple backgrounds. Because the spectrum and texture of C&DW are relatively complex, it is difficult to identify C&DW simply by constructing a remote sensing index. Therefore, this study proposes a method for the automatic identification of C&DW based on deep learning and high-resolution remote sensing images. Because no open dataset was available for deep-learning-based remote sensing identification of C&DW, we first generated a C&DW remote sensing dataset from GF-2 imagery. On the basis of this dataset, the DeepLabv3+ semantic segmentation model with the Xception_65 feature extraction network was used to segment C&DW, reaching an mIoU of 82.39%, the best segmentation performance achievable within the limits of the available hardware. Finally, ground verification demonstrated the feasibility of the method for identifying C&DW, which benefits the monitoring and control of urban C&DW.
Although the experimental results preliminarily meet the regulatory requirements for semantic segmentation of C&DW, some deficiencies remain: (1) The complex features of C&DW make its identification considerably more difficult. This study applied GF-2 data and the DeepLabv3+ model to binary semantic segmentation of C&DW and obtained good identification accuracy and ground verification results, but no comparative experiments with other models were conducted. Future work can use other models to explore C&DW identification performance.
(2) The automatic identification method is only a preliminary step toward detecting dynamic changes in C&DW: it can identify and locate C&DW, but it cannot accurately estimate quantitative characteristics such as the volume of a C&DW yard.