TEMPORAL AND SPATIAL ANALYSIS OF CLASSIFICATION TREE FOR IMPERVIOUS SURFACE MAPPING FROM SENTINEL-2 MSI DATA

For studies of urban development, it is an important method for obtaining the distribution of impervious surface (IS) areas and their dynamic change from remote sensing data. The dilemma of the same spectrum for different features and different spectrums for the same features, posed by the complexity of the IS objects, is the fundamental obstacle encountered in the extraction of urban IS areas. In this study, an automatic extraction method for urban IS areas is proposed and analyzed, based on classification and regression tree (CART) and ensemble learning strategies. The Sentinel-2 MSI data of 30 cities in China from 2018 to 2020 were selected for IS extraction experiments. We perform temporal and spatial modeling of the splitting threshold offset in the classification model to explore the effect of time and space on IS extraction. The obtained offset models show that the temporal variation is not significant, while the spatial offsets have more obvious linear relationships.


INTRODUCTION
Impervious surface (IS) is a sort of artificial covering that does not allow water to penetrate fast and is a common feature of metropolitan settings. Building tops, walkways, highways, squares, airports, parking lots, and other sorts of artificial building spaces are examples of the low permeability covered ground features. Urban IS areas are the result of human activity, and variations in the areas directly reflect the changing state of cities. IS makes it difficult for rain to penetrate the soil, reducing local infiltration and soil moisture, while increasing IS areas increases surface runoff. Furthermore, there is a strong positive relationship between IS coverage and surface temperature, and increasing IS coverage increases the urban heat island effect. For studies of urban development, it is a vital method for obtaining the distribution of IS areas and their dynamic change using remote sensing data. The dilemma of the same spectrum for different features and different spectrums for the same features, posed by the complexity of the IS features, is the fundamental obstacle encountered in the extraction of urban IS areas. Existing research has provided many strategies for the extraction of IS areas from various angles.
Different definitions of indexes have been proposed by researchers in the study of index-based approaches. Normalized difference built-up index (NDBI), normalized difference bare-soil index (NDBaI), and albedo are taken for improved linear spectral mixture analysis method was explored to estimate urban IS areas (Fan et al., 2015). And the combinational buildup index (CBI), principal component analysis, normalized difference water index (NDWI), and soil-adjusted vegetation index (SAVI), are proposed to extract IS (Sun et al., 2016) (Sun et al., 2017a). Another index, the normalized IS index (NISI), integrates DMSP-OLS and Moderate Resolution Imaging Spectroradiometer (MODIS) normalized difference vegetation index (NDVI) data, was provided (Guo et al., 2017). A modified * Corresponding author normalized difference IS index (MNDISI) is proposed, and a Gaussian-based automatic threshold selection method is used to identify the optimal MNDISI threshold for delineating IS from background features (Sun et al., 2017b). For the Advanced Spaceborne Thermal Emission and Reflection Radiometer (AS-TER) spectral data, a perpendicular IS index (PISI) was proposed (Tian et al., 2018) for IS extraction. Considering the topography of the target area, some scholars have proposed indexes to extract IS areas and shaded areas separately (Luo and Chen, 2021), even considering LiDAR data (Luo et al., 2018).
There are different ways to construct IS extraction models with the proposed indexes. The MODIS NDVI and DMSP/OLS nighttime light imageries are taken to establish regression models that estimated the fraction of IS area in urban green space (Kuang, 2019). The random forest model is employed for direct and comprehensive IS extraction on subpixel estimation (Deng et al., 2017). The IS extracting methods based on machine learning, fully convolutional neural networks (FCNN), are provided at pixel level with high-resolution satellite imagery (McGlinchy et al., 2019) (Lin et al., 2020). A method called Continuous Subpixel Monitoring (CSM) was developed to map and monitor urban IS area change continuously at the subpixel level (Deng and Zhu, 2020). To alleviate the difficulties in the conventional scheme, the geospatial distribution knowledge of IS areas at large scales was taken . The combined use of the data from Sentinel-1 dual-polarized Synthetic Aperture Radar (SAR) and Sentinel-2 Multispectral Sensor Images (MSI) have been successfully applied in many remote sensing applications. Scholars also propose a method for IS mapping by synergetic fusion of dual-polarized SAR and multispectral information (Sun et al., 2022), (Shrestha et al., 2021), , (Feng and Fan, 2021), .
This paper presents our work on IS extraction modeling and temporal and spatial analysis using sentinel-2 satellite data, based on the ensemble learning model of classification trees. The Sentinel-2 satellites deliver high-resolution multispectral data, which is critical for IS studies. To create training samples, we make a combination of existing land-cover data and Sentinel-2 data. For classification and regression tree (CART) learning, we use repeated sampling and training of the IS extraction tree model for ensemble learning. IS extraction experiments are conducted for 30 cities in China in different periods of sentinel-2 data from 2018 to 2020. Then, the temporal and spatial relationships of classification trees are studied and analyzed.

Data Sources
In the study, we used Sentinel-2 MSI acquisitions and global land cover from Copernicus Global Land Service. Sentinel-2 MSI delivers multi-spectral imageries with a wide field of view that can be used to monitor plant, soil, and water cover, as well as observe interior waterways and coastal areas. Sentinel-2 MSI detects the Earth's reflected brightness in 13 spectral bands, ranging from visible and near-infrared (VNIR) to shortwave infrared (SWIR). We select four bands (Green, Red, NIR and SWIR1) from Sentinel-2 data for the study. The land cover data provides a global dynamic land cover map at 100 m spatial resolution, which includes a continuous field for all basic land cover classes that provide proportional estimates for vegetation/ground cover for the land cover types. Band "urbancoverfraction" of the Land Cover maps (v3.0.1) in 2019 over the entire Globe, which reaches an accuracy of 80% at Level 1, is taken for generating IS samples. Sentinel-2 and land cover data are available on Google Earth Engine (GEE) platform. The national capital, municipalities directly under the central government, and the capital cities of provinces and autonomous regions in mainland China, Hong Kong, Macao, and Taipei were selected as experimental cities for impermeable surface extraction, as shown in Figure. 1. We use Sentinel-2 data from January 2018 to December 2020, and corresponding land cover data in 2019. Multispectral data with less than 20% cloud coverage in the range of two months around January 1 (winter), March 1 (spring), June 1 (summer), and September 1 (autumn)

Experimental Sites
were taken as experimental data. The Sentinel-2 data from Nanning in Guangxi Province, Guiyang in Guizhou Province, Xining in Qinghai Province, and Lhasa in Tibet Autonomous Region were excluded from the experiment because their data did not meet the requirements.

Decision Tree Base Learners
The method of decision tree for IS extraction is based on the nonlinear relationship where, ρ refers to attribute values form the description data, namely the four bands form Sentinel-2 in the study, h refers to the class labels, 1 for IS and 0 for non-IS. T refers to the classification tree model. The class labels of samples are set by where, t refers to the IS rate threshold, it is set to 30% in the experiments. We combine class labels and the four band values at random positions to generate sample data for training IS extraction models.
The decision tree model is taken for the base learners. A decision tree can learn to split the training samples into subsets based on attribute values. A decision tree is constructed topdown by choosing an attribute at each step according to how well the attribute splits the set into homogeneous subsets with the same class label. Here we take the Gini purity as the measure. The Gini impurity is calculated for different spectral data of the sample data as follows, where p0 refers to the fraction part of hs = 1 and p1 refers to the non-IS part with hs = 0. The minimum Gini impurity is taken as the priority classification attribute. The optimal bisection parameter of the classification tree is determined as shown in Figure. 2.
This process is repeated on each derived subset recursively. When subsets all have a same class label, or when there is no gain to the prediction, recursion is done. When the splitting recursion is completed, a decision tree has been built up. An IS extracting result example in 2018 of Taiyuan City in Shanxi Province is shown in Figure. 3.
However, due to the limitations of accuracy as well as resolution, there are imperfect or incorrect labels inevitably exist in the training samples. The quality of the extracting results can be compromised from multiple trees. As shown in Figure 1, the marked area in (a) is significantly different from the other subplots. We employ the ensemble learning model to reduce the impact.

Ensemble Learning Based on Decision Trees
Ensemble learning is the process of combining the outputs of multiple learners to give the final result. To keep the samples highly differentiated for each learner, samples are generated in a repeated random sampling mode. Each sampling brings about a 1000-sample set, including 500 IS samples and 500 non-IS samples in our experiments. Each sample set is used to train a decision tree. These decision trees are used to vote for the results of classifying. The voting results H can be calculated as and hi refers to the output form decision tree i.
Since the definition of classification trees is usually based on the attribute values, it is not convenient to handle the splitting thresholds when performing classification tree synthesis. To simplify the operation, we convert the splitting-based continuous tree structure model into a discrete model based on a regular grid. The discrete attribute values are defined as where, ρ0 refers to an attribute reference value, ∆ρ is the attribute grid interval, and Z refers to the integer set. The discretized tree models are synthesized according to Equation. 4, i.e., the extraction model corresponding to ensemble learning. The flow is described as shown in Figure. 4.  In our experiments, based on the statistics of the four bands of the sample data, we define the discrete attribute ρi in range of five times the standard deviation as ρi ∈ µi + n · 5σi N | n = −N, −N + 1, · · · , N ,

Build tree statistics
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France where, i refers to the band index, µi refers to the mean value of samples from band i, and N refers to dimension radius of the grid (N = 100 in the experiments).
We consider that there is no significant change in IS in urban cities within three years, and the classification tree models for the same period are combined as an integrated model for that time. For example, a part of the tree models for January 2018, 2019, 2020, and the integrated model are shown in Figure. 5.

Temporal Tree Offset Modeling
The classification tree model is built based on attribute splitting values. The study here focuses on the analysis of the variation of attribute splitting thresholds in time and space dimensions. It is assumed that the model of the classification tree is largely consistent, which can be ensured by the quality of the experimental results. The analysis of the temporal dimension focuses on the changes in the offsets of the splitting points between periods. For analysis purposes, we define a measure of the IS distance of attributes, S, as follows, where, d(ρ) = ||ρ − ρt|| 2 refers to the distance from ρ to ρt, and ρt refers to the nearest attribute splitting point. S represents the signed distance from the target to the splitting point, taking a positive sign in the IS areas and a negative sign in the non-IS areas.
Based on the IS distances of the attributes of different decision tree models, we define the IS distance difference δS between models as follows, where, S0 refers to the distance to the reference model, S refers to the gradient. δS describes the distance difference between the attributes of the target model and the reference model. This distance difference will be more accurate if the two models are more similar. The distance difference is directly related to the offset of the splitting point, namely, As shown in Figure. 6, the offsets a and b are related to the area of the difference part.

A B a b
Figure 6. Tree model offset and attribute IS distances. The red and green lines refer to attribute IS distance for the two different tree models, and the blue line refers to the zero distance reference. Offset a and b between the models is related to the area A and B.
We obtain by integrating the model over the defined range of attribute, f , namely For each city, we obtained the IS distances for the attributes during the four different periods of the year. If the attribute IS distances changes over time, there is a corresponding offset in their corresponding classification tree splitting points. We consider the relationship between the corresponding offsets of different bands and periods and establish the following function where, τ refers to the temporal variable defined in 1, 2, 3, 4 for the four periods, α, β, γ refer the offset model coefficients to be determined.

Temporal Tree Offset Analysis
Based on the offset observation data and the quadratic model, the residual equation can be constructed as follows, where B = coefficient matrix consisting of 1, τ , τ 2 X = parameter vector consisting of α, β, γ L = attribute offset values f at different times V = residual vector of the offset model.
The solution of the equation can be obtained as follows, The continuous curves of the temporal offset corresponding to the four bands are shown in Figure. 7. The figure shows that among the four bands, the trend is not obvious in the other three bands, except for the Green band which has significant negative growth. The covariance of the estimated parameters, D, is calculated as follows, where r refers to the degrees of freedom. The standard deviation information of the parameters is extracted from D and listed in From the standard deviation data in the table, we can find that the confidence level of the model parameters being non-zero is not high. The effect of spatial factors could be mostly ignored.

SPATIAL ANALYSIS OF CLASSIFICATION TREE MODELS
We relate the offsets of the tree model to the spatial locations, expressed in the expression as follows, where, ϕ, λ refer to centered longitude and latitude coordinate values, ci refers to the model coefficients to be determined. ϕ, λ can be expressed as follows, where, ϕ1 and λ1 refer to the original longitude and latitude coordinate, ϕ0 and λ0 refer to the reference longitude and latitude coordinate. In the experiment, we select the latitude and longitude of Xi'an City in Shaanxi Province, (108.936899 • , 34.278453 • ), as the reference. We obtain the parameters of the spatial offset model, which are listed in Table. 3.  Table 3. Parameters (×10 −3 ) of the spatial offset model, τ for period index and b for band index.

Index
From the comparison of the model parameters in the table, it can be found that there is a certain regularity in the value of c0 at different bands, with the band order decreasing before increasing. c1 and c2 tend to have larger values in b1 and b4, indicating that the spatially correlated linear shifts are more pronounced in these two bands. c3, c4 and c5 do not have large values, indicating that the spatial quadratic feature is not significant. These 16 spatial offset models are plotted as shown in Figure 8. The linear variation characteristic of the offsets can be found visually.

CONCLUSIONS
We employed CART to achieve automatic IS extraction with sentinel-2 MSI and land cover classification data for 30 cities in China from 2018 to 2020, and analyze the temporal and spatial offsets of the classification model splitting thresholds. To reduce the sensitivity of the classification model to imperfect data, we ensemble the three models for the same period of three years. The temporal and spatial offset models were built up by linear regression. The results show that the offsets are not significant in the temporal dimension, while the linear variations in the spatial dimension are relatively more pronounced. Figure 8. The spatial offset model of the experimental cities. The first row refers to January, the second row to March, the third row to June, and the fourth row to September. The columns from left to right refer to the bands Green, Red, NIR and SWIR1.