MAPPING URBAN AREAS USING DENSE TIME SERIES OF LANDSAT IMAGES AND GOOGLE EARTH ENGINE

Urban information extraction from satellite based remote sensing data could provide the basic scientific decision-making data for the construction and management of future cities. In particular, long-term satellite based remote sensing such as Landsat observations provides a rich source of data for urban area mapping. Urban area mapping based on the single-temporal Landsat observations is vulnerable to data quality (such as cloud coverage and stripe), and it is difficult to extract urban areas accurately. The composite of dense time series Landsat observations can significantly reduce the effect of data quality on urban area mapping. Multidimensional array is currently effective theory for geographic big data analysis and management, providing a theoretical basis for the composite of dense time series Landsat observations. Google Earth Engine (GEE) not only provides rich satellite based remote sensing data for the composite of dense time series data, but also has powerful massive data analysis capabilities. In the study, we chose Random Forest (RF) algorithm for the urban area extraction owing to its stable performance, high classification accuracy and feature importance evaluation. In this work, the study area is located in the central part of the city of Beijing, China. Our main data source is all Landsat8 OLI images in Beijing (path/row: 123/32) in 2017.Based on the multidimensional array for geographic big data theory and the GEE cloud computing platform, four commonly used reducer methods are selected to composite the annual dense time series Landsat 8 OLI data. After collecting the training samples, RF algorithm was selected for supervised classification, feature importance evaluation and accuracy verification for urban area mapping. The results showed that 1), compared with the single temporal image of Landsat 8 OLI, the quality of annual composite image was improved obviously, especially for urban extraction in cloudy areas; 2) for the evaluation results of feature importance based on RF algorithm, Coastal, Blue, NIR, SWIR1 and SWIR2 bands were the more important characteristic bands, while the Green and Red bands were comparatively less important; 3) the annual composite images obtained by the ee.Reducer.min, ee.Reducer.max, ee.Reducer.mean and ee.Reducer.median methods were classified and accuracy verification was carried out using the verification points. The overall accuracy of the urban area mapping reached 0.805, 0.820, 0.868 and 0.929, respectively. In summary, the ee.Reducer.median method is a suitable method for annual dense time series Landsat image composite, which could improve the data quality, and ensure the difference of features and the higher accuracy of urban area mapping.


INTRODUCTION
Urban areas are the center of sustainable development. The 2030 United Nations Sustainable Development Agenda included an independent Sustainable Development Goal (SDG) that aims to " make cites and human settlements inclusive, resilient, adaptable and sustainable " .Access to resource in building and managing cities is a huge challenge for social governance.We needs a lot of scientific knowledge that can help transition urban into a more sustainable future (Zhu et al., 2019). In the past decades, many scholars studied a large number of urban area extraction based on remote sensing technology. Schneider et al. mapped global urban areas using MODIS 500m data(.Schneider et al., 2010). After Landsat and other data are available for open access, it provides a rich source of data for urban area mapping. Zhu et al used the random forest algorithm to map urban areas based on Landsat7 and SAR data (Zhu et al., 2012).Many urban area mapping studies are based on singletemporal Landsat remote sensing data, which is difficult to mitigate the effects of clouds, strips, etc. (Wen et al., 2012), and it is also easy to confuse urban with other objects (such as bare land, farmland), so it is difficult to extract urban areas accurately. Li et al. proposed a method to map urban areas in Beijing based on the annual multi-temporal Landsat data, which mitigated the confusion between the urban and other objects (Li et al., 2015). However, they did not deal with Landsat data quality (such stripe and cloud). At present, the development of big data and cloud computing technology offer great potentials to resolve the problems in urban area mapping. Highperformance platforms such as Google Earth Engine (GEE) (Gorelick et al., 2017) and Data Cube not only provide plenty multi-source remote sensing data, but also have powerful massive data analysis capabilities. The image composite based on all Landsat dense time series data during the year can effectively mitigate the impact of data quality(such stripe and cloud) on urban areas mapping (Mateo-García et al., 2018). Landsat dense time series data can be easily obtained using the Google Earth Engine, but how to composite time series data more efficiently is a key problem we need to consider. This study discussed how to composite effectively based on the annual dense time Landsat8 image, combined with the multidimensional arrays for analysing geoscientific data and random forest algorithm, for mapping urban area of Beijing, China in 2017.

MATERIALS
The study area is in Beijing, China ( Figure 1). Beijing is located on the northwestern edge of the North China Plain and covers an area of 16,410.54 square kilometers. Due to dense population activities such as urbanization and afforestation, its land cover has undergone tremendous changes.Now, its land cover types mainly include forest, farmland, water, urban, etc. As a region with relatively high urbanization, Beijing is a typical city for the study of urban areas mapping. We chose the time series of Landsat8 OLI surface reflectance data of Beijing(path/row:123/32) for 2017 as our primary data source (obtained from U.S. Geological Survey: http://earthexplorer.usgs.gov/)(

Multidimensional Arrays for Analysing Geoscientific Data
With remote sensing data growing in variety and size, it calls for new management and analysis theory for big data. To efficiently integrate information from high dimensional data, Lu at al. explicitly proposed a array-based modeling theory, named multidimensional arrays for analysing geoscientific data (Lu et al., 2018). A multidimensional array can be seen as a function that maps the Cartesian product of multiple discrete, totally ordered, and finite dimensions to a multidimensional attribute space. Let D denote an n-dimensional index or dimension space and V refer to an m-dimensional attribute or value space. A multidimensional array A is then defined as , where and individual dimensions Di are finite and totally ordered. Since multidimensional arrays are functions, operations to modify array data can be categorised into five types ( Figure 2).They are select, scale, reduce, rearrange and compute operations, respectively. The application of reduce operations can yield composite image from time series Landsat 8 OLI images. Figure 2. Array operations. "A" indicates original arrays, "B" indicates result arrays after certain array operations are applied (Lu et al., 2018).

Google Earth Engine
Google Earth Engine is a cloud-based planetary scale geospatial analysis platform.It makes use of high-speed computing power of Google to handle a variety of significant social and environment research topics, including deforestation, drought and climate monitoring. GEE stores multiple PB-level data sets for remote sensing and other data, while providing high-performance parallel computing capabilities. GEE is controlled based on an Application Programming Interface (API) and associated webbased Interactive Development Environment (IDE), and also supports visual visualization of results. The functions in GEE are designed for multidimensional data, and ee.Reducer is a commonly used class to composite annual intensive time series data.

S
Here are many advantages of the random forest algorithm:(1) Because the base classifier of the random forest is a CART decision tree, the random forest can be classified or regressed; (2) Out-of-bag (OOB) unbiased estimation of random forest can effectively estimate the generalization error of classifier;(3) Variable importance estimation is also one of the important advantages of random forest algorithms, and is widely used in feature selecting. Based on the stability performance of the random forest algorithm and the evaluation of feature importance, it was chosen as the method of supervised classification.

Accuracy Assessment
There are several ways to assess the accuracy of classification results. Among them, the accuracy evaluation based on the error matrix is the most commonly used method.

Annual Composite Result
The research area is in Beijing, China. Based on the GEE platform, Landsat8 OLI data can be easily obtained. The data set of the experiment is " LANDSAT/LC08/C01/T1_SR " . ee.Reducer is a useful class in GEE, which provides dozens of multidimensional array reduce methods ( Table 2).The four commonly used Reducer methods provided by GEE was used to composite the annual dense time series Landsat8 OLI data. (See Table 4).

Reducers
Use Band ee.Reducer.median() B1-B7 Table 4. Selected reduce methods The annual composite image has many advantages for urban mapping. It is more important that the composite image without cloud and cloud shadows can be obtained based on the multidimensional array reduce method, so that the influence of data quality on urban mapping may be mitigated. In addition, the composite image calculated by selecting the appropriate reduce method can have better feature distinguishing ability than the single-temporal image, reduce the commission and omission error , and improve the accuracy of urban mapping. Based on the four reduce methods in Table 4, a partial map of the annual composite image of Beijing in 2017 is obtained, as shown in Figure 2. It can be seen that the annual images composite based on the above methods are clear and cloudless images, which is very advantageous for urban mapping, especially for urban extraction in cloudy areas.

Sample Analysis
The main land cover types in Beijing in the study include forests, farmland, water, and urban. Combining the performance of the classifier with the time-series spectral characteristics of the features, the ground objects were divided into three types, namely water, urban and other (including forests, grasslands, farmland, etc.). We refered the definition of urban land as (Schneider et al., 2014): sites that are dominated by built environment, including all non-vegetative, human-constructed elements (e.g., roads and buildings). The "dominated" indicates a greater than 50% coverage of built environment within each pixel. Based on the knowledge mentioned above, training and verification sample points were selected. Stable training sample points were collected by visually interpreting Landsat8 images and supplementing them with higher resolution images in Google Earth. Training sets of each types has 500 training sample points and 200 verification sample points. The characteristic mean line of the training sample points is the most simple and intuitive chart for evaluating the correlation and separability of various training samples. For the four annual composite images obtained by the ee.Reducer.min(), ee.Reducer.max(), ee.Reducer.mean(), and ee.Reducer.median() methods, the feature mean values and standard deviation of three types of training samples points were separately counted (Figure 4.). It can be seen from Figure 4.that the spectral change of the urban during the year was not obvious. And the urban didn't have significant phenological phenomena.For water type, there was some spectral change during the year. For the other type, mainly including forests, grasslands, farmland, etc., the spectral changes during the year were obvious, with some phenological characteristics. Especially in the NIR band, the maximum value is about 4 times the minimum value. The number of decision tree estimators (n_estimators) is one of the important parameters of the Random Forest (RF) algorithm. By setting the number of n_estimators of the random forest algorithm to 100 (Foody et al., 2006), stable classification results could be obtained. After the training the data, the normalized feature importance of each variable is output, that is, the sum of all feature importance is 1. If the feature importance is greater, the more important the feature, and vice versa, the less important the feature.

Classification Results and Accuracy
After the training the data, four annual composite images are separately classified. The comparison of the classification results is shown in Figure 6. Based on the annual composite image obtained by ee.Reducer.median(), it has the best classification result by visual interpretation, and there is no obvious commission or omission errors. The classification results obtained from the annual composite image by ee.Reducer.min() arose a commission between farmland and urban in the fallow period. And the shadow of the mountain was also greatly affected the result and was mistakenly classified as water. The obvious error is that there has some omission errors in urban in the classification results obtained based on the annual composite image obtained by ee.Reducer.max(

CONCLUSIONS
New remotely sensed big data streams are revolutionizing urban areas mapping. Particularly, Long-term sequence based satellite observations provides abundant data for urban remote sensing. In this research, we focus only on urban area mapping form annul composite image by Random Forest (RF) algorithm in Beijing, China,but it can been reference for other mapping study. We can find out from the research that 1), composite image form reducer method such as ee.Reducer.median() can mitigate the interference of cloud ect.; 2) Coastal, Blue, NIR, SWIR1 and SWIR2 bands were the more important bands, while the Green and Red bands were comparatively less important for classification; 3) the ee.Reducer.median methods is appropriate for image composite. Though a majority of urban areas mapping are based on daytime optical and thermal sensors, there are other sensors that also provide unique observations of urban characteristics, such as LiDAR, RADAR, and nighttime lights sensors. If combined with multi-source remotely sensed data, the accuracy of mapping can be improved.