Accuracy Assessment of the GlobeLand30 Dataset in Jiangxi Province

GlobeLand30 is the highest-spatial-resolution global land cover mapping product and was developed by the National Geomatics Center of China (NGCC) in 2015. It plays a significant role in environmental monitoring, climate change research and ecosystem assessment. In this study, Jiangxi Province was selected as the study area and the 1:100,000 land use data of 2010 was used as the reference data. We examine the accuracy of GlobeLand30 with three methods: area error analysis, shape consistency analysis and confusion matrix analysis. The results are as follows. Land cover in the study area is dominated by cultivated land and forest, followed by grassland, water bodies and artificial surfaces. The area errors of cultivated land, forest and water bodies are all less than 13%. The overall shape consistency reaches 67%, but it differs considerably among land types; forest has the best shape consistency, up to 75%. The confusion matrix was computed in two cases, with and without a buffer area along the boundaries between classes. The overall accuracy and kappa coefficient of GlobeLand30 improve when the buffer is applied: the overall accuracy is higher than 78% and the kappa coefficient is higher than 0.52.


INTRODUCTION
Land cover refers to the synthesis of the various material types on the Earth's surface and their natural attributes and characteristics. Accurate measurement of land cover change is an important input for sustainable development planning, land resource management and geographical condition monitoring (Chen et al., 2012). As the spatial resolution of satellite imagery has improved, demand has grown for global land cover data, and the earlier coarse-resolution global land cover datasets can no longer satisfy it (Chen et al., 2014). GlobeLand30 is the world's first global land cover data product with 30 m resolution and covers two epochs, 2000 and 2010. Its classification accuracy is reported to be very high, and it meets the classification requirements of Earth system research (Liao et al., 2015; Chen et al., 2011). GlobeLand30 is used in many countries, fills the gap in high-precision global land cover data, and provides basic information for global ecosystem assessment.
Accuracy evaluation of land cover data is a prerequisite for the rational use of these data products. Shortcomings in source data quality, classification systems and mapping methods introduce errors into the products, which can cause serious losses in practical applications (Ma et al., 2016). Accuracy evaluation helps data producers improve the quality of their products. Moreover, comparing and analysing different data products reveals their respective advantages and disadvantages and gives users helpful information for choosing the most suitable dataset. It is therefore necessary to evaluate the accuracy of land cover data before applying them in research or projects, which helps broaden the applications of the datasets and provides scientific support for decision-making (Cao et al., 2012; Ning et al., 2012).
Both domestic and foreign scholars have evaluated the accuracy of the GlobeLand30 dataset. For example, Chen et al. (2016) made a global accuracy assessment of GlobeLand30; Brovelli et al. (2015) used a cross-validation method to evaluate the accuracy of GlobeLand30 in Italy; Meng et al. (2015) studied the accuracy of GlobeLand30 in Shaanxi Province using a sample evaluation method; Ma et al. (2016) used a comparative analysis method to evaluate GlobeLand30 in Henan Province; and Huang et al. (2016) used a spatial consistency method to study the accuracy of GlobeLand30 in Shaanxi Province. This study evaluates the accuracy of GlobeLand30 in Jiangxi Province by combining three methods that address different aspects, which provides more detailed information on the performance of the dataset.

Study Area
Jiangxi Province is located in south-eastern China, on the south bank of the middle and lower reaches of the Yangtze River, and extends from 24°29′ to 30°04′N and from 113°34′ to 118°28′E. The whole province has a humid subtropical climate with abundant rainfall and a total area of 166,900 square kilometres. Its landform and climate give Jiangxi a rich variety of land cover types. All the land cover types in GlobeLand30, except for tundra, glacier and permanent snow cover, are present in the study area. This is the main reason why we selected Jiangxi Province as our study area.

Land cover data:
In this paper, the evaluated data is GlobeLand30 for 2010, which includes 10 land cover types such as cultivated land, forest, grassland and water bodies. The data were completed at the end of 2013 and were produced mainly from 30 m Landsat multispectral images together with a large amount of auxiliary data and reference materials; the WGS84 coordinate system was used in production. GlobeLand30 for 2010 covers the land area between 80°N and 80°S in a total of 843 map tiles (Liao et al., 2015). For the reference China land use data, the overall accuracy of first- and second-level interpretation is higher than 90%, and it is a high-precision land use product for China (Liu et al., 2003a; Liu et al., 2003b; Liu et al., 2009; Liu et al., 2010; Liu et al., 2014).

Data Processing:
Because GlobeLand30 and the China land use data differ in data source, legend, scale and reference system, a preprocessing phase was required before accuracy evaluation (Ma et al., 2016). It mainly consists of projection transformation, mosaicking, clipping and reclassification. The data processing flow is shown in Figure 1. First, GlobeLand30 and the land use data are reprojected into the same WGS-84 coordinate system. Then, the data are mosaicked and clipped to obtain raster data with the same boundary. Because the two datasets use different classification systems, reclassification is essential: the land use data are reclassified according to the GlobeLand30 classification system so that both datasets share one legend. The correspondence is shown in Table 1 (Chen et al., 2015; Liu et al., 2009). Classification errors caused by spatial mismatch between the two datasets should be particularly evident at the borders between different classes. To verify this behaviour, all cells within a 60 m buffer around the borders between different classes were eliminated, and the confusion matrix of the remaining pixels was calculated and compared with the confusion matrix obtained without a buffer (Gallego et al., 2001). The data processing results are shown in Figures 2 and 3.
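The reclassification step can be illustrated with a small lookup-table sketch. The code pairs below are illustrative assumptions only and do not reproduce Table 1; the function name `reclassify` and the no-data value 0 are likewise hypothetical.

```python
# Illustrative mapping from reference land-use codes to GlobeLand30 codes.
# ASSUMPTION: these pairs are examples only; the real correspondence is
# the one given in Table 1 of the paper.
RECLASS = {
    11: 10,  # paddy field    -> cultivated land
    12: 10,  # dry land       -> cultivated land
    21: 20,  # woodland       -> forest
    22: 40,  # shrub wood     -> shrubland
    31: 30,  # grassland      -> grassland
    41: 60,  # river          -> water bodies
    51: 80,  # urban land     -> artificial surfaces
    65: 90,  # bare soil      -> bareland
}
NODATA = 0  # hypothetical no-data value for unmapped codes


def reclassify(grid, table, nodata=NODATA):
    """Return a new grid with every cell value translated through `table`.

    Codes that are missing from the table become `nodata`, so unmapped
    cells are easy to spot instead of silently keeping their old code.
    """
    return [[table.get(value, nodata) for value in row] for row in grid]


if __name__ == "__main__":
    reference = [[11, 21], [41, 99]]
    print(reclassify(reference, RECLASS))
```

Mapping unknown codes to a no-data value rather than passing them through is a deliberate choice in this sketch: it keeps the two legends strictly comparable in the later confusion matrix.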

RESULTS AND ANALYSIS
Many different methods and indexes have been proposed in the previous literature to evaluate the accuracy of land cover data.
In this paper, the data are processed and tallied, and the number and area of pixels of each land type are obtained from both raster and vector structures. Area error, shape consistency and confusion matrix analyses are used to evaluate the land cover dataset. The results are presented in tables, so that the differences in area, quantity and spatial position between GlobeLand30 and the reference data can be seen directly.

Area Error Analysis
In this study, the number of pixels of each land category was counted in each dataset, and the area of each category was computed with the field calculator. The area accuracy of each land category in GlobeLand30 is evaluated by means of the area error coefficient (C) (Ma et al., 2016). The smaller the C value, the closer the evaluated data are to the reference data and the smaller the area error, and vice versa.
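The formula for C did not survive in this text, so the following is only a sketch under an assumed definition: C_i = |K_i - N_i| / N_i × 100%, where K_i is the area of class i in GlobeLand30 and N_i is the area of the same class in the reference data. The helper names and the sample numbers are hypothetical.

```python
# Area of one 30 m x 30 m pixel in square kilometres.
PIXEL_AREA_KM2 = 30 * 30 / 1_000_000


def class_areas(pixel_counts):
    """Convert per-class pixel counts into areas in km^2."""
    return {cls: n * PIXEL_AREA_KM2 for cls, n in pixel_counts.items()}


def area_error(k_area, n_area):
    """Percent area error per class.

    ASSUMPTION: C_i = |K_i - N_i| / N_i * 100, which may differ from the
    exact formula used by the authors.
    k_area: class -> area in the evaluated data (K_i)
    n_area: class -> area in the reference data (N_i)
    """
    errors = {}
    for cls, n in n_area.items():
        k = k_area.get(cls, 0.0)
        # A class absent from the reference data has no defined error.
        errors[cls] = abs(k - n) / n * 100.0 if n > 0 else float("nan")
    return errors


if __name__ == "__main__":
    # Made-up areas, for illustration only.
    evaluated = {"forest": 110.0, "water bodies": 45.0}
    reference = {"forest": 100.0, "water bodies": 50.0}
    for cls, c in area_error(evaluated, reference).items():
        print(f"{cls}: {c:.2f}%")
```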

Shape Consistency Analysis
The shape consistency analysis works on vector data. It uses overlay analysis to evaluate the shape accuracy of GlobeLand30. In this paper, the shape consistency index (SCI) is introduced, and the overlay of GlobeLand30 and the reference data is used to evaluate the shape accuracy of GlobeLand30 (Cao et al., 2012). The greater the intersecting area between GlobeLand30 and the reference data, the better the consistency of the two shapes. In summary, the shape consistency analysis reflects how well the shapes in GlobeLand30 match those in the reference data.
where SCI = shape consistency index; S1 = map patch of the i-th class in GlobeLand30; S2 = map patch of the i-th class in the reference dataset
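The SCI formula itself is not preserved here. As a rough illustration, the sketch below assumes SCI_i = area(S1 ∩ S2) / area(S2) and approximates the vector overlay with co-registered rasters, counting pixels instead of intersecting polygons. Both the formula and the raster shortcut are assumptions and may differ from the authors' procedure.

```python
def shape_consistency(eval_grid, ref_grid, cls):
    """Share of reference pixels of class `cls` that carry the same class
    in the evaluated grid.

    ASSUMPTION: SCI = area(S1 intersect S2) / area(S2), evaluated on two
    grids with identical shape and alignment.
    """
    intersect = 0
    reference = 0
    for eval_row, ref_row in zip(eval_grid, ref_grid):
        for e, r in zip(eval_row, ref_row):
            if r == cls:
                reference += 1
                if e == cls:
                    intersect += 1
    return intersect / reference if reference else float("nan")


if __name__ == "__main__":
    globeland = [[20, 20], [10, 20]]
    landuse = [[20, 10], [10, 20]]
    print(shape_consistency(globeland, landuse, 20))
    print(shape_consistency(globeland, landuse, 10))
```

Under this assumed definition the index is bounded by 1, so a different normalisation would be needed to reproduce values above 1.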

Confusion Matrix Analysis
The confusion matrix is a commonly used method for evaluating the accuracy of land cover data. It is a cross-tabulation that takes the pixel as the evaluation unit, is built from GlobeLand30 and the reference data, and is used for spatial comparison between the two datasets. In the matrix, columns represent the reference data, rows represent the data to be evaluated, and the elements on the main diagonal are the numbers of correctly classified pixels (Liu et al., 2007). Many accuracy indexes are derived from the confusion matrix, such as overall accuracy, mapping (producer) accuracy and user accuracy (Ma et al., 2016), and all of them can be calculated from the error matrix. In this paper, to examine how pixels at the boundaries between different classes affect the accuracy evaluation of GlobeLand30, the confusion matrix is computed in two cases: with and without a buffer at the class boundaries. The overall accuracy, mapping accuracy, Kappa coefficient and other indicators are used to describe the accuracy of GlobeLand30.
The overall accuracy (OA) is the ratio of the number of correctly classified pixels to the total number of pixels, as shown in formula (3). The producer accuracy (PA) is the ratio of the number of correctly classified pixels of a class to the number of pixels of that class in the reference data, as shown in formula (4). The user accuracy (UA) is the ratio of the number of correctly classified pixels of a class to the number of pixels of that class in the data to be evaluated, as shown in formula (5). The Kappa coefficient (K) is also obtained from the confusion matrix, as shown in formula (6). It measures the consistency of the classification and is widely used to evaluate the accuracy of remote sensing data.
$$OA = \frac{\sum_{i=1}^{r} n_{ii}}{N} \qquad (3)$$
$$PA_i = \frac{n_{ii}}{n_{+i}} \qquad (4)$$
$$UA_i = \frac{n_{ii}}{n_{i+}} \qquad (5)$$
$$K = \frac{N \sum_{i=1}^{r} n_{ii} - \sum_{i=1}^{r} n_{i+} n_{+i}}{N^2 - \sum_{i=1}^{r} n_{i+} n_{+i}} \qquad (6)$$
where OA = overall accuracy; PAi = producer accuracy of the i-th class; UAi = user accuracy of the i-th class; K = Kappa coefficient; N = the total number of pixels; nii = the number of correctly classified pixels of the i-th class; n+i = the number of pixels of the i-th class in the reference data; ni+ = the number of pixels of the i-th class in GlobeLand30; r = the number of classes; i = the i-th class
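The indexes above can be computed from paired pixel labels as in the following sketch. It follows the row and column convention stated in the text (rows are the evaluated data, columns are the reference data) and uses the standard definitions of overall accuracy, producer and user accuracy, and Kappa. The function names and the sample matrix are illustrative, not taken from the paper.

```python
def confusion_matrix(eval_labels, ref_labels, classes):
    """Build an r x r matrix: rows = evaluated data, columns = reference."""
    index = {c: i for i, c in enumerate(classes)}
    r = len(classes)
    matrix = [[0] * r for _ in range(r)]
    for e, ref in zip(eval_labels, ref_labels):
        matrix[index[e]][index[ref]] += 1
    return matrix


def accuracy_metrics(matrix):
    """Return (OA, PA list, UA list, Kappa) for a square confusion matrix."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)                    # N
    diag = sum(matrix[i][i] for i in range(r))             # sum of n_ii
    row_tot = [sum(matrix[i]) for i in range(r)]           # n_i+ (evaluated)
    col_tot = [sum(matrix[i][j] for i in range(r))         # n_+i (reference)
               for j in range(r)]
    nan = float("nan")
    oa = diag / n
    pa = [matrix[i][i] / col_tot[i] if col_tot[i] else nan for i in range(r)]
    ua = [matrix[i][i] / row_tot[i] if row_tot[i] else nan for i in range(r)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(r))
    denom = n * n - chance
    kappa = (n * diag - chance) / denom if denom else nan
    return oa, pa, ua, kappa


if __name__ == "__main__":
    # Made-up two-class matrix, for illustration only.
    m = [[40, 10],
         [5, 45]]
    oa, pa, ua, kappa = accuracy_metrics(m)
    print(f"OA={oa:.2f}  Kappa={kappa:.2f}")
    print("PA:", [round(v, 3) for v in pa])
    print("UA:", [round(v, 3) for v in ua])
```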

Area Error Result
Figures 4 and 5 are histograms of the area percentages of the different land types. Tables 2 and 3 list the areas and area error coefficients of the different land types obtained by the area error analysis. According to Figure 4 and Table 2, in the reference data the total area of the northern part of Jiangxi Province is 7636.99 km², the areas of cultivated land and forest are 7341.37 km² and 10497.79 km², and their area percentages are 34.11% and 48.78%, which together reach 82%. Cultivated land and forest therefore constitute the main part of the land cover in the north in the reference data, and the other land types account for only 13%. In GlobeLand30, cultivated land and forest also constitute the main part of land use in the north, and their areas and area percentages are similar to those in the reference data. However, the areas of grassland and shrubland differ between GlobeLand30 and the reference data.
The area of the southern part is 21521.23 km². In both the reference data and GlobeLand30, land use there is dominated by cultivated land and forest, whose combined area percentage is over 95%. The other land types, especially grassland, shrubland and water bodies, cover very little area, which results in significant differences in area distribution between the reference data and GlobeLand30. In terms of area error, Tables 2 and 3 show that the area errors of cultivated land, forest and water bodies in GlobeLand30 are relatively small, but they differ between the north and the south. In northern Jiangxi, the smallest area error is 1.97% for forest, followed by 4.69% for cultivated land and 19.87% for water bodies. In the south, the area error of cultivated land is 0.89%, followed by 6.99% for water bodies and 12.1% for forest. Moreover, Tables 2 and 3 show that for the remaining land categories the area error is relatively small in the north and larger in the south, for example for bare land and grassland. This may be related to the density of land type distribution in the selected areas. In conclusion, the land use composition obtained from the area error analysis is basically the same in the southern and northern parts of Jiangxi Province, and the area consistency with the reference data is good.

Shape Consistency Result
The shape consistency analysis is carried out by vector overlay, and the results are shown in Figure 6. The results for southern and northern Jiangxi are plotted together as a grouped histogram so that the similarities and differences between the two regions can be seen directly. The figure shows that in the north the shape consistency indexes of cultivated land, forest and water bodies are the highest, above 60%; next, the index values of wetland and water bodies are near 60%; and grassland, shrubland and bareland have the worst shape consistency, with an index value below 20% for grassland and zero for shrubland and bareland. In the south, the consistency index of water bodies is greater than 1.
This occurs because the area of water bodies obtained by the vector overlay is larger than the area of water bodies in the reference data.
The shapes of cultivated land and forest are in good agreement, followed by artificial surfaces and grassland; shrubland, wetland and bareland have the worst shape consistency. Comparing the two regions, the consistency coefficients of most land types are slightly higher in the north than in the south, and overall the shape consistency of the land types is generally good.

Confusion Matrix Result
The confusion matrix of the reference data and GlobeLand30 is built, and the accuracy indexes are calculated from it. To study the influence of pixels at the boundaries between different land types, we set up a boundary buffer and compute the confusion matrix in two cases, with and without the buffer.
To build the boundary buffer, the classes of the reference data and of GlobeLand30 are first extracted, and the buffer is then created by morphological erosion. Figure 7 shows the result for the northern Jiangxi data: the pixels at the class boundaries are completely eliminated. Table 4 lists the overall accuracy and Kappa coefficient of GlobeLand30 with and without the buffer. It shows that both the overall accuracy and the Kappa coefficient are higher with the buffer than without it. This indicates that pixels near class boundaries are particularly likely to be misaligned or misclassified, which has a great influence on the measured accuracy. When the data are classified, pixels in the boundary regions should be treated separately, so that their classification accuracy is improved and their effect on the accuracy of the data is reduced.
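The erosion step can be sketched as follows. A pixel is kept only if every pixel in a square window around it has the same class; otherwise it is set to no-data. The window half-width of 2 pixels (60 m at 30 m resolution), the square structuring element and the no-data value 0 are assumptions, since the exact erosion settings are not given in the text.

```python
def erode_boundaries(grid, width=2, nodata=0):
    """Remove pixels lying within `width` pixels of a class boundary.

    ASSUMPTION: erosion with a (2*width+1) x (2*width+1) square window;
    neighbours that fall outside the grid are ignored.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[nodata] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            value = grid[i][j]
            same = all(
                grid[ni][nj] == value
                for ni in range(max(0, i - width), min(rows, i + width + 1))
                for nj in range(max(0, j - width), min(cols, j + width + 1))
            )
            if same:
                out[i][j] = value
    return out


def paired_labels(eval_grid, ref_grid, nodata=0):
    """Yield (evaluated, reference) labels where both grids have data."""
    for eval_row, ref_row in zip(eval_grid, ref_grid):
        for e, r in zip(eval_row, ref_row):
            if e != nodata and r != nodata:
                yield e, r


if __name__ == "__main__":
    strip = [[1, 1, 1, 1, 2, 2, 2, 2]]
    print(erode_boundaries(strip, width=1))
    print(erode_boundaries(strip, width=2))
```

Eroding both datasets and then keeping only the pixel pairs where neither is no-data gives the "with buffer" sample from which the second confusion matrix would be built.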

Error Analysis
The accuracy assessment shows that GlobeLand30 and the reference data are not fully consistent. There are two main reasons for the inconsistency: on the one hand, the two datasets have different data sources and scales; on the other hand, pixel errors arise from confusion between land types or from different classification standards. Combined with the field investigation, the reasons for the inconsistency are briefly explained as follows:
1) The appearance of land changes over time: because images are acquired at different times, the same land cover type can look different. This leads to pixel misclassification among cultivated land, grassland, water bodies, bareland and shrubland. Part of the cultivated land in Jiangxi lies in valleys, unlike in plain areas, where cultivated land is distributed in large blocks. Crops are harvested seasonally: cultivated land is easy to classify when crops are flourishing, but when there is no crop or the crop is at an early growth stage, cultivated land is easily confused with bareland or grassland. The seasonal variation of grassland is even more obvious: in winter grassland withers, and where grassland borders shrubs or woodland there is no distinct boundary between the classes; both effects cause pixel errors. Water bodies also have dry and wet periods, which affects their classification.
2) Omission (leakage): Land types that are small or scattered, such as artificial surfaces, shrubland and wetland, tend to be omitted. In Figure 8(a) and (b), artificial surfaces (red) are interwoven with cultivated land (pink) and forest (deep green) and are scattered in distribution, so they are partly omitted in the classification and wrongly assigned to forest and cultivated land, which reduces the final area of artificial surfaces. Jiangxi has a humid subtropical climate with favourable water and heat conditions and little shrubland. Figure 8(c) and (d) show that most shrubland (yellow) is scattered, so omission is obvious when shrubland is classified, and most of it is assigned to forest (deep green). As shown in Figure 8(e) and (f), many wetlands (light blue) are distributed along rivers, and some of them are so narrow that they are classified as water bodies (deep blue). In addition, when the area of a land cover type is smaller than the minimum mapping unit, i.e., it does not reach the size required for classification, omission also occurs.
3) Classification systems have different definitions of land types: GlobeLand30 uses a classification system different from that of the reference data. For example, bareland is defined in GlobeLand30 as land whose vegetation cover is less than 20% of the mapping unit, including salt surface, sand, gravel, rocks and crusts. In China's land use data, bareland is not a separate category; bare land and the "others" class of unused land are usually treated together as bareland. There, bare land is land with less than 5% vegetation coverage, and "others" in unused land includes alpine desert, tundra, etc. The definitions of other land types also differ between the two systems.

CONCLUSION
This paper evaluates the accuracy of GlobeLand30 for 2010 in Jiangxi Province, using the 2010 China land use data as reference and three methods: area error, shape consistency and confusion matrix. The reasons for the disagreement between GlobeLand30 and the reference data are summarized with the help of the field investigation. The main conclusions of the accuracy evaluation are as follows:
1) In terms of area proportion, the land types in both the south and the north of Jiangxi Province are mainly cultivated land and forest, followed by grassland, water bodies and artificial surfaces. The accuracy of cultivated land and forest is high and basically agrees with the reference data, while the remaining land types differ considerably from the reference data in area error or shape consistency.
2) The overall accuracy and kappa coefficient of GlobeLand30 increase when pixels at the boundaries between different land types are eliminated. In this case, the overall accuracy in northern Jiangxi is 83.56% with a Kappa coefficient of 0.73, and the overall accuracy in the south is 78.12% with a Kappa coefficient of 0.53. Generally speaking, the classification accuracy of GlobeLand30 is higher in the north than in the south. In terms of land type area, the difference arises because grassland, wetland, water bodies and artificial surfaces are more abundant in the north than in the south. The smaller the area of a land type, the fewer pixels it has and the more likely those pixels are to be omitted or misclassified.
Although the accuracy of GlobeLand30 has been verified to a certain extent against China's 2010 land use data, the analytical method still has deficiencies. The following points should be noted:
1) Pixel confusion occurs between different land types and affects their classification. Future work should study this in more depth to reduce the error caused by pixel confusion.
2) Cultivated land, forest and water bodies differ greatly in area but all have high accuracy in terms of area error and shape consistency, so the classification accuracy of a land type does not appear to be related to its area. The next step is to consider whether the classification accuracy of a land type is related to the density of its distribution and the terrain in which it is located; the density of each land type is associated with the number of polygons that constitute it.
3) The accuracy of GlobeLand30 is verified here by comparison with another dataset, which may cast doubt on the correctness of the results. In later work, we will use field-measured sample points to supplement the evaluation and improve its credibility.
Data: This article uses the 1:100,000 China land use data (LUCC) for 2010. Building on the four land use change databases for the late 1980s, 1995, 2000 and 2005, the 2010 data were produced by interpreting Landsat TM digital images with land use change remote sensing information and a human-machine interactive method. The data are provided by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. China's land use data include 6 first-level classes and 25 second-level classes, such as farmland, woodland and grassland.

Figure 1. The flow chart of data processing
Figure 2. Land cover reclassification images in southern Jiangxi Province: (a) China land use data; (b) GlobeLand30
Figure 3. Land cover reclassification images in northern Jiangxi Province: (a) China land use data; (b) GlobeLand30
where C = area error coefficient; Ki = the area of the i-th class of the GlobeLand30 dataset; Ni = the area of the i-th class of the reference dataset

Figure 4. Comparison of GlobeLand30 and China land use data in northern Jiangxi Province

Figure 6. Shape consistency coefficient of different land types for GlobeLand30 and China land use data in Jiangxi Province
Figure 7. Buffer of 2 pixels eliminated around GlobeLand30 and China land use data in northern Jiangxi Province
Figure 8. A schematic diagram of the leakage of artificial surfaces, shrubland in Jiangxi Province

Table 1. Land type reclassification and corresponding relation

Table 2. Area statistics and error coefficient of different land types for GlobeLand30 and China land use data in northern Jiangxi Province

Table 3. Area statistics and error coefficient of different land types for GlobeLand30 and China land use data in southern Jiangxi Province

Table 4. The comparison of accuracy assessment criteria of GlobeLand30 and China land use data