CLASSIFICATION ACCURACY ASSESSMENT FOR REGIONAL VECTOR DATA PRODUCT BASED ON SPATIAL SAMPLING: A CASE STUDY OF JAPAN
- ¹School of Surveying and Geo-Informatics, Tongji University, Shanghai, 200092, China
- ²National Quality Inspection and Testing Center for Surveying and Mapping Products, Beijing, 100830, China
Keywords: Spatial Sampling, Spatial Vector Data, Stratification Strategy, Classification Accuracy Assessment, Spatial Correlation
Abstract. Spatial vector data represent real-world spatial information as points, lines and polygons. Spatial data quality is one of the basic theoretical research topics in geographic information science, and accurate, reliable quality assessment has both theoretical significance and practical value. This paper proposes an improved method for the traditional classification accuracy assessment of spatial vector data. (1) Quantitative estimation of sample size. Following the statistical principles of probability theory, the sample size is estimated by controlling the sampling error and the acceptance quality level, so that the sample quality is an unbiased estimate of the overall (population) quality. (2) Stratification strategy. The population of objects is divided into three layers (strata) according to the three basic geometric types: points, lines and polygons. Differences within each layer are small and differences between layers are large, which conforms to the basic principle of stratification. The total sample size is then allocated to the layers, using each layer's share of the total number of elements as its weight, which gives the sample size of each layer. (3) Allocation of samples. The spatial character of spatial sampling is reflected mainly in how the samples are allocated. To account for spatial correlation among elements in the same layer, the Local Moran's I index is used to measure how strongly a given attribute of each spatial element is correlated with that of its neighbouring elements. After cluster analysis of the elements in each layer, samples are screened by setting a reasonable threshold value. (4) Sample inspection. Each sample is examined against reference information, including images and data, and its classification is determined by majority judgment. (5) Classification accuracy assessment.
The classification accuracy of the samples is obtained by building a confusion matrix between the classification results of the samples and their true (reference) classes. The classification accuracy of the experimental data is then evaluated with the Kappa coefficient. A case study of the Global Core Vector Data of Japan demonstrates the improved method and the process of classification accuracy assessment for a regional spatial vector data product. The Global Core Vector Data are organized by country or region. They comprise three major categories (transportation, river system and place names), which are subdivided into 8 middle categories and 52 small categories. In this study, 1405 samples of the Global Core Vector Data in the Japanese experimental area were selected by spatially stratified sampling over 3 strata. The experimental results show that the proposed method is applicable to classification accuracy assessment of regional spatial vector data products. It also overcomes a drawback of type-based spatial stratified sampling, which relies on the classification information of all elements. The Kappa coefficient of 0.831 indicates that the classification accuracy in the experimental area is good. The proposed method provides a reference for classification accuracy assessment of future global spatial vector data products.
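The abstract names the statistical steps but does not give their formulas. The Python sketch below shows standard textbook versions of them under stated assumptions; it is not the authors' implementation, and all function names are hypothetical.

- `sample_size` uses the normal-approximation formula for estimating a proportion, with a finite population correction. How the paper combines sampling error and acceptance quality level is not stated, so treating `p` as an expected defect rate is our assumption.
- `proportional_allocation` splits the total sample across the point, line and polygon layers by element share. It rounds with the largest-remainder method, which is also an assumption.
- `local_morans_i` is the usual local Moran's I for a given spatial weight matrix.
- `majority_label` implements majority judgment.
- `cohen_kappa` builds the confusion matrix and returns Cohen's kappa.

The threshold-based screening of step (3) is not reproduced, because the abstract does not define the threshold.

```python
import math
from collections import Counter
from statistics import NormalDist


def sample_size(N, p, e, confidence=0.95):
    """Sample size for estimating a proportion p within margin e in a population of N.

    Assumption: normal approximation with finite population correction;
    p stands in for an expected defect rate (e.g. one tied to the quality level).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (e * e)
    return math.ceil(n0 / (1 + (n0 - 1) / N))


def proportional_allocation(n, layer_sizes):
    """Split n samples across layers in proportion to each layer's element count."""
    N = sum(layer_sizes.values())
    quotas = {k: n * v / N for k, v in layer_sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Assumed rounding rule: give remaining units to the largest fractional parts.
    for k in sorted(quotas, key=lambda key: quotas[key] - alloc[key], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


def local_morans_i(x, w):
    """Local Moran's I_i for attribute values x and spatial weight matrix w (list of rows).

    I_i = (x_i - mean) / m2 * sum_{j != i} w_ij (x_j - mean), with m2 = sum (x_k - mean)^2 / n.
    Raises ZeroDivisionError if all values are equal (m2 == 0).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d * d for d in dev) / n
    return [dev[i] / m2 * sum(w[i][j] * dev[j] for j in range(n) if j != i)
            for i in range(n)]


def majority_label(judgements):
    """Class chosen by most inspectors; ties go to the label seen first."""
    return Counter(judgements).most_common(1)[0][0]


def cohen_kappa(reference, predicted):
    """Cohen's kappa from paired reference and predicted class labels."""
    classes = sorted(set(reference) | set(predicted))
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    cm = [[0] * k for _ in range(k)]  # rows: reference class, columns: predicted class
    for r, p in zip(reference, predicted):
        cm[idx[r]][idx[p]] += 1
    total = len(reference)
    po = sum(cm[i][i] for i in range(k)) / total  # observed agreement
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / total ** 2
    return (po - pe) / (1 - pe)
```

For example, `cohen_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"])` has observed agreement 0.75 and chance agreement 0.5, giving a kappa of 0.5. In practice a library such as scikit-learn's `cohen_kappa_score` or PySAL's `esda.Moran_Local` could replace the hand-written versions.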