Investigating the completeness and omission roads of OpenStreetMap data in Hubei, China by comparing with Street Map and Street View

OpenStreetMap (OSM) is a free map of the world which can be edited by volunteers around the globe. Existing studies have shown that the completeness of OSM road data in some developing countries (e.g. China) is much lower than in developed countries, raising concern about using the data in various applications. However, very few studies have investigated what types of road are still poorly mapped. This study aims not only to investigate the completeness of OSM road datasets in China but also to investigate what types of road (called omission roads) have not been mapped, which is achieved by referring to both Street Map and Street View. Sixteen prefecture-level divisions in the urban areas of Hubei (China) were used as study areas. Results showed that: (1) the completeness for most prefecture-level divisions was at a low-to-medium level; however, most roads with traffic conditions (in the Street Map) had already been mapped well. (2) Most of the omission OSM roads were either private roads, or unnamed public roads with only a single lane, indicating their lack of importance in the urban road network. We argue that although the OSM road datasets in China are incomplete, they may still be used for several applications.


INTRODUCTION
Along with the development of Web 2.0 technology, an ever-increasing amount of geographic information has been created and updated by volunteers, a phenomenon known as "volunteered geographic information" (Goodchild, 2007). As one of the most successful examples, OpenStreetMap (OSM) is an online map database (http://www.openstreetmap.org/) which can be edited and updated by volunteers all over the world. The OSM data is not only free to use but also has global coverage. OSM can be an essential data source to enhance digital earth products that use digital technology to organize the spatial and temporal change data of the earth (Schultz et al., 2017), and it also has the potential to play an important role in digital earth applications (Mooney and Corcoran, 2014), a wide range of which include 3D modelling (Over et al., 2010), path planning (Zielstra and Hochmair, 2012), emergency relief (Zook et al., 2010), and land use and cover mapping (Arsanjani et al., 2015; Schultz et al., 2017; Zhou et al., 2019). These applications are based on OSM data despite concerns about the quality of the data, which arise because most OSM volunteers are non-specialists and amateurs (Goodchild, 2007; Haklay et al., 2010).
Extensive research has focused on assessing various quality elements of OSM data, including positional accuracy, thematic accuracy, topological consistency, and completeness (Girres and Touya, 2010; Haklay et al., 2010; Senaratne et al., 2017; Zhou, 2018). Among these quality elements, completeness (measuring whether a region has been well covered) has gained much attention. Haklay et al. assessed the completeness of OSM road datasets in England by comparing them with a corresponding authoritative dataset, Ordnance Survey (OS) (Haklay et al., 2010). They used the difference between the road lengths in the OSM and OS datasets as a completeness measure, a measure that has also been used for assessing the completeness of OSM road datasets in the United States. Koukoletsos et al. proposed an automated method for matching OSM and OS datasets, calculating the completeness as the length of matched roads in proportion to the total length of roads in either the OSM or OS dataset (Koukoletsos et al., 2012). Ludwig et al. used a similar method to compare OSM and Navteq (produced by a commercial company, HERE) road datasets in Germany (Ludwig et al., 2011). Girres and Touya used a ratio of road lengths between OSM and authoritative datasets in France as the completeness measure (Girres and Touya, 2010), while Brovelli et al. used the length percentage of an OSM road dataset included in a predefined buffer of an authoritative dataset (Brovelli et al., 2017). Ciepłuch et al. compared three typical online maps of Ireland (OpenStreetMap, Google Maps, and Bing Maps) by manually counting the number of errors (e.g. "incorrect streetname", "incorrect road or street designation") (Ciepłuch et al., 2010). Common to all these studies is the use of an authoritative dataset (obtained from either a mapping agency or a commercial company) for assessing the completeness of an OSM dataset.
Such an authoritative dataset is not always available, however, because it may be expensive or out of date. Thus, a practice known as "intrinsic quality assessment" has been adopted to assess the quality of OSM data without authoritative datasets. Barron et al. proposed a framework including 25 indicators for OSM quality assessments based solely on the data's history (Barron et al., 2014). Neis et al. analyzed the evolution of the OSM road dataset in Germany from 2007 to 2011 (Neis et al., 2012). Neis et al. selected 12 urban areas from around the world and found a correlation between socio-economic factors (e.g. income) and the data provided for the analyzed areas. Gröchenig et al. analyzed the annual changes of OSM features and defined three stages (Start, Growth, and Saturation), each of which described a certain degree of data completeness in the analyzed regions (Gröchenig et al., 2014). Zhou and Tian proposed three indicators, i.e. street block area, perimeter, and density, to quantitatively estimate the street block completeness of OSM road datasets; a street block denoted a closed region formed from several road segments (Zhou and Tian, 2018). Sehra et al. developed a toolbox based on an open-source geographic information system (GIS) software package, Quantum GIS, to perform intrinsic quality assessment (Sehra et al., 2017). Furthermore, the above studies showed that OSM data is almost complete in developed countries or regions. For example, Neis et al. found that the OSM road dataset in Germany differed only slightly (OSM was still missing about 9% of the data) from a commercial dataset for car navigation (Neis et al., 2012). Barrington-Leigh and Millard-Ball also reported that more than 40% of countries (e.g. United States, Japan, France, Canada, Australia, and Germany) had a fully mapped OSM road dataset, but that the completeness was much lower in some other countries (e.g. China, Russia, and India) (Barrington-Leigh and Millard-Ball, 2017). Ming et al. assessed the OSM road dataset in Wuhan (a prefecture-level division of China) and found that the road completeness of this region was only 38% (Wang et al., 2013). Zhou et al. also analyzed some urban regions in China and found that the OSM road completeness varied from 28.61% to 60.77% (Zhou et al., 2014). Zheng and Zheng assessed the OSM road dataset in China by comparing it with the dataset produced by Baidu (a commercial mapping company in China), finding that 71% of the OSM data was less detailed than the Baidu data and that more than 94% of the country consisted of incomplete regions (Zheng and Zheng, 2014). Tian et al. analyzed the completeness of OSM building data in China and concluded that the building data were also far from complete (Tian et al., 2019).
The aim of this study is to reinvestigate the completeness of OSM road data in China, for two reasons. First, in China, most geographical data produced by either mapping agencies or commercial mapping companies have not been made publicly available, and it is desirable, therefore, to obtain open-source data as a supplement. OSM data, being freely available, may be used as an alternative. Second, as one of the most essential sources of data, road data can be used for many applications, including traffic flow prediction, map representation, navigation, and routing (Jiang, 2009; Li and Zhou, 2012; Zielstra and Hochmair, 2012). Nevertheless, most existing studies report that the completeness of OSM road data in China is much lower than that of the data in developed countries, resulting in serious concern about using Chinese OSM road data in applications. More importantly, to our knowledge, very few studies have focused on investigating what types of road are still poorly mapped in the OSM road dataset of China. This study investigates not only the completeness but also the omission roads (i.e. the unmapped or omitted roads) of OSM data in China.
This study makes two main contributions. First, a hierarchical classification scheme is proposed to analyze omission roads in OSM by referring to both Street Map and Street View. This is especially necessary for countries (e.g. China) whose authoritative datasets are not freely available. Second, by analyzing the omission roads of OSM in Hubei Province, China, the importance of different types of OSM road data in China is clarified.
This study is structured as follows: Section 2 designs a series of experiments to investigate both the completeness and omission roads of the OSM road data in China. Section 3 reports the experimental results and analyses, while sections 4 and 5 present the discussion and conclusions, respectively.

Study area and data
We studied the urban areas of the different prefecture-level divisions of Hubei province, China (Figure 1). This choice was made for the following reasons. First, previous studies have verified that both population and socio-economic factors potentially affect the completeness of an OSM dataset (Tian et al., 2019). Hubei province, located in the central region of China, was ranked 9th and 7th in 2018 among the 34 provinces of China in terms of population and gross domestic product (GDP), respectively; this province represents a medium level of OSM dataset development in China (Zhou and Tian, 2018). Second, Hubei has a number of prefecture-level divisions (17), which may reduce the subjectivity of using only a single division as the study area. More importantly, in this study, Baidu Maps was employed as a reference map against which the OSM dataset was compared, and Baidu Street View, one of the functions in Baidu Maps, was also employed for analyzing the omission roads of the OSM data. However, Baidu Street View was available in most urban areas but not in rural areas, and thus only the urban areas in those prefecture-level divisions of Hubei were analyzed. To be specific, the datasets and maps used in this study are as follows:
• OSM road dataset: The OSM road dataset for Hubei (China) was extracted from the dataset for the whole country, which was downloaded from the website http://download.geofabrik.de/asia/china.html in Jan. 2019.
• GLOBELAND30 (GLC30) dataset: The GLC30 dataset is a global land cover dataset for 2010 at 30-meter resolution. This dataset, which can be freely obtained from the website http://globallandcover.com/GLC30Download/index.aspx, was produced by the National Geomatics Center of China. The class "artificial surface" in the GLC30 dataset was taken to represent urban areas.
• Baidu Maps: Baidu Maps (https://map.baidu.com/) is an online mapping application developed by the Baidu company offering not only a street map (Baidu Street Map) but also a street view perspective (Baidu Street View). Both Baidu Street Map and Baidu Street View were used for analyzing the completeness and omission roads of OSM road data. In addition, we found that Baidu Street Map includes more roads than OSM. This suggests that Baidu Maps, being much more complete, can be used as a raster reference database.

Methods
(1) Analysis of OSM road completeness
Much research has focused on comparing an OSM dataset with a corresponding authoritative dataset. However, such an authoritative vector dataset for our study area was not freely available; therefore, we employed the approach proposed by Zhou and Tian (Zhou and Tian, 2018) to assess the road completeness in OSM. This approach analyzes the completeness of street blocks in an OSM road dataset by comparing them with a reference map. Here, a street block was defined as a closed region (e.g. A' and B' in Figure 2(a)) surrounded by several road segments. Figure 2 illustrates this approach. Additionally, many online mapping applications (e.g. Baidu Maps and Google Maps) provide traffic conditions for major roads in a city (Figure 3). In this study, the completeness of each street block was also determined by considering only roads with traffic conditions; that is, an OSM street block was determined as complete if, inside the corresponding street block in the reference map, there was no additional road segment with traffic conditions. For example, in Figure 2(b), street block B is viewed as complete if road segment a' in Figure 2(a) does not show any traffic conditions. The purpose of this analysis is to investigate whether major roads have already been mapped in an OSM dataset. The street block completeness of a prefecture-level division was calculated as the percentage of its street blocks that were complete:
Completeness = (number of complete street blocks / total number of street blocks) × 100%    (1)
In this study, two cases of completeness values were calculated. For the first case (all roads), all roads in a reference map were considered to determine the street block completeness; for the second case (roads with traffic conditions), only roads (in a reference map) with traffic conditions were considered.
(2) Analysis of omission roads
This method uses a hierarchical classification scheme to analyze the omission roads in each street block (Figure 4). Such a scheme is feasible because several attributes (e.g. road name, number of lanes, and road function) of an omission road may be visually determined by referring to the Baidu Street Map and/or Baidu Street View. More precisely, this scheme includes two levels.
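As an illustration, the two completeness cases can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes that completeness is the percentage of street blocks judged complete, and the `StreetBlock` record, its field names, and the sample values are hypothetical. In the study itself, the per-block judgments were made visually against the Baidu Street Map.

```python
from dataclasses import dataclass


@dataclass
class StreetBlock:
    """Hypothetical record of one OSM street block after visual comparison."""
    block_id: str
    extra_roads: int               # reference-map road segments inside the block that OSM lacks
    extra_roads_with_traffic: int  # subset of those segments that show traffic conditions


def completeness(blocks, traffic_only=False):
    """Return the percentage of street blocks judged complete.

    traffic_only=False: case 1, all reference-map roads are considered.
    traffic_only=True:  case 2, only roads with traffic conditions are considered.
    """
    if not blocks:
        raise ValueError("at least one street block is required")
    if traffic_only:
        complete = sum(1 for b in blocks if b.extra_roads_with_traffic == 0)
    else:
        complete = sum(1 for b in blocks if b.extra_roads == 0)
    return 100.0 * complete / len(blocks)


# Hypothetical sample: four street blocks in one prefecture-level division.
blocks = [
    StreetBlock("A", 0, 0),
    StreetBlock("B", 2, 0),
    StreetBlock("C", 3, 1),
    StreetBlock("D", 1, 0),
]
all_roads = completeness(blocks)
traffic = completeness(blocks, traffic_only=True)
```

A block with missing minor roads but no missing road with traffic conditions counts as incomplete in the first case and complete in the second, which is why the second value can never be lower than the first.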

Figure 4. A hierarchical classification of omission roads
At the first level, an omission road is classified into one of three types: public roads, private roads, and roads for non-motorized vehicles, which may be visually determined by referring to the Baidu Street View. That is:
• "Public road": A road owned and/or maintained by a public authority (e.g. a municipality) and open to public traffic. In an urban road network of China, almost all roads of the four types (expressway, main road, secondary road, and branch road) can be classified as public roads. The main criterion to determine a public road is that it is open to public traffic (see photos (3)-(6) in Figure 4).
• "Private road": A road owned and/or maintained by a private individual, organization, or company, rather than a government. A movable barrier is often used to restrict public traffic access to a place, such as a gated community or factory. The main criterion to determine a private road is that it is not open to public traffic, or there is a barrier to restrict public traffic access (see photos (7) -(10) in Figure 4).
• "Road for non-motorized vehicles": A road used only by non-motorized traffic (e.g. bicycles and/or pedestrians) rather than motorized vehicles. The main criterion to determine this type of road is that it is a street for walking, or it is too narrow for public traffic access (see photos (11)-(12) in Figure 4).
At the second level, each omission road, marked as either a public or private road, is further divided into different subclasses:
• "Public road": We investigated whether a name existed for the corresponding road in the Baidu Street Map, and the road was then assigned to one of two types: "named road" and "unnamed road" (see photos (1)-(2) in Figure 4). Moreover, the number of lanes for this type of road was counted by referring to the Baidu Street View, and the road was further assigned to one of four types: single-lane, two-lane, four-lane, and more than four lanes (see photos (3)-(6) in Figure 4).
• "Private road": Each road was marked with one of four (land-use) functions: residential, commercial, industrial, and others. The Baidu Street View was used to divide private roads into different sub-types (see photos (7)-(10) in Figure 4). For example, a private road inside a residential community is classified as a residential road.
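The two-level scheme can be written down as a small decision function. The sketch below is illustrative only: the function name, its parameters, and the order in which the first-level criteria are tested are assumptions, and the inputs stand for observations that the study obtained visually from the Baidu Street Map and Baidu Street View.

```python
# Second-level labels taken from the classification scheme (Figure 4).
LANE_CLASSES = ("single-lane", "two-lane", "four-lane", "more than four lanes")
PRIVATE_FUNCTIONS = ("residential", "commercial", "industrial", "others")


def classify_omission_road(motorized_access, open_to_public,
                           named=False, lane_class=None, function=None):
    """Return the class path of one omission road as a tuple of labels.

    motorized_access: False for a walking street or a road too narrow for vehicles.
    open_to_public:   False if a barrier or ownership restricts public traffic.
    named, lane_class: second-level attributes of a public road.
    function:          second-level (land-use) attribute of a private road.
    """
    # First level (assumed order): non-motorized roads are separated first.
    if not motorized_access:
        return ("road for non-motorized vehicles",)
    if open_to_public:
        # Second level for public roads: name status and lane class.
        if lane_class not in LANE_CLASSES:
            raise ValueError(f"unknown lane class: {lane_class!r}")
        name_class = "named road" if named else "unnamed road"
        return ("public road", name_class, lane_class)
    # Second level for private roads: land-use function.
    if function not in PRIVATE_FUNCTIONS:
        raise ValueError(f"unknown private road function: {function!r}")
    return ("private road", function)


example_public = classify_omission_road(True, True, named=False, lane_class="single-lane")
example_private = classify_omission_road(True, False, function="residential")
example_walking = classify_omission_road(False, False)
```

Returning the full class path keeps both levels available, so lengths can later be summed either by first-level type or by sub-type.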

Experimental steps
(1) Analysis of OSM road completeness
• Step 1: Extract the OSM road datasets in the urban areas of different prefecture-level divisions in Hubei; convert the OSM road dataset of each prefecture-level division into a number of street blocks.
• Step 2: Determine the completeness of each street block by visually comparing it with the Baidu Street Map. When using this reference map, consider two cases (all roads and roads with traffic conditions).
• Step 3: Count the number of complete street blocks and calculate the completeness of street blocks in each prefecture-level division according to equation (1). For each prefecture-level division, calculate completeness values for each of the two cases in Step 2.
(2) Analysis of omission roads
• Step 4: Randomly select 60 incomplete street blocks from the OSM road dataset in each prefecture-level division.
• Step 5: Overlay each of the 60 street blocks on the corresponding map in the Baidu Street Map; manually digitize all the omission roads in each of these street blocks. As digitizing roads from the Baidu Street Map was a time-consuming process, only 60 street blocks were chosen as samples in each prefecture-level division.
• Step 6: Visually determine the type (see Figure 4) of each omission road by referring to both the Baidu Street Map and Baidu Street View; calculate the total length of omission roads for each type.
Figure 5 plots the street block completeness for the urban areas in the 16 prefecture-level divisions of Hubei (China), considering two cases (all roads and roads with traffic conditions). Shennongjia was excluded from the analysis because there were only six urban street blocks in this prefecture-level division. Figure 5 shows that with respect to all roads, for 13 out of the 16 prefecture-level divisions, street block completeness values were lower than 40%, and the maximum value was only 55% (Qianjiang), illustrating a lack of roads in the OSM road dataset of China. This finding is consistent with those of other studies (Wang et al., 2013; Zhou et al., 2014; Zhou and Tian, 2018). With respect to roads with traffic conditions, however, for 14 out of the 16 prefecture-level divisions, street block completeness values were higher than 80%, and those of the others were all close to 80%. This indicates that major roads in these prefecture-level divisions have already been mapped well.
Figure 6 plots the length percentages of omission roads for the three road types, i.e. public road, private road, and road for non-motorized vehicles.
Figure 6. Length percentages of omission roads in the 16 prefecture-level divisions of Hubei (China), in terms of the three road types, i.e. public road, private road, and road for non-motorized vehicles
Figure 6 shows that in terms of road length, approximately 90% of omission roads were either public roads or private roads, and, in most cases, no more than 10% of them were roads for non-motorized vehicles. This is probably because the Baidu Street Map was mostly mapped for motorized vehicles rather than non-motorized vehicles, and because, in an urban road network, roads for motorized vehicles commonly have a much greater total length.
Moreover, in 11 out of the 16 prefecture-level divisions, most of the omission roads were public roads rather than private roads. This was evidently not the case for Wuhan and Xiangyang, where there were relatively many more gated communities and/or factories. Overall, both public roads and private roads were still lacking in the OSM data of the different prefecture-level divisions. Figure 7 plots the length percentages of omission roads for different road sub-types in the hierarchical classification scheme (Figure 4) for the 16 prefecture-level divisions of Hubei (China). This figure shows:
1) In terms of public roads, more than 60% of the omission roads were unnamed in the corresponding Baidu Street Map, a percentage which may be higher than 90% for both Wuhan and Xiangyang. Such an unnamed road often plays a role in connecting a residential community or commercial facility to a main road (Figure 8). More importantly, more than 90% of omission roads consisted of only a single lane, which indicates their minor importance in an urban road network.

Results of OSM omission roads
2) With respect to private roads, for 15 out of the 16 prefecture-level divisions, most of the omission roads were residential roads. This is because residential land is commonly more extensive in an urban area, and China has a large number of gated communities whose residential roads are private or not open to the public. As a result, probably very few OSM volunteers paid attention to mapping inside these gated communities. This indicates that private roads were often viewed by OSM volunteers as less important than public roads in an urban road network.
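The sampling in Step 4 and the length percentages reported in Figures 6 and 7 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the fixed random seed, and the sample lengths are hypothetical, and the digitizing and typing of omission roads (Steps 5 and 6) were manual in the study.

```python
import random
from collections import defaultdict


def sample_blocks(incomplete_block_ids, k=60, seed=0):
    """Randomly pick up to k incomplete street blocks (Step 4)."""
    ids = list(incomplete_block_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(ids, min(k, len(ids)))


def length_percentages(omission_roads):
    """Return each road type's share (%) of the total omission road length.

    omission_roads: iterable of (road_type, length_in_meters) pairs,
    one pair per digitized omission road.
    """
    totals = defaultdict(float)
    for road_type, length in omission_roads:
        totals[road_type] += length
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {t: 100.0 * v / grand_total for t, v in totals.items()}


# Hypothetical digitized omission roads for one prefecture-level division.
roads = [
    ("public road", 300.0),
    ("public road", 200.0),
    ("private road", 400.0),
    ("road for non-motorized vehicles", 100.0),
]
sampled = sample_blocks(range(200))
shares = length_percentages(roads)
```

Weighting by length rather than by road count matches the way the figures are reported, so many short alleys do not outweigh a few long roads.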

DISCUSSION
From this study, we found that the OSM road datasets in China were indeed not complete, as the completeness values were mostly less than 40% for the 16 prefecture-level divisions of Hubei (China). However, most omission roads (those existing in the corresponding Baidu Maps but not in the OSM datasets) were either private roads, or unnamed public roads consisting of only a single lane. This means the omission roads in OSM were mostly the least important roads in an urban road network. A similar conclusion is supported by Figure 9, which plots the length of OSM roads in Hubei over the development of the OSM road datasets from 2013 to 2019. Two specific scenarios are considered in Figure 9. In scenario I, the total length of the OSM roads completely inside the urban areas of Hubei was calculated. In scenario II, the total length of all the OSM roads inside the administrative division of Hubei was calculated. As an example, the eight road types with the greatest total lengths were reported. For each year, the corresponding OSM road dataset was acquired in January. Figure 9 shows that for the first scenario, the lengths of at least four road types ("Primary", "Secondary", "Tertiary", and "Residential") increased rapidly from 2013 to 2017 and relatively slowly afterwards. This indicates that relatively important OSM roads (e.g. "Primary" and "Secondary") tend to be complete. For the second scenario, however, a different trend was found, at least for the "Secondary" and "Tertiary" road types. That is, their lengths increased rapidly year-on-year, indicating that these types of OSM road data may not be complete in rural areas. This is consistent with other studies, which found that the OSM data in urban areas were relatively more complete than those in rural areas (Girres and Touya, 2010; Neis et al., 2012). Nevertheless, it is still necessary to investigate the omission of OSM roads in rural areas in future work.
Moreover, some limitations were found when using the OSM history datasets for analysis. For example, Figure 9(a) shows that the lengths of OSM roads for the type "Trunk" increased sharply from 2017 to 2018; this is because a large number of OSM roads marked as "Primary" had been retagged as "Trunk", which resulted in a slight decrease of OSM roads for the "Primary" type (Figure 9(a)). On the one hand, this shows that although OSM defines tags, tags are just labels describing objects in the real world; the association between tags and real-world objects was not always accurate, and thus changes over time were needed. On the other hand, this reflects that the road types defined in OSM were quite different from those used in China. Therefore, a hierarchical classification scheme was designed to analyze omission roads in OSM.
Furthermore, although the OSM road datasets in China are not complete even in the urban areas, they may still be used for a number of applications because most of the relatively important roads (e.g. roads with traffic conditions or public roads with two lanes) have already been mapped well. These OSM road datasets may be used for analyzing traffic flow because in an urban road network, the most important roads ranked in the top 20% can accommodate more than 80% of traffic flow (Jiang, 2009); also, it was found in this study that around 80% of roads (in the reference map) with traffic conditions have already been mapped in the OSM road datasets of Hubei (China). The datasets may also be used for producing a medium- and/or small-scale road network, which is essential for map representation (e.g. in Baidu Maps and Google Maps). In order to produce such a representation, there is a need to eliminate relatively unimportant roads or only retain relatively important roads (Li and Zhou, 2012; Zhou and Li, 2017). The OSM road datasets in China may also be useful in understanding the backbone structure of an urban road network, which pays more attention to relatively important streets (Jiang, 2007; Scellato et al., 2006). Nevertheless, as there is still a lack of public roads in the OSM road datasets of China, these datasets may not be suitable for analyzing an urban road network of a small size or region. Moreover, there is still a need to employ authoritative datasets for assessing the usability of the OSM road data in China for various applications.
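The trends read from Figure 9, including the retagging effect between "Primary" and "Trunk", come down to summing road lengths per year and road type and comparing consecutive years. The sketch below shows that aggregation under stated assumptions: the function names and the length values are hypothetical, and the records are assumed to have been extracted beforehand from the January OSM dataset of each year.

```python
from collections import defaultdict


def length_by_year_and_type(records):
    """Sum road lengths per (year, road_type).

    records: iterable of (year, road_type, length_km) tuples.
    """
    totals = defaultdict(float)
    for year, road_type, length_km in records:
        totals[(year, road_type)] += length_km
    return dict(totals)


def yearly_change(totals, road_type):
    """Return {year: change in total length since the previous available year}."""
    years = sorted(y for (y, t) in totals if t == road_type)
    return {
        cur: totals[(cur, road_type)] - totals[(prev, road_type)]
        for prev, cur in zip(years, years[1:])
    }


# Hypothetical lengths (km) mimicking roads retagged from "primary" to "trunk".
records = [
    (2017, "primary", 120.0), (2017, "trunk", 10.0),
    (2018, "primary", 100.0), (2018, "trunk", 45.0),
]
totals = length_by_year_and_type(records)
primary_change = yearly_change(totals, "primary")
trunk_change = yearly_change(totals, "trunk")
```

A sharp rise in one type paired with a drop in another over the same year, as in this toy data, is a hint that roads were retagged rather than newly mapped.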

CONCLUSION
This study carried out a series of experiments to investigate both the completeness and omission roads of OSM road datasets in China. The completeness for a number of OSM street blocks in the urban areas was determined by comparing the blocks with the Baidu Street Map, for which two cases (all roads and roads with traffic conditions) were considered. A classification scheme was designed to investigate what types of roads were omitted in the randomly sampled street blocks, determined by comparing them with both the Baidu Street Map and Baidu Street View. Sixteen prefecture-level divisions of Hubei province (China) were used in the analyses. Results showed the following: 1) In terms of all roads, the completeness value for most of the prefecture-level divisions was less than 40%, indicating that even in urban areas, the OSM road datasets of China were not complete. In terms of roads with traffic conditions, however, 80% or more of the street blocks were complete in most cases, indicating that most major roads had already been mapped well. 2) Furthermore, most of the omission OSM roads in the urban areas were either private roads (mostly residential roads), or unnamed public roads consisting of only a single lane, indicating that most omission OSM roads were the least important roads in an urban road network.
This means that the OSM road datasets in China, although incomplete, may still be used for several applications (e.g. traffic flow analysis, road map representation, and backbone structure detection) as discussed in Section 4. Further research may include: (1) verification of the availability of OSM road datasets in China for specific applications; (2) investigation of