TOWARDS THE INTEGRATION OF AUTHORITATIVE AND OPENSTREETMAP GEOSPATIAL DATASETS IN SUPPORT OF THE EUROPEAN STRATEGY FOR DATA

Digital transformation is at the core of Europe's future and the importance of data is well highlighted by the recently published European strategy for data, which envisions the establishment of so-called European data spaces enabling seamless data flows across actors and sectors to ultimately boost the economy and generate innovation. Integrating datasets produced by multiple actors, including citizen-generated data, is a key objective of the strategy. This study focuses on OpenStreetMap (OSM), the most popular crowdsourced geographic information project, and is the first step towards an exploration of the pros and cons of integrating its open-licensed data with authoritative geospatial datasets from European National Mapping Agencies. In contrast to previous work, which has only tested data integration at the local or regional level, an experiment is presented to integrate the national address dataset published by the National Land Survey (NLS) of Finland with the corresponding dataset from OSM. The process included the analysis of the two datasets, a mapping between their data models and a set of processing steps (performed using the open source QGIS software) to transform and finally combine their content. The resulting dataset confirms that, while addresses from the NLS are in general more complete across Finland, in some areas OSM addresses provide more detailed and more up-to-date information that usefully complements the authoritative one. Whilst the analysis confirms that an integration between OSM and authoritative geospatial datasets is technically and semantically feasible, future work is needed to evaluate enablers and barriers that also exist at the legal and organisational level.


INTRODUCTION
The digital transformation of the economy and society is at the very core of the European Commission's priorities for the period 2019-2024, centred around the twin need for a greener and more digital Europe (European Commission, 2019). This is also proven by the Recovery and Resilience Facility, recently established in response to the COVID-19 pandemic, which prescribes that at least 20% of the €672.5 billion provided to European Union Member States in loans and grants has to be used for the digital transformation (European Commission, 2021). Clearly, no digital transformation can happen without data and, reflecting this, the European strategy for data (European Commission, 2020a) envisions Europe's digital future through the establishment of a European single market for data ensuring the free flow of data, both personal and non-personal, across actors and sectors, to stimulate data-driven innovation and create value for the economy and society. The vision is to establish a common European data space based on domain-specific data spaces in strategic sectors such as environment, agriculture, industry, health and transportation. To achieve this goal, an ambitious set of legislative instruments to be released by 2024 will address a number of data-related issues such as availability, interoperability, quality, governance, cybersecurity, skills and literacy as well as the overarching data infrastructures. The European strategy for data acknowledges the importance of all kinds of data, whether produced by the public sector, the private sector, academia or citizens. Hence, making it possible to combine and integrate data from different sources, by solving all the issues mentioned above, acquires primary importance for the successful establishment of data spaces. This paper addresses the topic of integrating data produced by the public sector and by citizens, with a focus on the geospatial domain and with a European dimension in mind.
In the European strategy for data, the contribution of data by citizens (a phenomenon referred to as 'data altruism') plays a central role and shall happen in full compliance with the General Data Protection Regulation (European Parliament and Council, 2016). The potential of citizen-generated data to improve policy making has already been widely recognised by the European Commission, e.g. in the fields of citizen science (European Commission, 2020b) and, more specific to the geospatial domain, Spatial Data Infrastructures, where citizen-generated data contributes to their evolution into modern geospatial data ecosystems (Kotsev et al., 2020).
This study explicitly focuses on citizen-generated data from OpenStreetMap (OSM), the most well-known and successful crowdsourced geographic information project. Started in 2004 and currently (June 2021) counting more than 1.6 million unique contributors (https://wiki.openstreetmap.org/wiki/Stats), OSM consists of a global database of geospatial vector features available under the open access Open Database License (ODbL). Thanks to the freedom of use ensured by the license, as well as its richness and level of detail, the OSM database is currently used by a variety of actors including governments, private companies and nonprofit organisations. The problem of integrating OSM with other datasets, mainly authoritative datasets produced by governmental National Mapping Agencies (NMAs), which is discussed in this paper, has been addressed since the very early OSM literature in close connection with research on OSM quality; notable examples include Haklay (2010), Girres and Touya (2010) and Neis et al. (2012). Several experiments were carried out on specific features (roads, buildings, land use areas, etc.) and using OSM and authoritative data from many regions in the world. However, those experiences still appear isolated as they mostly describe specific use cases, are only tested on small (local or regional) areas, are bound to particular authoritative datasets and often rely on data model-dependent procedures, which are hard, if not impossible, to generalise and replicate.
With this background, this work aims to be the first step towards a broad assessment of the enablers and barriers of integrating authoritative datasets from European NMAs with datasets from OSM. The overall purpose is to provide a preliminary set of recommendations on interoperability matters, not only semantic but also technical, organisational and legal, to ultimately support the establishment of European data spaces. To achieve this, the study proposes an experiment based on Free and Open Source Software for Geospatial (FOSS4G) to test the integration of country-wide address datasets from a European NMA and the OSM project, discussing the outcomes and identifying lessons learnt and general pros/cons of data integration mainly from the technical perspective. To the authors' knowledge, this is the first time the integration between OSM and authoritative datasets at the national level is addressed in literature. Evaluating the quality of OSM clearly remains a key and preliminary step to such integration, but is outside the scope of the study; an extensive review on how OSM quality has been measured so far is available in literature (Senaratne et al., 2017).
The remainder of the paper is structured as follows. After an analysis of the state of the art on the integration between authoritative and OSM datasets provided in Section 2, Section 3 describes the experiment of integration between the authoritative dataset of national Finnish addresses and its OSM counterpart, adopting FOSS4G technology. Drawing from the results of the experiment, Section 4 closes the paper by discussing implications of, and providing recommendations on, the integration of citizen-generated data (and OSM in particular) for the successful establishment of data spaces.

BACKGROUND ON INTEGRATION BETWEEN AUTHORITATIVE AND OPENSTREETMAP DATA
Being a citizen-driven project, OSM has been studied, and sometimes questioned, since its very beginning in relation to the quality of its data. This aspect was first addressed by some early studies, e.g. Haklay (2010) and Girres and Touya (2010), who described and measured various quality parameters on OSM data through in-depth assessments, e.g. attribute, semantic, positional and temporal accuracy, logical consistency, completeness, lineage, purpose and usage. Quality assessment methods are of course not only relevant to the case of OSM but, more generally, to all types of Volunteered Geographic Information (VGI) (Senaratne et al., 2017). Many other studies investigated those different quality elements, focusing on the semantic (Vandecasteele and Devillers, 2013) and positional (Cipeluch et al., 2010; Helbich et al., 2012) aspects, completeness (Koukoletsos et al., 2012), interoperability or, more frequently, on a combination of them, e.g. Fan et al. (2014).
Most of the available studies on OSM quality adopted an extrinsic approach, i.e. they compared OSM data with reference datasets produced by National Mapping Agencies (NMAs) or local, national or international authoritative bodies that are considered as the ground truth. Fernandes et al. (2020) provided a bibliometric review of 37 studies on the integration between VGI and authoritative data, even if only 14 of them use OSM as the main source for VGI. Among them Du et al. (2012), Abdolmajidi et al. (2014), Fan et al. (2016) and Brovelli et al. (2017) developed and tested methodologies to evaluate the quality of OSM data by comparing it against their authoritative counterparts, using the road network as a use case applied at the local level (city or town) in different places around Europe (UK, Sweden, Germany and Italy, respectively). Instead of comparing OSM with authoritative datasets, other studies such as Barron et al. (2014), Minghini and Frassinelli (2019) and Madubedube et al. (2021) assessed OSM quality through intrinsic approaches, i.e. by only looking at the history of the OSM data itself (e.g. the frequency of update or the total number and nature of contributors editing the same objects).
Nevertheless, just a few authors have focused their efforts on combining authoritative and/or OSM data together to produce integrated datasets. This conflation process involves different tasks, which can include updating, change detection, enhancement and integration of spatial data (Wiemann and Bernard, 2010). Pourabdollah et al. (2013) compared OSM and the British Ordnance Survey's Vector Map District data on the road network. Unlike many other authors, who focused their attention on geometrical accuracy and completeness, they focused on semantic information, conflating road names and reference codes with the main result of enriching the OSM dataset with authoritative information. The potential contribution of OSM data to the increase of mapped features of the authoritative road network in Brazil was the goal of Silva et al. (2021): their analysis confirmed that OSM is a promising source of information in areas with missing or outdated map data. Zhou et al. (2015) presented instead an extensive method used to dynamically integrate OSM data from the neighbouring states Vietnam and Pakistan into a common data model. Other studies focused on the semantic enrichment of authoritative datasets by extracting information from specific OSM tags related to building usage (residential/non-residential), e.g. Kunze and Hecht (2015). Similarly, Fonte et al. (2017a) developed an automated, FOSS4G-based application to convert OSM into land use/cover maps having the same nomenclature as authoritative products. This made it possible not only to compare the OSM-derived products against the authoritative ones, but also to enrich the latter through the production of integrated datasets (Fonte et al., 2017b). However, the most frequent and structured case of integration between OSM and authoritative datasets to date is represented by so-called OSM imports, or bulk imports (https://wiki.openstreetmap.org/wiki/Import). These consist of uploading external datasets, produced e.g. by governments or other institutions and having a license compatible with the ODbL, into the OSM database. Imports are tricky operations and shall be performed based on specific guidelines issued by the OSM community (https://wiki.openstreetmap.org/wiki/Import/Guidelines); an updated list of OSM imports performed so far is maintained at https://wiki.openstreetmap.org/wiki/Import/Catalogue.

INTEGRATION EXPERIMENT: DATA SOURCES
The selection of the authoritative dataset to be integrated with OSM plays an important role in the phases of analysis and harmonisation of data models, the transformation process and its possible reuse for other areas or use cases. The dataset selected in this work to test the integration approach between authoritative and OSM data is about addresses. In addition to being usually modelled as points with a reasonably simple data model, addresses represent reference datasets for a multitude of applications. They are not only a core dataset produced and maintained by governments at all levels, but also one of the most important datasets within the OSM ecosystem, considering e.g. the wealth of OSM-based routing or emergency applications. Furthermore, addresses represent a typical case where the process of updating the authoritative dataset is traditionally expensive and infrequent, and might thus highly benefit from an integration with OSM.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W2-2021. FOSS4G 2021 - Academic Track, 27 September-2 October 2021, Buenos Aires, Argentina
While the study maintains a European perspective for the general issue of integrating authoritative and citizen-generated datasets, as mentioned in Section 1 the scale of the experiment was limited to a national geographical area for both computational and semantic reasons. This is in contrast with all the studies mentioned in the literature review presented in Section 2, which have always been limited to more restricted (local or regional) areas. Given the focus on address data, we identified Finland as a useful and practical example because of: i) the easy access to the authoritative address dataset, and ii) the wide coverage of OSM addresses. The two address datasets used in the experiment are described in the following Sections 3.1 and 3.2 together with their main characteristics and modes of access.

OpenStreetMap
OSM data is organised using a simple conceptual data model combining a geometric component with a semantic component (Ramm and Topf, 2011). The geometric component can be described using three types: nodes, ways and relations. Nodes are characterised by a latitude and a longitude and represent standalone point features such as points of interest, trees, street signals and benches; ways are an ordered list of up to 2000 nodes representing both linear features (e.g. roads and rivers) and areal features or polygons (e.g. buildings and land cover areas); relations are data structures used for both modelling linear and areal features with more than 2000 nodes (e.g. lakes) or describing a relationship between two or more geometry types (nodes, ways and/or other relations), e.g. transportation networks. The semantic component consists of one or more attributes, named tags and each formed by a key-value pair.
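The conceptual model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names are hypothetical, the member "role" of a relation and the closed-way test (first node equal to last node) follow common OSM conventions rather than anything defined in this paper, and the 2000-node limit is the one quoted in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_WAY_NODES = 2000  # per-way node limit quoted in the text


@dataclass
class Node:
    """Standalone point with a latitude, a longitude and optional tags."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """Ordered list of node ids; models a line or, if closed, a polygon."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.node_ids) > MAX_WAY_NODES:
            raise ValueError("a way holds at most 2000 nodes")

    def is_closed(self) -> bool:
        # Assumption: a ring needs at least 3 distinct nodes plus the repeat.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """Group of members, each given as (member type, member id, role)."""
    id: int
    members: List[Tuple[str, int, str]]
    tags: Dict[str, str] = field(default_factory=dict)


# A building footprint carrying address tags directly on the way.
building = Way(id=1, node_ids=[10, 11, 12, 13, 10],
               tags={"building": "yes", "addr:housenumber": "12b"})
```

Tags are kept as a plain string-to-string dictionary, which mirrors the key-value nature of the semantic component.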
Information on how addresses are modelled in OSM is available at https://wiki.openstreetmap.org/wiki/Addresses. The keys of all the tags used to identify addresses share the common addr: prefix (https://wiki.openstreetmap.org/wiki/Key:addr). The keys associated with address information used in this experiment are described in a dedicated table. From the geometrical perspective, there is no single way to model OSM addresses. The addr: keys can be associated with single nodes outside, inside or on the perimeter of a building footprint; or they can be directly associated with the ways representing building polygons. Such different mapping practices are usually agreed upon by local, regional or national OSM communities and may also follow rules issued by national registry/statistical services.
In the case of OSM addresses in Finland, all the abovementioned approaches are used and there seems to be no specific internal rule agreed upon by the community on how to map this object category. In addition, address information in OSM can also be added to points of interest like shops, museums, offices, etc., sometimes duplicating addresses already available in other objects.
Extracting data from the OSM database can be performed in different ways, depending on the user needs. The most popular ones

National Land Survey of Finland
The National Land Survey (NLS) of Finland is the Finnish NMA (https://www.maanmittauslaitos.fi/en). As such, it is the Finnish governmental provider of, and the body responsible for, the national geospatial information. The NLS has recently started to provide access to its geospatial datasets through the newly established OGC API - Features standard (https://ogcapi.ogc.org/features), which provides an easy and developer-friendly way to both expose and consume geospatial vector features on the web.
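To illustrate how features can be consumed from an OGC API - Features endpoint, the following minimal sketch builds the items URL of a collection and follows the "next" links returned with each page. The base URL and the collection identifier are hypothetical placeholders, not the actual NLS endpoint, and the HTTP call is injected as a function so that any client library can be plugged in.

```python
from typing import Callable, Dict, Iterator, Optional


def items_url(base_url: str, collection_id: str, limit: int = 1000) -> str:
    """Build the /collections/{collectionId}/items URL with a page size."""
    return f"{base_url.rstrip('/')}/collections/{collection_id}/items?limit={limit}"


def iter_features(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every feature, following rel="next" links page after page.

    `fetch` is any function that takes a URL and returns the decoded
    GeoJSON FeatureCollection as a dictionary.
    """
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("features", [])
        url = next(
            (link["href"] for link in page.get("links", [])
             if link.get("rel") == "next"),
            None,
        )
```

With a real HTTP client, `fetch` could simply wrap a GET request that returns the parsed JSON body; paging stops as soon as a response carries no "next" link.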
polygons. Also, it is visually clear that OSM addresses in this area, as usually happens in urban areas (see also Section 4.2), are more numerous than NLS addresses.

Integration process
This section describes the procedures implemented to pre-process the OSM and NLS address datasets, mainly to extract the relevant information described in Sections 3.1 and 3.2, and to integrate them into a single dataset. Given that the data model for address data is richer in INSPIRE (to which the NLS dataset conforms) than in OSM, we considered that the best approach for the integration of the two was to transform the INSPIRE-compliant NLS dataset to the OSM data model. This was a fully arbitrary choice; the opposite one, i.e. the transformation of the OSM dataset to the NLS/INSPIRE data model (corresponding to the use case of an NMA wishing to complement its dataset with information from OSM), would be possible as well. All the steps described in the following were applied as a sequence of processing algorithms within the Graphical Modeler of the open source QGIS software (https://qgis.org) and are publicly shared on an online repository (https://github.com/MarcoMinghini/INSPIRE-OSM) to maximise their re-use and improvement.
In the case of OSM, a number of steps were performed to extract the relevant information from the OSM Planet and make it available in a format suitable for integration with NLS data. The Osmium Tool (https://osmcode.org/osmium-tool) was used to filter the Planet OSM both geographically (on Finland) and semantically, the latter by only extracting objects with a non-null value for the addr:housenumber key. The resulting dataset, converted to the GeoPackage format, included both points (OSM nodes) and polygons (OSM ways) for the reasons explained in Subsection 3.1. Polygons were converted to points using their centroids and then merged with the pointwise addresses into a single point dataset.
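The semantic filter and the polygon-to-point conversion described above can be sketched as follows. This is not the Osmium or QGIS implementation: it is a plain-Python illustration that assumes planar coordinates, simple (non self-intersecting) polygon rings and a hypothetical input structure in which each object carries a "kind", a "geometry" and a "tags" entry.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def has_housenumber(tags: Dict[str, str]) -> bool:
    """True when addr:housenumber is present and not empty."""
    value = tags.get("addr:housenumber")
    return value is not None and value.strip() != ""


def polygon_centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring if needed
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        # Degenerate ring: fall back to the mean of the vertices.
        body = pts[:-1]
        return (sum(p[0] for p in body) / len(body),
                sum(p[1] for p in body) / len(body))
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def to_address_points(objects: Iterable[Dict]) -> List[Dict]:
    """Keep objects with a house number and reduce polygons to centroids."""
    points = []
    for obj in objects:
        if not has_housenumber(obj["tags"]):
            continue
        if obj["kind"] == "polygon":
            location = polygon_centroid(obj["geometry"])
        else:
            location = obj["geometry"]
        points.append({"point": location, "tags": obj["tags"]})
    return points
```

A centroid is a convenient single location for a building, although for strongly concave footprints it may fall outside the polygon itself.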
Several OSM address objects did not have a value for the key addr:city. Thus, this information was retrieved from the Local Administrative Units (LAU) dataset, downloaded from the Eurostat GISCO website (https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/lau) and then processed (since it originally included names in different languages) to match the existing information in OSM. Other OSM addresses were instead lacking the street name (key addr:street) and, similarly to those without the building number, were excluded from the dataset. After this process, OSM objects sharing the same combination of values for the keys addr:city, addr:street, addr:housenumber and addr:unit were considered duplicates and were removed from the dataset. Some additional minor processing steps were performed on the OSM dataset, but they are only described in the code to keep the text readable.
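The cleaning rules above can be summarised in a short sketch. The `city_lookup` function is a hypothetical stand-in for the spatial join against the LAU polygons; keeping the first occurrence of each duplicate and keeping records whose city cannot be resolved are assumptions, since the text does not specify either behaviour.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]
KEY_FIELDS = ("addr:city", "addr:street", "addr:housenumber", "addr:unit")


def clean_osm_addresses(
    records: Iterable[Dict],
    city_lookup: Callable[[Point], Optional[str]],
) -> List[Dict]:
    """Drop incomplete addresses, fill missing cities, remove duplicates."""
    seen = set()
    kept = []
    for record in records:
        tags = dict(record["tags"])
        # Addresses without a house number or a street name are excluded.
        if not tags.get("addr:housenumber") or not tags.get("addr:street"):
            continue
        # A missing city is retrieved from the administrative unit, if any.
        if not tags.get("addr:city"):
            city = city_lookup(record["point"])
            if city:
                tags["addr:city"] = city
        # Duplicates share city, street, house number and unit.
        key = tuple(tags.get(name, "") for name in KEY_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        kept.append({"point": record["point"], "tags": tags})
    return kept
```

Filling the city before building the duplicate key matters: two records for the same address, only one of which carries addr:city, would otherwise both survive.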
To transform the NLS address dataset to the OSM data model, a mapping between the NLS/INSPIRE and the OSM attributes was first required. This mapping is shown in a dedicated table. The three attributes that, at a national level (i.e. inside the same country), uniquely identify an address are the city name, the street name and the address number. With regard to the address number, both the locator designator addressNumber attribute in the NLS dataset and the addr:housenumber attribute in the OSM dataset store it as a string including the number (plus additional elements such as letters, e.g. 12b). To align the two values, a simple rename of the NLS attribute was sufficient. Instead, the name of the street is documented in three attributes in the NLS dataset: component ThoroughfareName name fin (corresponding to the name in Finnish), component ThoroughfareName name swe (corresponding to the name in Swedish) and, lastly, component ThoroughfareName name sme (corresponding to the name in Sami). We selected the first (see Table 3) whenever available (i.e. 99% of the time) and the second otherwise. The third one (name in Sami) was never used as it did not appear in any object. In the case of the city name, the value of the NLS dataset attribute component AdminUnitName 4 is a number representing the code id of the LAU (instead of its name). The name was thus retrieved from the LAU dataset and then substituted for the city id. To complete the transformation, the NLS attribute component AdminUnitName 1 (indicating the country) was renamed addr:country and its values, all equal to Finland, were simply substituted with the ISO 3166-1 alpha-2 two-letter country code in upper case (FI) in accordance with the OSM rules. As a last step, all duplicated addresses (i.e. addresses having exactly the same city, street and housenumber), which sometimes appeared within different buildings close to each other, were identified and removed.
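The attribute mapping just described can be expressed as a small function. The sketch below rests on assumptions: the NLS attribute names are written with underscores, whereas the text shows them with spaces, and the LAU code used in the example is purely illustrative.

```python
from typing import Dict, Mapping

# Assumed spellings of the NLS/INSPIRE attribute names (underscores added).
NUMBER = "locator_designator_addressNumber"
STREET_FIN = "component_ThoroughfareName_name_fin"
STREET_SWE = "component_ThoroughfareName_name_swe"
LAU_CODE = "component_AdminUnitName_4"


def nls_to_osm(record: Mapping[str, str], lau_names: Mapping[str, str]) -> Dict[str, str]:
    """Translate one NLS address record into OSM addr:* tags."""
    # Finnish street name when available, Swedish otherwise.
    street = record.get(STREET_FIN) or record.get(STREET_SWE) or ""
    return {
        "addr:housenumber": record[NUMBER],          # plain rename
        "addr:street": street,
        "addr:city": lau_names[record[LAU_CODE]],    # LAU code -> LAU name
        "addr:country": "FI",                        # ISO 3166-1 alpha-2
    }


example = nls_to_osm(
    {NUMBER: "12b", STREET_FIN: "", STREET_SWE: "Storgatan", LAU_CODE: "091"},
    {"091": "Helsinki"},
)
```

Here `example` carries the Swedish street name because the Finnish one is empty, and the country is always written as the upper-case code FI.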
The pre-processed OSM and NLS address datasets were finally merged into a single, integrated dataset with the basic rule to keep the attribute values from the NLS dataset in all the cases where the values of the fields addr:city, addr:street and addr:housenumber were the same in the two datasets. Figure 2 summarises the processing steps described above, which were implemented inside the QGIS Graphical Modeler.
Figure 2: Graphical model of the processing performed to integrate the OSM and NLS address datasets in a single dataset.
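The merge rule can be sketched as a keyed union in which NLS records take precedence. The "source" field added to each output record is an assumption introduced here for traceability; it is not described in the text.

```python
from typing import Dict, Iterable, List, Tuple

MATCH_FIELDS = ("addr:city", "addr:street", "addr:housenumber")


def match_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """City, street and house number identify an address in both datasets."""
    return tuple(record.get(name, "") for name in MATCH_FIELDS)


def merge_addresses(nls: Iterable[Dict[str, str]],
                    osm: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Union of both datasets; on a match, the NLS record is kept."""
    merged = {match_key(r): {**r, "source": "NLS"} for r in nls}
    for record in osm:
        key = match_key(record)
        if key not in merged:
            merged[key] = {**record, "source": "OSM"}
    return list(merged.values())
```

Because the comparison is an exact string match, a street or city spelled differently in the two datasets is treated as a distinct address and ends up in the output twice.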

Results
The two original datasets are collected and updated through very different procedures, thus it is not surprising that they also have large differences in the number of objects mapped and their distribution across the country. The NLS dataset, which was harmonised to the OSM data model, included around 3.3 million addresses, while the OSM dataset had just over 0.5 million (about 390000 polygons and 130000 points). The removal of duplicates brought the number of addresses down to 1.8 million for NLS and around 0.4 million for OSM.
The relative geographical distribution of the datasets is also very uneven. Considering the NLS address dataset as the reference one, Figure 3 shows that OSM data is in general much less complete, with a high variety of patterns. The 10x10 km EEA reference grid (https://www.eea.europa.eu/data-and-maps/data/eea-reference-grids-2) was used to aggregate data, count the number of OSM and NLS addresses included in each cell and compute their percentage ratio. Approximately 63% of the cells where there is at least one address in the NLS dataset do not contain any address in the OSM dataset (white squares in Figure 3); the percentage ratio is less than 10% for about 24% of the cells and between 10% and 50% for another 7% of the cells. In slightly more than 6% of the cells, the percentage ratio lies between 50% and 100% and only a few cells include more addresses in OSM than in the NLS dataset (percentage ratio higher than 100%).
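The per-cell computation can be sketched as follows, assuming that the address coordinates are already expressed in metres in the projected reference system of the grid, that cells are identified by the integer indices of their lower-left corner, and that the percentage ratio is the OSM count divided by the NLS count.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

CELL_SIZE_M = 10_000  # 10x10 km cells
Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(x: float, y: float) -> Cell:
    """Index of the grid cell containing a projected coordinate pair."""
    return (math.floor(x / CELL_SIZE_M), math.floor(y / CELL_SIZE_M))


def percentage_ratios(nls_points: Iterable[Point],
                      osm_points: Iterable[Point]) -> Dict[Cell, float]:
    """OSM/NLS address count ratio (in %) for every cell with NLS data."""
    nls_counts = Counter(cell_of(x, y) for x, y in nls_points)
    osm_counts = Counter(cell_of(x, y) for x, y in osm_points)
    return {
        cell: 100.0 * osm_counts.get(cell, 0) / count
        for cell, count in nls_counts.items()
    }
```

Only cells with at least one NLS address appear in the result, so the division is always defined; a value of 0.0 corresponds to a cell with no OSM address and a value above 100.0 to a cell where OSM holds more addresses than the NLS dataset.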
Some of the most densely populated areas (based on the 2019 population figures included in the LAU dataset) are among the administrative areas that are most complete in OSM: 4 among the 6 most populated Finnish cities (Helsinki, Espoo, Vantaa and Turku) have average percentage ratios ranging between 75% and 97%. This confirms some typical findings from the literature, showing that areas with higher population densities (i.e. urban areas) tend to be those where most OSM mappers add and update information as they either live in or visit such areas, see e.g. Zielstra and Zipf (2010), Dorn et al. (2015) and Brovelli et al. (2016). In addition, in some of those cities extensive OSM imports from authoritative sources have been performed in the past, thus greatly increasing the number of addresses. As an example, an import of buildings that also included address information was performed starting in 2014 in the whole Helsinki region (https://wiki.openstreetmap.org/wiki/Helsinki_region_building_import).
The final, integrated address dataset includes around 1.92 million address points, with 96% of them being only present in the original NLS dataset and approximately 81000 of them only present in OSM. It should be clarified that this high number includes several cases where the name of streets or cities is misspelled (or spelled differently) in OSM with respect to the NLS dataset, which may highlight weaknesses in the OSM dataset rather than gaps in the one from NLS. However, there are also cases where OSM actually includes more detailed or up-to-date information and thus improves the authoritative NLS dataset. As an example, Figure 4 shows an area in Helsinki where addresses in the NLS dataset, each associated with a single building, correspond to multiple addresses in the OSM dataset, where the building numbers are complemented by letters (A, B, C, etc.) and have a more specific position, most probably corresponding to the single building entrances.
In addition to the QGIS Graphical Modeler workflow, the online repository at https://github.com/MarcoMinghini/INSPIRE-OSM also includes a sample of the final, integrated address dataset limited to the city of Helsinki for demonstration and testing purposes.

DISCUSSION AND CONCLUSIONS
Despite being simple, the experiment presented in this paper is useful enough to understand the complexity inherent to the process of integrating datasets which differ in nature and content. As such, the lessons learnt are a good first step to formulate helpful recommendations for the successful establishment of the data spaces envisioned in the European strategy for data (European Commission, 2020a).
First and foremost, any data integration process should be carefully prepared. This means that the datasets to be integrated shall be well-known in terms of their creation/update process, geometric representation, encoding, semantic content and quality (measured, in principle, through all the parameters that are important for the integration). If quality information is not available a priori, then a preliminary quality assessment becomes the first key step. This work deliberately assumed that the quality of the OSM address dataset across Finland was such that a comparison and integration with the NLS address dataset was actually possible without a dedicated, in-depth quality assessment. This was mainly justified by the very local nature of OSM, which makes it reasonable to assume that the positional accuracy of OSM addresses is sufficiently high.
In contrast, the possible low degrees of OSM address completeness (i.e. lack of addresses in some parts of the country) and semantic accuracy (i.e. wrong or missing address information) are directly taken into account in the integration process.
From the purely technical perspective, which was the focus of this work, a number of conclusions can be drawn. Results show that the integration between the OSM and NLS address datasets could improve both datasets, since the integrated dataset was achieved by 'taking the best' from both the initial ones. In general, results show that, while authoritative data have a more homogeneous coverage and higher positional accuracy, OSM has typically an uneven spatial coverage but holds the potential to include more updated or detailed information that authoritative datasets can only achieve, if ever possible, in a much longer time. This means that, in general, both the NMA and OSM communities might benefit from such integrations for improving their data. Ideally, such integration processes could be automated and executed on a regular basis to achieve increasingly more updated and higher-quality datasets.
As mentioned earlier, one of the main contributions of this work is that the integration between OSM and authoritative data happened at the national level, in contrast to previous work that was all focused on the regional or local scale (see Section 2). The experiment also showed that, although integration procedures involving OSM data are in general hard to generalise because of the peculiar nature and characteristics of the authoritative datasets involved (see again Section 2), the interoperability ensured by INSPIRE would allow the process to be seamlessly extended to other INSPIRE-compliant address datasets available across the EU.
From the software perspective, the experiment described proved that FOSS4G, and in particular QGIS and its Graphical Modeler, is a fully suitable Extract-Transform-Load (ETL) tool to perform the data processing involved in the integration (see Section 4). However, given the focus on nationwide datasets, it is worth mentioning that the process required a minimum computational capacity as it dealt with huge amounts (millions) of address features, which, if extended to all of Europe, would need a proper infrastructure in place.
As mentioned in Section 1, this experiment is the first step within a broader research framework investigating enablers and barriers for the integration between authoritative and citizen-generated (in particular OSM) datasets in Europe. As such, it only focused on some interoperability aspects (technical and semantic) required for the integration, but it did not address other aspects such as the legal and organisational ones. Legal interoperability looks at dataset integration from the perspective of their licenses and terms of use. Whilst integration might be technically possible, the lack of license compatibility might indeed represent a serious obstacle to the actual use of the integrated datasets. This applies in both directions. To be integrated in OSM, a dataset shall have a license compatible with the ODbL: examples of such licenses include CC0 (Creative Commons, 2021b), while other licenses are either not compatible or (as in the case of NLS's CC BY 4.0) not compatible in the absence of an additional waiver for reasonable attribution and unrestricted distribution (https://wiki.openstreetmap.org/wiki/Import/ODbL_Compatibility). NMAs might face similar issues, since OSM's ODbL requires the release of the integrated dataset under the same ODbL license, which might be against existing national policies. In this regard, the recently published Open Data Directive (European Parliament and European Council, 2019) has pushed the publication of so-called 'high-value datasets' (i.e. datasets the re-use of which is associated with high economic and societal benefits) under open licenses, which should favour their integration with other data sources such as OSM.
The final list of high-value datasets, together with the requirements for their provision (including the license), will be provided in a legal act foreseen for late 2021. In addition to legal interoperability, organisational interoperability both within and across organisations (including governments and OSM communities) will be key to making data integration a common, standardised and policy-enabled process rather than an isolated and ad hoc exercise.
As a final note, readers should be aware that the definition of OSM as a citizen-generated database is increasingly challenged. Not only have governments and other organisations largely contributed to OSM through imports, but today more and more private companies using OSM for their business are heavily adding OSM data through their paid staff (Anderson et al., 2019). Hence, while still remaining a citizen-driven initiative, OSM has evolved into a broad and complex ecosystem with both the need to refine its governance and the potential to maintain and improve what is currently one of the most used global datasets.