SEMANTIC LINKING SPATIAL RDF DATA TO THE WEB DATA SOURCES

Large amounts of spatial data are held in relational databases. To be used in Semantic Web applications, this data must be converted to RDF, which makes spatial data a key input for creating spatial RDF data. Linked Data is the approach most preferred by users for publishing and sharing relational data on the Web. To define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking the vocabulary of a data resource with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, enables automatic reasoning and prevents data duplication. The need to convert relational data to RDF arises from the semantic expressiveness of Semantic Web technologies. Ontologies are one of the key components of the Semantic Web. An ontology is an “explicit specification of a conceptualization”. The semantics of spatial data rely on ontologies. Linking spatial data from relational databases to web data sources, so that machine-readable interlinked data can be shared on the Web, is not an easy task. Tim Berners-Lee, the inventor of the World Wide Web and an advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these rules, firstly, spatial data in relational databases must be converted to RDF with the use of supporting tools. Secondly, spatial RDF data must be linked to upper-level domain ontologies and related web data sources. Thirdly, external data sources (ontologies and web data sources) must be determined and spatial RDF data must be linked to the related data sources. Finally, spatial linked data must be published on the Web.
The main contribution of this study is to determine the requirements for finding RDF links and to identify the deficiencies in creating and publishing linked spatial data. To achieve this objective, this study reviews existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we investigate the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The linking of spatial RDF data to web data sources is demonstrated with an example use case: road data has been linked to DBpedia, one of the popular related web data sources. Silk, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework. We also evaluated link discovery tools such as LIMES and Silk and compared their results for the matching/linking task. As a result, linked road data is shared and represented as an information resource on the Web and enriched with definitions from different related resources. In this way, road datasets are also linked through the related classes, individuals, spatial relations and properties they cover, such as construction date, road length, coordinates, etc.


INTRODUCTION
Linked Data generally refers to semantically linking data across data sources and using it on the Web (Bizer et al., 2009; Heath and Bizer, 2011). In other words, Linked Data is a method of publishing structured data according to the Linked Data principles so that semantic queries can be performed on the data in the data sources. Linked Data provides interoperability among data sources and supports the execution of semantic queries. Linked Data technologies provide semantic processing, inference, merging and visualization of data associated with data sources. In this context, the Linking Open Data Cloud (LOD Cloud) of the Linking Open Data Community Project (Data, 2010) was created for the dissemination and adoption of the Linked Data principles (Berners-Lee, 2006). The main purpose of the project is to link data and publish it on the Web according to the Linked Data principles by using existing data sources (Geonames, DBpedia, FOAF, MusicBrainz, etc.). The LOD Cloud covers different topical domains.
Large amounts of spatial data are held in relational databases, which makes them a key input for creating spatial RDF data. Linked Data is a way to publish and share data on the Web. To define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking the vocabulary of a data resource with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, enables automatic reasoning and prevents data duplication. The need to convert spatial data to RDF arises from Semantic Web applications. Ontologies are one of the key components of the Semantic Web, and the semantics of spatial data rely on them. Linking spatial data from relational databases to web data sources is not an easy task. The objective of this study is to review existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we investigate the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The linking of spatial RDF data to web data sources is demonstrated with an example use case, in which we have linked road data to the popular related web data source DBpedia. As a result, linked road data is shared and represented as an information resource on the Web and enriched with definitions from different related resources. In this way, road datasets are also linked through the related classes, individuals, spatial relations and properties they cover, such as construction date, road length, coordinates, etc.
The majority of the spatial data needed for Semantic Web applications is stored in databases. Ontologies, on the other hand, play the most important role in realizing the Semantic Web. Determining the relationships between ontologies and databases is one of the most popular research topics in the field of the Semantic Web. This subject, known as ontology-database mapping, raises several research topics. In this context, the literature includes studies on the creation of an ontology from an existing database, the discovery of mappings between a database and an existing ontology, and the publishing of spatial data sources as Linked Data. In particular, projects such as OpenStreetMap, Geonames and DBpedia provide spatial data used as base maps, which has made the use of these data sources essential for Semantic Web applications. In the geospatial arena, geospatial data have been published as linked data by projects and initiatives such as LinkedGeoData (Stadler et al., 2012), GeoLinkedData and OS LinkedData. These are known as Geospatial Linked Data and enable semantic queries over different data sources using Linked Data technologies. Consequently, in order to publish spatial data as Linked Data, the requirements for establishing the necessary technological infrastructure within the spatial data infrastructure should be determined. The most important requirement for publishing spatial data as Linked Data is the transformation tools. Open source tools such as Geometry2RDF, TripleGeo and GeoTriples are available. In this study, the General Directorate of Highways road data was published as Linked Data, the published Linked Data were compared, and the relative advantages of the tools were examined. The focus of this paper is semantic linking between geospatial data and data sources in the context of the Web of Data.
Real-world objects and abstract concepts are represented as RDF data. RDF data is expressed as subject, predicate and object in the RDF triple model, and Linked Data uses this model as well. A set of RDF triples can be thought of as an RDF graph in which all resources have identifiers (URIs). With these identifiers, data items are linked to each other within the graph and to external data sources. Linking published data to external data sources enriches the data with information from those sources. This extends the data content and supports data integrity, since one source may present some part of the data while other sources present the missing parts of the same data.
There are academic studies and projects about linking data to external data sources, such as Koho et al., 2018; Zhu et al., 2017; Wetz et al., 2012; Schabus and Scholz, 2017; Iwaniak et al., 2017; Qiu et al., 2017; Bischof et al., 2018; Margan et al., 2018; Wiemann and Bernard, 2016; Ballegooie et al., 2017; Ding et al., 2009.
The remainder of this paper is organized as follows. In the next section, we describe the main steps of semantically linking geospatial RDF data in order. This section includes a use case for publishing geospatial data and linking RDF data to external data sources, followed by the implementation results and the conclusion.

SEMANTIC LINKING SPATIAL RDF DATA
In the web of documents, hyperlinks connect documents into a global information space. In the context of the Semantic Web, hyperlinks are called RDF links. RDF links describe relationships between the same concepts in different data sources. These relationships enable Semantic Web crawlers to navigate between different data sources. In this way, RDF links are different from hyperlinks.
RDF links are classified as external RDF links and internal RDF links. While internal RDF links connect concepts within a Linked Data source, external RDF links connect concepts between different Linked Data sources. Finding RDF links is explained in Section 2.5. External RDF links are crucial for the Web of Data as they are the glue that connects a linked data source into a global data space. This allows machines to use RDF definitions across the global data space. Hence, the information required for Semantic Web applications may be found as if a single RDF graph were queried. That is, data and information from different Linked Data sources can easily be combined by using the RDF data model and Semantic Web standards (Heath and Bizer, 2011).
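The distinction between internal and external RDF links can be illustrated with a minimal Python sketch. It assumes that a triple is a plain (subject, predicate, object) tuple of URIs and that a link counts as external when its object lies outside the namespace of the local data source. The `http://example.org/road/` namespace, the resource names and the `classify_link` helper are hypothetical and are not part of the study.

```python
# Hypothetical namespace of the local Linked Data source (illustration only).
LOCAL_NS = "http://example.org/road/"
DBPEDIA_NS = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# A tiny RDF graph: each triple is (subject, predicate, object), all URIs.
triples = [
    # Subject and object are both in the local namespace.
    (LOCAL_NS + "1", LOCAL_NS + "connectsTo", LOCAL_NS + "2"),
    # The object lives in another data source (hypothetical resource name).
    (LOCAL_NS + "1", OWL_SAME_AS, DBPEDIA_NS + "Example_Road"),
]


def classify_link(triple, local_ns=LOCAL_NS):
    """Return 'internal' if the object is in the local namespace, else 'external'."""
    _subject, _predicate, obj = triple
    return "internal" if obj.startswith(local_ns) else "external"


kinds = [classify_link(t) for t in triples]
```

Because every resource is identified by a URI, a namespace check of this kind is enough to tell which links leave the local data source.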

Determining Geospatial Data Sources
Spatial data are stored with thematic attributes and geometric properties in vector databases. Therefore, spatial attributes must be converted to RDF in addition to the thematic attributes to support the queries and analyses required in Semantic Web applications. Publishing spatial data as RDF makes it possible to apply Semantic Web technologies in accordance with specific standards. The main purpose of obtaining geospatial RDF data is to increase semantic interoperability between different spatial data sources and to perform spatial analysis on the Web. Hence, we selected the road dataset created by the General Directorate of Highways as the geospatial data source.
In order to publish spatial data as RDF, the data must be associated with the relevant classes and attributes. The data ontology obtained from the database schema links the data within the data source. Hence, the data ontology is obtained first. A triple store is needed to store RDF data. Examples include AllegroGraph, Oracle Spatial and Graph, GraphDB, Parliament, Strabon, uSeekM and Virtuoso RDF Triple Store.
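A minimal sketch of how one database record with thematic and geometric attributes might be expressed as RDF triples is given below. The GeoSPARQL terms (geo:Feature, geo:Geometry, geo:hasGeometry, geo:asWKT, geo:wktLiteral) come from the OGC GeoSPARQL vocabulary. The record fields, the base URI and the property names under it are assumptions made for illustration and do not reflect the actual schema of the road dataset.

```python
from typing import NamedTuple

GEO = "http://www.opengis.net/ont/geosparql#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/road/"  # hypothetical base URI


class Literal(NamedTuple):
    """A typed RDF literal: lexical value plus datatype URI."""
    value: str
    datatype: str


def row_to_triples(row):
    """Map one record to triples: thematic attributes on the feature, WKT on its geometry."""
    feature = BASE + str(row["id"])
    geometry = feature + "/geometry"
    return [
        (feature, RDF_TYPE, GEO + "Feature"),
        # Thematic attributes (hypothetical property names).
        (feature, BASE + "name", Literal(row["name"], XSD + "string")),
        (feature, BASE + "lengthKm", Literal(str(row["length_km"]), XSD + "double")),
        # Geometric attribute, modelled as a separate geometry resource.
        (feature, GEO + "hasGeometry", geometry),
        (geometry, RDF_TYPE, GEO + "Geometry"),
        (geometry, GEO + "asWKT", Literal(row["wkt"], GEO + "wktLiteral")),
    ]


# Hypothetical record, standing in for a row of a spatial table.
road_row = {
    "id": 1,
    "name": "Example Road",
    "length_km": 12.5,
    "wkt": "LINESTRING (32.85 39.93, 32.90 39.95)",
}
road_triples = row_to_triples(road_row)
```

Keeping the geometry as its own resource follows the feature/geometry separation of GeoSPARQL, so thematic queries and spatial queries can address different nodes of the graph.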

Selecting Transformation Tool
Geospatial linked data requires the RDF model for representing geospatial data. Therefore, existing transformation tools have been examined and their properties have been compared. The following subsections give detailed information about the tools.

Geometry2RDF:
Geometry2RDF is a Java-based library, developed by the GeoLinked Data (.es) team, that generates RDF triples from geometrical information, which can be available in GML or WKT. The tool takes as input ESRI shapefiles or spatial DBMSs (Oracle, PostgreSQL, MySQL, etc.), transforms the data into GML (using GeoTools) and then generates RDF (using Jena) consistent with the NeoGeo vocabulary. The current version of the library works with Oracle geospatial databases. The default CRS used for the geometry is WGS84. The architecture is modular enough to run as a standalone platform or as a library (de León et al., 2010). Figure 1 represents the processing flow diagram of Geometry2RDF. Road data is converted to an RDF file using Geometry2RDF.

TripleGeo:
Triples can be exported according to the GeoSPARQL standard, the WGS84 vocabulary and the Virtuoso RDF vocabulary. In addition, TripleGeo allows on-the-fly reprojection between CRSs, e.g., transforming geometries from GreekGrid87 (a local CRS) into WGS84 (used for GPS locations) (Patroumpas et al., 2014).

GeomRDF:
TripleGeo, shp2GeoSPARQL). GeomRDF is based on a vocabulary that reuses and extends GeoSPARQL and NeoGeo so that geometries can be defined in any CRS and represented both as structured geometries and in a form compliant with the GeoSPARQL standard (Hamdi et al., 2014).

Figure 2. GeomRDF components (Hamdi et al., 2014).
GeoTriples users can use the generated mappings in the Ontop-spatial system to view their data sources virtually as linked data. Ontop-spatial is a geospatial Ontology-Based Data Access system that performs on-the-fly GeoSPARQL-to-SQL translation over spatially-enabled relational databases using ontologies and mappings generated by GeoTriples (Kyzirakos et al., 2014).

Figure 3. The architecture of GeoTriples (Kyzirakos et al., 2014).
Conversion of Coordinate Reference System: supported.

Converting Geospatial Data to RDF Data
The basic aim of converting geospatial data to RDF is to make geospatial data reusable. Linked data is the way to make it discoverable and therefore linkable.
RDF links enable crawlers of Semantic Web search engines to navigate links backward and forward. Linked data provides a standard mechanism for defining geospatial data and the meaning of relationships between entities in the geospatial data.
Linked data uses web standards and a common data model for defining data. This makes it possible to run Semantic Web applications over a single graph. As discussed in Section 2.2, GeoTriples is the most suitable tool, so we selected it for converting geospatial data to RDF data. GeoTriples supports different output formats such as RDF/XML, N-Triples, Turtle and N3. The RDF data are serialized in these output formats and saved as Road.rdf, Road.nt and Road.ttl.
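To make the N-Triples output format concrete, the following Python sketch serializes triples by hand. It only illustrates the line layout of the format (one triple per line, terminated by a dot) and is not how GeoTriples produces its output. The `nt_term` and `to_ntriples` helpers and the example URIs are hypothetical, and literals are assumed to be (lexical value, datatype URI) pairs.

```python
def nt_term(term):
    """Render a URI string or a (value, datatype) literal pair as an N-Triples term."""
    if isinstance(term, tuple):
        value, datatype = term
        # Escape the characters that may not appear raw inside a quoted literal.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"^^<{datatype}>'
    return f"<{term}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one N-Triples statement per line."""
    lines = [f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples]
    return "\n".join(lines) + "\n"


# Hypothetical triple carrying a WKT geometry literal.
example = [
    (
        "http://example.org/road/1",
        "http://www.opengis.net/ont/geosparql#asWKT",
        ("POINT (1 2)", "http://www.opengis.net/ont/geosparql#wktLiteral"),
    )
]
nt_text = to_ntriples(example)
```

Writing `nt_text` to a file named Road.nt would give a file in the same format as the N-Triples output mentioned above; Turtle and RDF/XML carry the same triples in a more compact or XML-based syntax.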

Choosing External Data Sources
There is no global directory or standard Semantic Web tool for exploring and finding existing data sources. However, Linked Data browsers and Semantic Web search engines (Tabulator, Marbles, Sigma, sameAs, Sindice, FalconS, Watson, Swoogle, LOD Milla, LOD Live, Aemoo, Disco Hyperdata Browser, Openlink Data Web Browser, Object Viewer, Openlink Virtuoso, MoB4LOD, OpenData Communities, etc.) can be used as starting points. They help to find URIs of concepts that extend geospatial data definitions. The aim of this step is to provide extensive descriptions for the concepts of the geospatial data. Some existing data sources for extending concept definitions in the geospatial data are listed below:
Geonames is a geographical database available and accessible through various web services under a Creative Commons attribution license.
The Basic Geo (WGS84) vocabulary defines terms such as lat and long for describing geographically-located things.
DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web.
We selected DBpedia as the external data source and used the GeoSPARQL ontology for defining geometry types. The large majority of existing link discovery studies have yet to exploit the semantic richness of spatial data. Topology and geometry are essential to GIS and important for understanding relationships between geographical features in the spatial context. In addition, integrating linked spatial data from different data sources and visualizing query results require a topological relationships ontology.

Making Links to External Data Sources
in the context of Linked Data comes from concept definitions and links between concepts. Therefore, in this section, we focus on external links such as "owl:sameAs" and "owl:equivalentClass". Once suitable data sources have been identified, links can be created with related tools such as Silk, Limes, RiMOM, idMash and ObjectCoref. RDF links can be set manually or automatically. The choice of method depends on the data set and the context in which it is published. Manual interlinking is typically employed for small, static data sets, while larger data sets generally require an automated or semi-automated approach (Heath and Bizer, 2011).
We implemented the linking automatically with the Silk linking editor. The Silk Link Discovery Framework is a tool for discovering relationships between data items within different Linked Data sources. Silk compares property values or sets of entities with a number of similarity metrics, such as string, numeric, date, URI and set comparison methods, as well as a taxonomic matcher that calculates the semantic distance between two concepts within a concept hierarchy. Each metric evaluates to a similarity value between 0 and 1, with higher values indicating greater similarity (Volz et al., 2009).
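The idea of a similarity value between 0 and 1 combined with an acceptance threshold can be sketched in a few lines of Python. This is not Silk's implementation or its metrics: the sketch substitutes the sequence-matching ratio from Python's standard `difflib` module for a string metric, and the labels, URIs, threshold value and the `discover_links` helper are assumptions made for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def label_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; 1.0 means identical labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Compare every source label with every target label and keep pairs above the threshold."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = label_similarity(s_label, t_label)
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri, score))
    return links


# Hypothetical labels for a local road source and an external source.
local_roads = {"http://example.org/road/1": "Example Road"}
external_roads = {
    "http://dbpedia.org/resource/Example_Road": "example road",
    "http://dbpedia.org/resource/Other_Thing": "Completely Different",
}
candidate_links = discover_links(local_roads, external_roads)
```

The threshold trades precision against recall: a higher value yields fewer but more reliable owl:sameAs candidates, which is why link discovery results are usually reviewed before publication.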
OWL extends the expressivity of RDFS with additional modelling primitives. In the context of Linked Data, the relevant mapping primitives are owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf and rdfs:subPropertyOf. They provide powerful mechanisms for defining mappings between terms from different geospatial data sources. In this work, the linking of spatial RDF data to web data sources is demonstrated with an example use case. We have linked a local road dataset (General Directorate of Highways Road Data) to the DBpedia web data source. Figure 4 shows our implementation architecture. We used Silk (http://silkframework.org/) to set RDF links from the local road data source to the external DBpedia data source on the Web. After the RoadTrb data source and the DBRoad file of the external DBpedia data source were loaded into the Silk Workbench, the linking rules were defined (Figure 5). Figure 6 and Figure 7 show our linking results. As seen, two different road instances from the RoadTrb source are linked to a DBpedia road instance with the "owl:sameAs" relation.
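Once owl:sameAs links exist, a local resource can be enriched with properties published by the external source. The following Python sketch shows one simple way to do this, assuming each data source is a dictionary from resource URI to a property dictionary. All URIs, property names and values are hypothetical placeholders and do not come from the RoadTrb or DBpedia data.

```python
def enrich(local, external, same_as_links):
    """Copy properties of externally linked resources onto the local resources."""
    # Work on copies so the input dictionaries are left unchanged.
    merged = {uri: dict(props) for uri, props in local.items()}
    for local_uri, external_uri in same_as_links:
        target = merged.setdefault(local_uri, {})
        for prop, value in external.get(external_uri, {}).items():
            # Local values take precedence; only missing properties are added.
            target.setdefault(prop, value)
    return merged


# Hypothetical local and external descriptions of the same road.
local_data = {"http://example.org/road/1": {"name": "Example Road"}}
external_data = {
    "http://dbpedia.org/resource/Example_Road": {
        "name": "Example road",
        "constructionDate": "1970",
        "lengthKm": "12.5",
    }
}
links = [("http://example.org/road/1", "http://dbpedia.org/resource/Example_Road")]
enriched = enrich(local_data, external_data, links)
```

In this sketch the local value wins when both sources describe the same property; other merge policies, such as keeping both values, are equally possible and depend on how much the external source is trusted.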

CONCLUSION
Linking the vocabulary of a data resource with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, enables automatic reasoning and prevents data duplication.
In this work, we present a case study on publishing linked spatial data and linking it to external data sources. We have linked a local road dataset to the external DBpedia data source. As a result, linked road data is shared and represented as an information resource on the Web and enriched with definitions from different related resources. Linking published data to external data sources also enriches the data with information from those sources, since one source may present some part of the data while other sources present the missing parts of the same data. In addition, with linked data technologies, spatial Linked Data enables semantic queries over different data sources.
One of the purposes of this study is to analyze linked spatial data sources. There are deficiencies in publishing linked spatial data and in finding internal or external RDF links. Firstly, what makes spatial data linked data is its ontology and the links of its content. A data provider who wants to enrich their own spatial data source encounters many problems. Where are the linked spatial data sources on the Web? We used the LOD Cloud and Semantic Web search engines. The LOD Cloud includes few data sources for linking spatial data, so its scope must be extended in the spatial context. Is there any ontology that defines the data source? In the Semantic Web context, the role of ontologies is to provide concept definitions. In terms of geospatial ontologies, there is no standard ontology for defining spatial data. Therefore, each data provider tends to develop their own ontology for describing spatial data. This is often unnecessary because there may already be an ontology on the Web that has the same function. In this study, we try to show how to publish linked spatial data and how to find RDF links between data sources.
There are also studies on linked data quality in the literature (Sam et al., 2018). In our further work, we will evaluate and analyse the quality of our linking results and carry out studies on linked data composition.
Moreover, many linked data sources use different schemas for representing their data and cover various domains. In our future work, we will implement schema matching for more precise data interlinking and also perform data interlinking among various themes of the same feature.
Koho et al. implemented the linking of historical data to related external datasets. They collected data about prisoners of the Second World War in Finland, transformed it into linked data and integrated it into the WarSampo dataset. Zhu et al. proposed a multidimensional and quantitative interlinking approach for Linked Data in the geospatial domain according to the characteristics and roles of geospatial data in data discovery. They built data intra-links in the Chinese National Earth System Scientific Data Sharing Network (NSTI-GEO) and data-links between NSTI-GEO and the Chinese Meteorological Data Network and the National Population and Health Scientific Data Sharing Platform. Wetz et al. linked LOD entities to a local thesaurus to expand and enrich the information stored in the thesaurus. Schabus and Scholz proposed a Linked Data approach within a manufacturing organization to integrate datasets originating from different business units and heterogeneous data sources based on an ontology describing the indoor space and production processes. Qiu et al. proposed an ontology-based approach that links environmental models and disaster-related data. Bischof et al. enriched integrated statistical open city data with linked data and computed missing values. Margan et al. used DBpedia to enrich air pollution information and to identify areas with harmful levels of particulate pollution.

Spatial data input formats: ESRI shape file. Supported output format: only RDF. Non-spatial attributes: not supported. (From version 1.3 supports thematic attributes with the support of RDF Mapping language (
Spatial data input formats: shape file, GML, spatial DBMS. Supported output format: only RDF/XML. Non-spatial attributes: not supported. Supported vocabulary: only NeoGeo Vocabulary. Conversion of Coordinate Reference System: supported.

GeoTriples:
GeoTriples is an open-source tool for transforming geospatial data from their original formats (e.g., shapefiles or spatially-enabled relational databases) into RDF. The following input formats are supported: spatially-enabled relational databases (PostGIS and MonetDB), ESRI shapefiles, and XML, GML, KML, JSON, GeoJSON and CSV documents.

Figure 4. Implementation Architecture

Figure 5. Preparing Linking Rules with Silk Workbench Editor