SEMANTIC ANNOTATION OF EXISTING GEO-DATASETS: A CASE STUDY OF DISASTER RESPONSE IN NETHERLANDS

The use of relevant geo-information is one of the important issues in performing the different tasks and processes of the disaster response phase. In order to save time and cost, services could be employed for integrating and extracting relevant, up-to-date geo-information. For this purpose, the semantics of geo-information should be explicitly defined. This paper presents our initial results in applying an approach for semantic annotation of existing geo-datasets. In this research, the process of injecting semantic descriptions into geo-datasets (information integration) is called semantic annotation. A web system architecture is presented, and the process of semantic annotation is described using the Meta-Annotation approach. The approach is elaborated through an example in disaster response that uses geo-datasets in CityGML format and two languages of semantic web technology: RDF and Notation3.

* Corresponding author: Amin Mobasheri, Delft University of Technology, OTB research institute for the built environment, Section GIS technology. Address: Jaffalaan 9, 2628 BX, Delft, The Netherlands. E-mail: A.Mobasheri@tudelft.nl


INTRODUCTION
Disaster Response (DR) is defined as "the provision of assistance or intervention during or immediately after a disaster to meet the life preservation and basic subsistence needs of those people affected. It can be of an immediate, short-term, or protracted duration" (ISDR, 2013). The response phase starts as soon as the disaster has happened. The organizations involved in disaster management search and plan to respond to the disaster. Municipality, Police, Fire Brigade, and Medical Service are the main actors of disaster management in the Netherlands (Zlatanova et al., 2010). Each disaster response sector is responsible for a number of tasks which should be handled individually and/or by teamwork. To handle such tasks, the actors need to be supplied with sufficient, relevant and up-to-date information. A large part of this information is geo-information.
In order to make the necessary decisions, disaster managers and decision-makers need up-to-date existing geo-information combined with dynamic field measurements; the dynamic information produced during the disaster should therefore be integrated with the existing geo-datasets. This paper addresses the use of semantic web technology for integrating dynamic information with existing geo-datasets. To this end, ontologies are used to enrich existing geo-datasets with semantic descriptions of objects. This approach alleviates current problems with information sharing (e.g. miscommunication and lack of interoperability) and supports efficient management and analysis of disseminated geo-information. The other main reason for using Semantic Web technology is its ability to bring reasoning and information integration capabilities to geo-services. Reasoning is especially important in disaster response, since we are faced with a large amount of heterogeneous geo-information and one needs to integrate and use all the necessary information for performing certain tasks (e.g. evacuation). For example, in the case of an evacuation, when certain routes are damaged and can no longer be used, an intelligent service can evaluate other possibilities by offering new routes to the evacuation site or even other kinds of transportation (e.g. boat, helicopter, etc.) to the planners. This is only possible if the service is able to integrate various information and perform reasoning. Since time is a critical factor in disaster response, performing on-the-fly reasoning and geo-information integration would bring great benefits to disaster responders in terms of time and cost efficiency.
A critical problem in information integration is that data sources have different data schemas (structures), and the translation and integration of these schemas is challenging. Semantic Web technology, and more specifically ontologies, can provide a solution in this respect since they bring greater flexibility with regard to data and conceptual schemas. The main benefit is that computer services can automatically perform the matching, translation and integration of information more efficiently (in cost and time) than humans, which is a great benefit for the disaster response domain, where time is an important factor. Ontologies are used because they simplify the task of information integration for computer services, being more flexible than solutions that rely only on the structure of data schemas for integration (Chawathe et al., 1994; Arens et al., 1996). Another main advantage of using ontologies for information integration is that the results are more accurate and reliable (Wache et al., 2001).
The paper is organized in four sections. The next section presents an overview of semantic web technology and its ability to facilitate information selection and integration. Section 3 presents the process of semantic annotation along with an example in disaster response, integrating dynamic information about the number of victims/injured people with an existing geo-dataset of buildings. Finally, section 4 discusses the proposed approach and outlines future research and developments.

SEMANTIC WEB TECHNOLOGY
Tim Berners-Lee, the inventor of the World Wide Web, defines the semantic web as "a web of data that can be processed directly and indirectly by machines" (Berners-Lee et al., 2001). The main aim of semantic web technology is to give structure to the meaningful content of web pages, creating an environment in which web services can carry out sophisticated tasks for users (Berners-Lee et al., 2001). For this purpose, ontologies play a crucial role.
One of the most cited definitions of ontology in computer science is that of Gruber (Gruber, 1995), who defines it as "an explicit specification of a conceptualization". According to this definition, the specification is explicit, meaning that it does not contain any hidden assumptions. The term conceptualization refers to the fact that everything is defined in a formal manner. For instance, one can conceptualize the schema of a specific database by using the Unified Modelling Language (UML).
There exist several types of ontologies, each developed for different purposes and applications. Various classification approaches exist for comparing those ontologies. In a well-cited study, ontologies are classified based on their level of abstraction and their usage (Guarino, 1998) (Figure 1).

Figure 1. Classification of ontologies based on their level of abstraction. Adopted from (Guarino, 1998)

Top-level ontologies capture general concepts useful across several domains. The main role of this ontology type is to bring interoperability between several domain ontologies, for the purposes of comparing, aligning and merging (Niles and Pease, 2001). They are also referred to as upper ontologies and are most often based on human perception of the world (Kiryakov et al., 2001). One of the upper ontologies used in various applications is the Suggested Upper Merged Ontology (SUMO) (SUMMO, 2013). SUMO provides definitions for general-purpose terms and is the foundation for more specific domain ontologies. The second type of ontology in Guarino's classification (Figure 1) is the domain ontology, which defines the concepts of a specific domain of interest. An example of a domain ontology for this research is the ontology for geometry, capturing all concepts and relations involved in the geometric/spatial domain. Task ontologies define the activities of a task/process without being specified for a certain domain (Paulheim, 2011). Task ontologies can include concepts and relations of various measurement approaches, and measurement units that are common to all domains. Application ontologies, on the other hand, make it possible to define specific activities by making use of domain and task ontologies. This is done by stating which entities from the domain ontology play which role in an activity defined in the task ontology (Guarino, 1998). Adopting this classification of ontology types by level of abstraction and usage, the application ontology in this research is an ontology for Disaster Response carrying a collection of definitions and relations/rules for the relevant processes/tasks involved in Disaster Response, as well as the roles of the actors involved in Disaster Response activities.
Apart from these types, for the purpose of this research (i.e. semantic annotation), another type of ontology is of interest. At a technical level of abstraction, later in the process of semantic annotation, an ontology is created on-the-fly by carrying various concepts from different ontology types, and is populated with all the necessary instances for a specific task/process. Since it carries data instances, we call this type of ontology a populated ontology. Populated ontologies are specific in their usage, but depending on the tasks in an application, they may or may not be reusable. Note that since populated ontologies carry data instances, they are not a type of knowledge-modelling ontology and hence are not depicted in Figure 1. In the next section, the creation and usage of domain, application and populated ontologies for the semantic annotation process will be discussed.
Ontologies have been used for information integration in several research studies (Kashyap and Sheth, 1996; Mena et al., 1996; Unschold and Gruniger, 1996; Stuckenschmidt and Wache, 2000; Mostafavi and Bakillah, 2012). Information integration can be subdivided into two different types. On the one hand, ontologies are mapped together in order to integrate concepts in two or more domains of interest (Inter-Ontology mapping) (Kashyap and Sheth, 1996; Mena et al., 1996; Preece et al., 2000; Stuckenschmidt and Wache, 2000). This type of integration is not the aim of this paper. On the other hand, the second type of information integration, similar to the study of Bakillah et al. (2007) and also of interest in this paper, is to map information provided in an ontology (e.g. concepts) to geo-information available in sources at the instance and object level.
Generally, ontologies can be related to the database schema, or even to single terms used in a database. Various approaches have been used to establish a connection between ontologies and information sources. Three general approaches for this task are structure resemblance (Chawathe et al., 1994; Arens et al., 1996), definition of terms (Stuckenschmidt and Wache, 2000) and structure enrichment (Kashyap and Sheth, 1996). In the first approach, the aim is to use a one-to-one relationship between two sources of information (e.g. ontology and data source) by using the structure (e.g. syntax) of both data schemas (Arens et al., 1996). The second approach employs the semantics of terms defined in the ontology as well as a set of rules for making the link with data sources (Stuckenschmidt and Wache, 2000). The structure enrichment approach combines both of the above to build a logical model that resembles the structure of the information source and contains additional definitions of concepts (Kashyap and Sheth, 1996). In addition to these approaches, Meta-Annotation is a rather new approach that aims to add semantic information to an existing information source. This approach is becoming prominent with the need to integrate information present in the World Wide Web (Wache et al., 2001). Examples of approaches developed in this way are Ontobroker (Fensel et al., 1998) and SHOE (Heflin et al., 2000).
In this research we aim to develop an approach based on Meta-Annotation, and we specifically address its applicability to the integration of geo-information and to working with object instances in geo-datasets, since the approach is still new in the Geographic Information (GI) domain.

SEMANTIC ANNOTATION OF EXISTING GEO-DATASETS
In this research, the process of injecting semantic descriptions into an existing geo-dataset (information integration) is called semantic annotation. This process is carried out by employing ontologies which carry the semantic description. This section provides the system architecture and an example elaborating the semantic annotation process.

System overview
The system web architecture of the semantic annotation process (Figure 2) employs different ontologies ranging from application ontologies (e.g. the disaster response ontology) to task/process ontologies, as discussed in section 2. These application ontologies carry extra information specifically related to the application they capture. The extra information is necessary for decision-making purposes and is missing in existing geo-datasets.
On the other hand, geo-datasets carry a great amount of information and are mostly up-to-date, and so serve the basic needs of planning and decision-making. As depicted in Figure 2, geo-datasets are used to create populated ontologies. This is done using the data schema and the instance property values. In the next step, the two different ontologies, application and populated, are integrated by injecting features and property values from the application ontologies into the populated ontologies. The result of such injection is a populated ontology enriched with meaningful information from the application ontology.
Information management in Disaster Response can benefit from such a populated ontology since it carries, as a whole, the various information relevant for planning tasks (e.g. evacuation) at both the conceptual and the instance level, without changing or creating data schemas for the individual sources. This brings great flexibility for adding information sources of a dynamic nature (each with a different data schema), which is essential in disaster response planning. The enriched populated ontology could be used in a semantic execution environment where different tasks such as search and reasoning could be performed to answer context-aware queries asked by users. SPARQL (W3C, 2008) and SQWRL (Bakillah et al., 2012) can be used for this purpose.
In addition, the enriched populated ontology could be translated into GML format, resulting in an enriched geo-dataset which can be further used in planning and decision-making activities.
Note that users can be either human users, who need enriched geo-datasets or a translation of various concepts that they can understand and use in handling their duties, or other services that need to perform other application-related tasks (e.g. Decision Support Systems). In the next section an example is explored in order to elaborate how the system can work in practice.

Ontology for disaster response
Since this research targets the disaster response application, only one application ontology is designed, carrying information on the main concepts, their properties, and their relationships in the disaster response domain. Currently, such an application ontology is not available for the Netherlands, but there is solid research work (Xu et al., 2008; Dilo and Zlatanova, 2010; Zlatanova, 2010; Fan and Zlatanova, 2011) that has captured the main classes, relationships and concepts of the Dutch disaster response system and made them available as Unified Modelling Language (UML) conceptual models. In simple words, an ontology is a set of relationships between concepts. Such a set can be mapped to a set of Resource Description Framework (RDF) triples and then translated into a standard syntax of the Semantic Web. Thus, in order to create an application ontology for disaster response, different data models which capture the important terms/information in disaster response are used. The information is translated piece by piece into RDF triple sets and then converted into Notation3, a standard syntax of the Semantic Web for designing ontologies. For example, Figure 3 is part of our previous work and shows the main classes in disaster response in the Netherlands in UML. This UML diagram represents the top-level classes involved in disaster response in the Netherlands and their interrelationships. The UML diagram is used to define the necessary RDF triplets (Table 1). An RDF triplet consists of a subject, a predicate, and an object. For instance, based on the conceptual diagram of DR (Figure 3), Incident-is managed by-Process is one possible RDF triplet following the subject-predicate-object structure. The same procedure is carried out for all classes, relationships and attributes, as well as for the conceptual diagram for dynamic data in disaster response (Figure 4).
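The translation from UML to RDF triplets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Incident-is managed by-Process association comes from the text, whereas the second association, the property names isManagedBy and consistsOf, and the helper uml_to_triples are hypothetical and are not taken from Table 1.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical excerpt of a UML class diagram.
# Only "Incident - is managed by - Process" is taken from the text;
# the second association is invented for illustration.
UML_CLASSES = ["Incident", "Process", "Task"]
UML_ASSOCIATIONS = [
    ("Incident", "isManagedBy", "Process"),
    ("Process", "consistsOf", "Task"),
]


def uml_to_triples(classes, associations, prefix="DR"):
    """Translate UML classes and associations into prefixed RDF triples."""
    # Every UML class becomes a typing statement.
    triples = [Triple(f"{prefix}:{c}", "rdf:type", "rdfs:Class") for c in classes]
    # Every UML association becomes one subject-predicate-object statement.
    for source, role, target in associations:
        triples.append(
            Triple(f"{prefix}:{source}", f"{prefix}:{role}", f"{prefix}:{target}")
        )
    return triples


triples = uml_to_triples(UML_CLASSES, UML_ASSOCIATIONS)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

Keeping the triples as plain tuples makes it easy to serialise them later into any Semantic Web syntax.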
In this paper, Notation3 (Berners-Lee & Connolly, 2011) is used to represent the schema of the application ontology for DR by utilizing the RDF triplets created in the previous step. The RDF triplets were merged together in order to create the application ontology for Disaster Response (DR) (see: Code 1). Note the DR: prefix, the dot, and the semicolon at the end of the triples. Prefixes are defined to achieve globally unique identifiers; they are declared in the first lines of the script and refer to other ontologies (namespaces). In this example, the DR prefix refers to the Disaster Response application ontology. The semicolons are used so that several predicates and objects can be grouped under one subject. The dot marks the end of a concept/group definition. A well-known namespace for defining RDF triplets is rdf. Within it, rdf:type is a property providing an elementary typing system in RDF (W3C, 2004), and rdf:Property represents connections between an RDF resource and either another resource or a literal (W3C, 2004). Literals are XML Schema data type values such as integers and strings. Although the designed Disaster Response application ontology is small and contains only a few DR concepts, it is sufficient for the example carried out in this paper. It should also be mentioned that the process of extracting RDF triplets and using them for ontology design has been done manually, although we believe that this process could be fully automated.
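To make the role of the prefix, the semicolon and the dot concrete, the following Python sketch serialises a handful of triples into Notation3-style text. It is a minimal illustration, not the script used to produce Code 1: the DR namespace URI http://example.org/dr# and the function to_notation3 are assumptions, while the rdf and rdfs namespace URIs are the standard W3C ones.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "DR": "http://example.org/dr#",  # hypothetical namespace URI
}

# Hypothetical sample triples; only Incident/Process come from the text.
TRIPLES = [
    ("DR:Incident", "rdf:type", "rdfs:Class"),
    ("DR:Incident", "DR:isManagedBy", "DR:Process"),
    ("DR:Process", "rdf:type", "rdfs:Class"),
]


def to_notation3(triples, prefixes):
    """Serialise triples, grouping predicates of one subject with ';'."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault(subject, []).append((predicate, obj))
    for subject, pairs in grouped.items():
        lines.append("")
        # ';' separates predicate-object pairs, '.' closes the group.
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{subject} {body} .")
    return "\n".join(lines)


n3_text = to_notation3(TRIPLES, PREFIXES)
print(n3_text)
```

In a production setting an RDF library would normally handle serialisation and escaping; the hand-written version is used here only to expose the syntax.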

Domain ontology
The standard proposed in the semantic web stack (Berners-Lee, 2000) for encoding documents composed of structured data on the web is XML. Therefore, in this example the Geography Markup Language (GML) (OGC, 2013b) is used, since it is based on XML and is designed for the storage and dissemination of geo-spatial data. More specifically, our example employs a CityGML dataset.
CityGML is an open data schema designed for storing and exchanging 3D urban objects (OGC, 2013a). It is an application schema based on the GML format. We use this standard in our example because CityGML includes more concepts and levels of detail in the information that can be captured, and because it can capture information in the third dimension (which is important for disaster response). There already exist different data schemas for GML (e.g. topology.xsd, coverage.xsd, etc.) defined by OpenGIS (OGC, 2013b) which could be employed to capture information for creating a domain ontology. In addition, datasets also carry instance values for various attributes of feature classes, which should also be used for creating domain ontologies. The process of domain ontology creation is the same as that used for application ontologies: RDF triplets are extracted from the data schemas. In this example, for simplicity, we used only one geo-dataset. However, the approach is generic and could be applied to several XML-based datasets.
Code 2 (the non-bold text) shows a sample CityGML dataset populated with only one building, identified as ID_147_D. Code 3 (the non-bold text) shows the populated ontology of this geo-dataset, created manually using the same procedure as for the application ontology. Note that this process could also be automated by checking the data schemas used for creating and maintaining the geo-dataset (by checking the namespaces that the dataset refers to).
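How a populated ontology could be derived automatically from a CityGML file is sketched below with Python's standard xml.etree.ElementTree module. The sample document is a hypothetical stand-in for Code 2: only the building identifier ID_147_D and the dataset name Amsterdam_Data appear in the text, while the attribute values and the function citygml_to_triples are assumptions. The namespace URIs are those of CityGML 1.0 and GML.

```python
import xml.etree.ElementTree as ET

NS = {
    "core": "http://www.opengis.net/citygml/1.0",
    "bldg": "http://www.opengis.net/citygml/building/1.0",
    "gml": "http://www.opengis.net/gml",
}

# Hypothetical miniature dataset with one building (attribute values invented).
CITYGML = """<core:CityModel xmlns:core="http://www.opengis.net/citygml/1.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/1.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember>
    <bldg:Building gml:id="ID_147_D">
      <bldg:function>1000</bldg:function>
      <bldg:measuredHeight uom="m">12.5</bldg:measuredHeight>
    </bldg:Building>
  </core:cityObjectMember>
</core:CityModel>"""


def citygml_to_triples(xml_text, dataset="Amsterdam_Data"):
    """Turn each building and its simple attributes into RDF-style triples."""
    root = ET.fromstring(xml_text)
    uri_to_prefix = {uri: prefix for prefix, uri in NS.items()}
    triples = []
    for building in root.iter(f"{{{NS['bldg']}}}Building"):
        gml_id = building.get(f"{{{NS['gml']}}}id")
        # Dataset name plus feature ID keeps subjects unique across datasets.
        subject = f"{dataset}:{gml_id}"
        triples.append((subject, "rdf:type", "bldg:Building"))
        for child in building:
            uri, local = child.tag[1:].split("}")
            if child.text and child.text.strip():
                predicate = f"{uri_to_prefix.get(uri, 'ns')}:{local}"
                triples.append((subject, predicate, f'"{child.text.strip()}"'))
    return triples


populated = citygml_to_triples(CITYGML)
for triple in populated:
    print(triple)
```

The sketch ignores geometry and nested elements; a fuller version would follow the schema definitions referenced by the dataset's namespaces.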

Example with an Earthquake scenario
As a simple scenario, consider that after an earthquake several buildings have been destroyed and the organizations involved in disaster response need dynamic information collected in the field for planning the task of evacuation. Such information for a specific building could be, for instance: 12 people trapped, 3 people dead, 25 people need to be evacuated. This dynamic information is integrated with information available in the geo-dataset (e.g. building location) by simply adding relevant RDF triplets from the application ontology into the definition of the properties of a specific feature in the populated ontology. For instance, Code 3 shows some descriptions with property values (bold text) injected into the data ontology, showing that in the building identified as ID_147_D, task1 (a well-known task in the disaster response application ontology) is performed after a disaster has happened. Based on its definition, task1 is a task performed by disaster responders for measuring damaged people and/or animals. Therefore, task1 has three main attributes (named trapped, people2Evacuate and people2Contaminate) which capture statistical dynamic data for a certain feature (e.g. a building) about people/animals that are trapped and need to be rescued, people/animals that are ready for evacuation, and people/animals that are dead and need to be contaminated, respectively. This information could easily be added by disaster responders in the field via mobile devices (e.g. PDAs), or sent to an information centre which is responsible for updating geo-information. Note that the data ontology in Code 3 refers both to namespaces for geo-data (e.g. gml:) and to namespaces for the disaster response application ontology (e.g. DR:). Also, note that the name of the geo-dataset, Amsterdam_Data, along with the ID of the selected building, ID_147_D, is used as an identifier in the geo-data ontology, which is necessary when information from several geo-datasets is integrated. In this example we used only one dataset, but in real-life applications several datasets should be integrated with dynamic information. Therefore, a mechanism for matching and mapping objects in different datasets based on their spatial attributes should be developed. In addition, the approach is capable of covering concepts that are missing from a given conceptual schema (e.g. the CityGML schemas). For instance, when there is no concept of floors in a building in the CityGML conceptual schema, the data ontology could be extended with the proposed approach to cover the missing concepts (e.g. floor), in the same way as concepts related to application ontologies. Referring back to our example, the result would be that, for instance, the statistics of people/animals that need to be evacuated are grouped by floor in the building (improving the level of detail).
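The injection step of the scenario can be illustrated with a short Python sketch. It assumes the populated ontology is held as a list of triples; the attribute names trapped, people2Evacuate and people2Contaminate and the identifiers task1 and ID_147_D come from the text, whereas the linking property DR:hasTask and the function annotate are hypothetical.

```python
def annotate(populated, feature, task, values, prefix="DR"):
    """Return a copy of the populated ontology enriched with dynamic task data."""
    enriched = list(populated)  # leave the original ontology untouched
    # Link the geo-feature to the task performed on it (hypothetical property).
    enriched.append((feature, f"{prefix}:hasTask", task))
    enriched.append((task, "rdf:type", f"{prefix}:Task"))
    # Attach each field measurement as an attribute of the task.
    for attribute, value in values.items():
        enriched.append((task, f"{prefix}:{attribute}", value))
    return enriched


populated = [
    ("Amsterdam_Data:ID_147_D", "rdf:type", "bldg:Building"),
]

# Field report from the scenario: 12 trapped, 25 to evacuate, 3 dead.
enriched = annotate(
    populated,
    feature="Amsterdam_Data:ID_147_D",
    task="DR:task1",
    values={"trapped": 12, "people2Evacuate": 25, "people2Contaminate": 3},
)

for triple in enriched:
    print(triple)
```

Because the function only appends triples, a new kind of dynamic information can be added without touching the CityGML-derived part of the ontology.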
The enriched data ontology can further be used in a semantic execution environment (Figure 2), where SPARQL queries can be executed on it to answer questions such as "the number and location of people/animals to evacuate from a certain region". Note that the same information could be extracted from a data source that contains the number and location of people/animals, but the important concept behind this approach is that the query is not executed on a single data source but rather on a populated ontology, which is itself a set of integrated information from various sources. This provides the basis for executing more complicated and tied queries on the populated ontology, on-the-fly, which brings the capability of spatial reasoning to services.
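The kind of query mentioned above can be mimicked without a SPARQL engine by matching triple patterns in plain Python. The sketch below is only a stand-in for a real semantic execution environment: the property DR:hasTask, the second building ID_148_D, the instance task2 and both helper functions are hypothetical.

```python
# Hypothetical enriched ontology covering two buildings.
ENRICHED = [
    ("Amsterdam_Data:ID_147_D", "DR:hasTask", "DR:task1"),
    ("DR:task1", "DR:people2Evacuate", 25),
    ("DR:task1", "DR:trapped", 12),
    ("Amsterdam_Data:ID_148_D", "DR:hasTask", "DR:task2"),
    ("DR:task2", "DR:people2Evacuate", 4),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def people_to_evacuate(triples):
    """Join features to their tasks and return {feature: people to evacuate}.

    Roughly corresponds to the SPARQL pattern
    SELECT ?b ?n WHERE { ?b DR:hasTask ?t . ?t DR:people2Evacuate ?n }
    """
    result = {}
    for feature, _, task in match(triples, p="DR:hasTask"):
        for _, _, count in match(triples, s=task, p="DR:people2Evacuate"):
            result[feature] = result.get(feature, 0) + count
    return result


per_building = people_to_evacuate(ENRICHED)
total = sum(per_building.values())
print(per_building, total)
```

Restricting the result to "a certain region" would additionally require a spatial filter on the building geometries, which is omitted here.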
Finally, as depicted in the system web architecture (Figure 2), the enriched geo-data ontology could, if necessary, be translated to GML format (Code 2), resulting in an enriched geo-dataset (enriched parts in bold) ready to be used by external services (e.g. a Web Processing Service) for other purposes. The benefit of the enriched geo-dataset is that external services (e.g. geo-visualization software) that cannot work with semantic web standards (e.g. Notation3), and yet are used by organizations involved in disaster response, can read and make use of this enriched geo-dataset.
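One possible way to write the injected values back into the GML document is sketched below with Python's standard xml.etree.ElementTree module. This is an assumption about how the translation could be done, not the procedure behind Code 2: the DR namespace URI, the element names and the function enrich_gml are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
BLDG_NS = "http://www.opengis.net/citygml/building/1.0"
DR_NS = "http://example.org/dr#"  # hypothetical namespace URI

# Keep readable prefixes in the serialised output.
ET.register_namespace("gml", GML_NS)
ET.register_namespace("bldg", BLDG_NS)
ET.register_namespace("DR", DR_NS)

# Hypothetical one-building fragment standing in for the original dataset.
SOURCE_GML = (
    '<bldg:Building xmlns:bldg="http://www.opengis.net/citygml/building/1.0" '
    'xmlns:gml="http://www.opengis.net/gml" gml:id="ID_147_D">'
    "<bldg:function>1000</bldg:function>"
    "</bldg:Building>"
)


def enrich_gml(xml_text, building_id, values):
    """Append one DR element per dynamic value to the matching building."""
    root = ET.fromstring(xml_text)
    for building in root.iter(f"{{{BLDG_NS}}}Building"):
        if building.get(f"{{{GML_NS}}}id") == building_id:
            for name, value in values.items():
                element = ET.SubElement(building, f"{{{DR_NS}}}{name}")
                element.text = str(value)
    return ET.tostring(root, encoding="unicode")


enriched_gml = enrich_gml(
    SOURCE_GML,
    "ID_147_D",
    {"trapped": 12, "people2Evacuate": 25, "people2Contaminate": 3},
)
print(enriched_gml)
```

Elements from a foreign namespace would not validate against the CityGML schema as such; a schema-valid variant would likely rely on CityGML generic attributes or an Application Domain Extension.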

DISCUSSION AND FUTURE WORK
This paper presented an approach for integrating dynamic information with existing geo-datasets using semantic web technology. Ontologies were employed to annotate existing geo-datasets in CityGML format with meaningful descriptions of dynamic information in disaster response. Ontologies are used because they simplify the task of information integration for computer services, being more flexible than solutions that rely only on the structure of data schemas for integration. This could later enable automated information integration by web services.
A system web architecture was provided and, through a step-by-step example, the process of designing ontologies from conceptual data models and datasets and employing them for information integration was elaborated. For the sake of simplicity, our example used a light-weight application ontology (disaster response) and one geo-dataset. The approach was performed manually, but it looks promising and has the potential to be performed automatically as well.
The approach used in this paper brings more flexibility to information integration than approaches that rely on data schemas and structure. In other approaches such as structure enrichment, the aim is to define strict rules for information integration and to provide a data schema relevant for all the individual data sources (Calvanese et al., 2002). Therefore, whenever the system needs to use another source of information, a human user has to define logic rules in order to update the mapping mechanism to serve the new data source. The approach presented in this paper is more flexible since it does not try to build one data schema for the whole, but rather builds a mediated schema and refers to the individual schemas by making use of the Uniform Resource Identifier (URI) and Namespace (NS) standards of the semantic web. This approach is therefore in line with the concept of Linked Open Data: to use RDF links in order to interlink data from various sources (Yu, 2011).
In a traditional information system, UML conceptual diagrams are used to define the schema behind a database/dataset. However, in the world of the Semantic Web, UML diagrams cannot be employed for this purpose. In fact, based on the vision of linked open data, the concepts and relationships behind datasets should be defined as separate objects that are linked together via URIs and formalized in a Semantic Web language such as RDF and Notation3.
Compared to previous tools that use the Meta-Annotation approach, such as Ontobroker, the work presented in this paper has two main differences. First, Ontobroker develops an extension to the HTML syntax to enable the ontological annotation of web pages. Our approach makes use of the eXtensible Markup Language (XML) as the language of data exchange on the Web, because many existing geo-datasets are provided in Geography Markup Language (GML), which in turn is based on XML technology. Therefore, three primitives to annotate semantic information in geo-datasets have been provided:
• An object identified by a URI can be defined as an instance of a certain class/feature.
• The value of an object's attribute can be set.
• The relationship between two or more objects may be established and queried.
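The three primitives listed above map naturally onto a very small interface. The following Python sketch is a hypothetical illustration of that interface, not the authors' implementation; the class AnnotationStore, its method names and the property DR:hasTask are assumptions.

```python
class AnnotationStore:
    """Minimal triple store exposing the three annotation primitives."""

    def __init__(self):
        self.triples = set()

    def declare_instance(self, uri, cls):
        """Primitive 1: define an object as an instance of a class/feature."""
        self.triples.add((uri, "rdf:type", cls))

    def set_attribute(self, uri, attribute, value):
        """Primitive 2: set (or overwrite) the value of an object's attribute."""
        self.triples = {
            t for t in self.triples if not (t[0] == uri and t[1] == attribute)
        }
        self.triples.add((uri, attribute, value))

    def relate(self, subject, relation, obj):
        """Primitive 3a: establish a relationship between two objects."""
        self.triples.add((subject, relation, obj))

    def related(self, subject, relation):
        """Primitive 3b: query the objects related to a subject."""
        return sorted(
            o for s, p, o in self.triples if s == subject and p == relation
        )


store = AnnotationStore()
store.declare_instance("Amsterdam_Data:ID_147_D", "bldg:Building")
store.declare_instance("DR:task1", "DR:Task")
store.set_attribute("DR:task1", "DR:trapped", 10)
store.set_attribute("DR:task1", "DR:trapped", 12)  # updated field report
store.relate("Amsterdam_Data:ID_147_D", "DR:hasTask", "DR:task1")
print(store.related("Amsterdam_Data:ID_147_D", "DR:hasTask"))
```

Overwriting on set_attribute reflects the dynamic nature of field data, where a later report replaces an earlier one.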
Second, Ontobroker relies heavily on a non-semantic web standard (HTML) and is therefore useful for annotating text and images, whereas our approach uses RDF as the base language for generating and using information from other sources for annotation. Our approach therefore works under the concept of defining everything as an object and can be used by all automated search mechanisms that can read RDF and make use of an ontology. In a nutshell, our approach is one step further in the direction of a knowledge web.
Since different organizations are involved in disaster response, and since each organization has its own goals, levels, and tasks, a realistic system should also be able to match various application, task, and organizational ontologies. The proposed approach does not cover this issue and leaves it for future work. Apart from that, the concepts and their interrelationships are defined in our example application ontologies, but the next step would be to make the ontologies more formal by defining the rules and constraints as well. This would help other humans/computers to fully understand the newly defined concepts and would demonstrate the benefits and flexibility that our approach brings to the web service. It is also important to note that this approach has been performed manually; the next step for future work is to implement and test it as a service.
Another aspect is to improve the semantic execution environment (Figure 2), where SPARQL is used to generate and execute queries on enriched geo-data ontologies to provide semantic search and reasoning capabilities to end users. For this, it is also necessary to design more formal ontologies (application, domain, etc.) by including logical constraints and rules. These rules can further be used for semantic search and reasoning over geo-information, assisting disaster managers by automating the tasks of discovery and integration of geo-information.

Figure 3. Conceptual diagram of top-level classes involved in DR in the Netherlands. Adopted from (Xu et al., 2008).

Figure 4. Part of conceptual schema for dynamic data in DR in the Netherlands. Adopted from (Dilo and Zlatanova, 2011)

Table 1. RDF triplets of conceptual diagram shown in Figure 3.

Code 1. Disaster Response application ontology in Notation3 syntax. Notice that DamagedBuildings and DamagedPA are themselves classes which own different attributes. For instance, DamagedPA keeps values of several attributes such as trapped, people to evacuate, and people to contaminate, to mention just a few. Therefore, the last part of the ontology defines these attributes in order to be used in the process of semantic annotation of geo-datasets.