TOWARDS A COLLABORATIVE KNOWLEDGE DISCOVERY SYSTEM FOR ENRICHING SEMANTIC INFORMATION ABOUT RISKS OF GEOSPATIAL DATA MISUSE

The aim of this research is to design and implement a knowledge discovery system that facilitates, using a web 2.0 collaborative approach, the identification of new risks of geospatial data misuse based on a contributed knowledge repository fed by application domain experts. [Context/Motivation] This research is motivated by the irregularity of risk analysis efforts and the poor semantics of the collected information about risks. In the context of risk analysis during geospatial database design, the knowledge about risks of geospatial data misuse is typically held by application domain experts. The collection and recording of that knowledge are usually considered optional activities and are typically performed through face-to-face risk assessment meetings and reports. Such techniques end up restricting the scope of risk analysis to a set of obvious risks that have usually already been identified. Besides, little consideration is given to storing risk information in a format appropriate for automatic reasoning and the discovery of new risk information. As a consequence, many foreseeable risky aspects inherent to the data remain overlooked, leading to ill-defined specifications and faulty decisions. [Principal ideas/results] In this paper, we present a contributed knowledge discovery system that aims at enriching the semantic information about risks of geospatial data misuse in order to identify foreseeable risks. The proposed web-based system relies on a systematic and more active involvement of users in risk analysis.
The approach consists of 1) providing an overview of the related work on risk analysis within the context of geospatial database design, 2) presenting an ontology-based knowledge discovery system that helps experts identify risks based on an upper-level risk ontology and on a structured representation of the domain-specific knowledge, 3) presenting the components of the proposed system architecture and how it may be implemented and used in practice, and 4) discussing the approach. [Contribution] A major outcome is that the proposed platform can help discover implicit domain knowledge and facilitate the identification of foreseeable risks of geospatial data misuse, so as to preventively improve the resulting fitness-for-use.


INTRODUCTION
Typically, the risk analysis process requires the contribution of experts from the application domains. In the context of geospatial database design, experts need to identify the risks that may arise while using the data (Grira et al. 2010). Risk identification is a prerequisite to risk analysis: according to ISO risk standards (ISO 2009), it implies the use of all "available information" to identify intended use and reasonably foreseeable misuse. Thanks to their domain-specific knowledge, geo-IT experts help identify such information. However, in the rare cases where this information is collected, almost no attention is paid to representing and storing it in a format appropriate for risk knowledge extraction. As a result, valuable knowledge that might be relevant for the risk analysis process is overlooked.
Besides, different stakeholders need to be involved in the risk analysis process. The literature outlines differences in the stakeholders' backgrounds in terms of expertise and skills, which may undermine the efficiency of the risk analysis process. Hence, there is a need for a common understanding of the risky issues under consideration. In fact, geospatial data projects need to be able to rely on structured expert knowledge to better analyze risks of geospatial data misuse.
The aim of this paper is to present an ontology-based knowledge discovery system that facilitates, using a collaborative approach through a web 2.0 platform, the identification of new risks of geospatial data misuse and the representation of the knowledge about these risks, and that allows semantic reasoning on the resulting knowledge. As formal knowledge representation models, ontologies can render invaluable help in this regard. In the next section, we first present an overview of the related work on the risks of geospatial data misuse and their management. Next, we present the concept of ontology-based knowledge representation and discovery in relation to our context of risk analysis. Then, we describe the methodology adopted to improve the knowledge about the identified risks. Subsequently, we present the architecture of the knowledge discovery system, which is integrated into a collaborative platform used to collect the contributions of experts and end-users. We also describe how the output of the knowledge discovery system can be used for geospatial database design (i.e. CASE tools integration) and for decision making (i.e. web-based dashboard). Finally, we conclude by discussing the contribution and presenting possible future work.

RISKS OF GEOSPATIAL DATA MISUSE
Nowadays, many users de facto perceive geospatial data as reliable for their usages (Grira et al. 2012). They usually assume that data is safe and not risky regardless of the intended usage context (Grira et al. 2013). However, perceptions about the fitness-for-use of the data may diverge among untrained users, geo-IT experts and application domain experts (Grira et al., 2009; Devillers et al. 2010). Accordingly, the assumption that the data is safe to use has led to a number of accidents and other adverse consequences that are a reminder of the need to protect users against the risks of data misuse (Larrivée et al, 2011). In this regard, the literature outlines the increasing number of incidents involving risky usages of geospatial data. In fact, existing geospatial technologies are known to lack effective approaches to warn end-users, who usually have limited expertise in the geomatics domain, of possible risks that could emerge from reusing geospatial data.

Ontology definition
The notion of ontology is very useful in various fields such as semantic reasoning, artificial intelligence and knowledge management. Although there is no universal consensus on a common definition of ontology, it is generally accepted that it represents a specification of a conceptualization (Gruber 1995). Ontologies are typically defined as abstract models with formal semantics. A domain ontology is one kind of ontology: it is defined as a specification of a shared conceptualization of a domain of interest (Gruber 1995). Domain ontologies are used to represent the knowledge for a particular type of application domain (Dittenbach et al., 2004). They represent a common formalised knowledge of a domain, as they are assumed to reflect the agreement of experts about that domain (Gruber 1995).
Our interest in ontologies lies in their potential to represent and share knowledge. Ontologies help achieve a common understanding of artefacts representing human knowledge in a community, i.e. the concrete representation of a model of consensus within a universe of discourse. In fact, an ontology is known as a container for capturing the semantic information of a particular domain: the literature outlines the usefulness of ontologies as a means to define a common vocabulary for sharing information in a domain (Noy et al. 2001). This includes machine-interpretable definitions of basic concepts in the domain and the relations among them.
For the purposes of this research, the following formal ontology definition is considered:

$O = \langle C, R, I, H, A \rangle \qquad (1)$

where $C$ is a set of concepts; $R$ is a set of relations over concepts; $I$ is a set of instances of concepts; $H$ is a hierarchy of subsumption relations; and $A$ is a set of axioms bringing constraints on $C_i$ and $R_i$.
The definition presented in (1) converges with the widely accepted definition of an ontology as "a formal explicit specification of a shared conceptualization" (Gruber, 1995), where "formal" implies that the ontology should be machine-readable and "shared" that it is accepted by a group or community. A domain ontology is usually assumed to convey concepts and relations relevant to a particular task or application domain, which is the case we are interested in.
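To make the five components of definition (1) more concrete, the following minimal Python sketch stores them in a single structure. It is an illustration only, not the system's implementation: the class name `Ontology`, the field names, the `is_a` helper and the sample concept names are all assumptions introduced here, loosely inspired by the likelihood and consequence criteria of the risk ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Illustrative container for the tuple of definition (1)."""
    concepts: set = field(default_factory=set)    # C: concept names
    relations: set = field(default_factory=set)   # R: (name, domain, range)
    instances: dict = field(default_factory=dict) # I: instance -> concept
    hierarchy: set = field(default_factory=set)   # H: (sub, super) pairs
    axioms: list = field(default_factory=list)    # A: constraint callables

    def is_a(self, concept, ancestor):
        """Return True if `concept` is subsumed by `ancestor` through H."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen:
                continue  # guards against cycles in H
            seen.add(current)
            stack.extend(sup for sub, sup in self.hierarchy if sub == current)
        return False


# Hypothetical fragment of an upper-level risk ontology.
risk_ontology = Ontology(
    concepts={"Risk", "RiskCriterion", "Likelihood", "Consequence"},
    relations={("definedBy", "Risk", "RiskCriterion")},
    hierarchy={("Likelihood", "RiskCriterion"),
               ("Consequence", "RiskCriterion")},
)
```

Keeping H separate from R mirrors the definition, where subsumption is listed apart from the other relations, and it keeps the subsumption check independent of domain-specific relation names.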

Ontology and knowledge discovery
Literature already outlined the effectiveness of using ontologies for supporting the knowledge discovery process: evidence exists about the role of ontology in establishing correspondences and  (Marinica et al. 2012, Petasis et al. 2011).
However, none of these works has considered an ontology-based knowledge discovery process coupled with the Delphi method in order to bring together end-users and application domain experts within a web 2.0 collaborative approach. They mainly focused on data representation interoperability and data interoperability. Therefore, in the context of this research, ontology-based knowledge discovery is used to extend the previously introduced collaborative risk identification and analysis process (Grira et al. 2012). It enriches a subset of the attributes of the ontology defined in (1). Ontology usage thus helps extend the collaborative knowledge repository with additional individuals, constraints and semantic relations through the contribution of untrained users and designated application domain experts. The overall proposed methodology consists of five steps: (i) ontologies; (ii) risk knowledge repository; (iii) semantic knowledge discovery; (iv) collaborative risk analysis during geospatial database design; and (v) interpretations and usages. Each of these steps is illustrated in Figure 1 and explained briefly in the following sections.

Ontologies
This step proposes different ontologies with the goal of supporting the analysis of semantic information about the considered risks. As shown in Figure 1, the ontologies include (1) an upper-level ontology for risks and (2) as many application domain ontologies as needed.

Risk ontology: an upper level ontology
The first ontology outlined in Figure 1, i.e. the Risk Ontology, corresponds to an upper-level ontology. It is designed in conformance with the risk concepts defined in ISO risk standards (ISO, 2009). It stores risk concepts together with hierarchical (e.g., is-a, part-of) and associative (e.g., similar-to, cause-of) semantic relations.
For example, as shown in Figure 2, the ISO concepts of likelihood and consequence of a risk are implemented as risk criteria that define the risk concept. Both concepts are defined in the ISO standard on risk management principles and guidelines (ISO, 2009).

Application Domain Ontologies
The domain ontology represents the data from the perspective of subject matter experts in that domain. This ontology relates the concepts and relations of the considered domain that the user understands. The primary purpose of the domain ontology is to represent the concepts using a vocabulary appropriate for the application domain.
Application domain ontologies represent the perspective of the user community. For example, a geospatial ontology should represent the knowledge of the geospatial community that uses the geospatial information. As such, it could be made up of, or derived from, other ontologies containing, for example, spatial, thematic and temporal aspects relevant for the geospatial domain.
In the context of the present work, we conceptually represented application domain ontologies in conformance to the concept of "RiskProfile" defined in the ISO-31000 standard (ISO, 2009).
As shown in Figure 2, the risk profile is designed as an extendable concept: it may be reused and extended to any number of domain ontologies.
In relation to the geospatial domain, geo-ontologies describe the semantics of geographic data and its attributes. The literature offers a variety of contributions on the design and reuse of ontologies as a tool for system integration through the definition of a core geospatial knowledge vocabulary (Kolas et al. 2005).
Sboui and Bédard proposed the use of ontologies to address the problem of semantic interoperability (Sboui and Bédard 2011).
Other research leveraged ontologies as a semantic reference system (Egenhofer, 2002; Kuhn and Raubal, 2003) and as a knowledge representation formalism for a particular type of application domain (Dittenbach et al., 2004). Our work goes beyond the modeling aspects of geo-ontologies and provides a framework that uses upper-level ontologies and domain-specific ontologies in order to efficiently address risks of geospatial data misuse that have already been collaboratively identified (Grira et al., 2012).

Risk Knowledge Repository
In relation to the geo-ontologies defined in (1), the risk knowledge repository (RKR) contains the instances derived from their concepts, their relations and their attributes; hence, according to the definition given in (1), the RKR may be formally expressed as follows:
As illustrated in Figure 3, the risk knowledge repository is collaboratively enriched throughout a knowledge discovery process. This process involves end-users who express their needs, intentions of use, objectives, etc. It also involves domain-specific experts who are responsible for identifying risks and new requirements based on their expertise and on the knowledge discovery system.

Semantic Knowledge Discovery
Expert knowledge and automatic, or semi-automatic, knowledge discovery have recently attracted a lot of interest. Together, expert knowledge and automatically discovered knowledge are increasingly perceived as complementary (Grira et al. 2013). In many disciplines, experts are required to provide their opinions about system-generated knowledge (Devillers et al. 2007). The two approaches, i.e. knowledge discovery and knowledge elicitation from experts, complement rather than oppose each other.
Semantic knowledge discovery in our context consists of an expert-assisted process to elicit requirements about risks. A rule-based reasoning engine is used for basic knowledge discovery, enriched with domain experts' inputs. The cooperation between expert knowledge and discovered knowledge relies on a context-based interpretation of the intended usages of the data.
In our context, knowledge discovery is based on a semantic reasoning engine, as illustrated in Figure 1. However, other actors (i.e. end-users and application domain experts) and components (i.e. ontologies, rules and constraints components) help identify relevant knowledge about risks of geospatial data misuse. For example, experts are part of the knowledge discovery process: they contribute their domain-specific expertise to identify rules and patterns that may feed the risk knowledge database. Similarly, end-users may contribute basic usage scenarios, which express the objectives and intentions of use of geospatial data. As illustrated in Figure 3, knowledge is input by domain experts in order to enrich the knowledge database. For example, experts may define thresholds to be included in the rule engine: any instance that is a candidate for the knowledge database should respect the pre-defined threshold. Otherwise, its related risks will be neither detected nor analyzed.
In our context, the contribution of domain experts is required in order to define domain-specific rules, identify new requirements and constraints, and assess risks related to the usage of the underlying data. In the knowledge discovery domain, the literature outlines many techniques that involve almost no interaction with a human actor. Most of these techniques assume a clear definition of the concepts and requirements, which is often not the case. Hence, knowledge discovery techniques usually fail to incorporate valuable expert knowledge into the knowledge discovery process (Keim 2002).
Following the example of the expert-defined constraints, a threshold is considered as a constraint on an instance of a concept. Constraints are a type of rule defined by experts. For example, an expert-defined threshold may apply to a risk that experts decide to accept. Therefore, the threshold may be expressed as follows:
where
δ: an expert-defined value
a: an axiom corresponding to an instance of A defined in (1)
po: an instance of the concept PO
co: an instance of the concept CO
PO: the concept "probability of occurrence" defined in Figure 2
CO: the concept "consequence of occurrence" defined in Figure 2
The defined threshold is a typical example of an expert-defined rule that triggers the enrichment of the risk knowledge database. The definition of these rules is performed within a collaborative process. The next section describes how experts contribute to that process, especially for defining requirements for risk analysis.
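The way such an expert-defined threshold could act on risk instances may be sketched in Python as follows. The exact form of the constraint is an assumption made for illustration: the sketch supposes that a risk is acceptable when the product of its probability of occurrence (po) and its consequence of occurrence (co) does not exceed δ, which is one common reading of likelihood and consequence criteria but is not confirmed by the text. The names `RiskInstance` and `make_threshold_axiom`, the 0 to 1 scales and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskInstance:
    name: str
    po: float  # instance of PO, probability of occurrence (assumed in [0, 1])
    co: float  # instance of CO, consequence of occurrence (assumed in [0, 1])


def make_threshold_axiom(delta):
    """Build an axiom `a` from the expert-defined value delta.

    Assumption: the constraint compares po * co with delta.
    """
    def axiom(risk):
        return risk.po * risk.co <= delta
    return axiom


# Hypothetical expert-defined value and risk instances.
a = make_threshold_axiom(delta=0.2)
minor = RiskInstance("outdated road attribute", po=0.5, co=0.3)
major = RiskInstance("wrong shoreline position", po=0.9, co=0.8)
```

Building the axiom as a closure over δ means that a value agreed on by the experts can be replaced without touching the rule that applies it.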

Collaborative Risk Analysis in a Context of Geospatial Database Design
Geospatial database design is a process where at least some ad hoc risk analysis activities have to take place: risk analysis is important in the software design phase and constitutes a prerequisite for evaluating the criticality of the system (Boehm, 1991) and taking the necessary countermeasures. However, different kinds of expertise are required to perform the risk analysis within the design phase (Bédard Y, 2011; Grira et al., 2012): application domain expertise (e.g. ecology, epidemiology, transportation, security), information technology expertise (e.g. system engineers and database designers) and Geospatial Information Technology (GeoIT) expertise (e.g. geomatics engineers, GIS developers, geographers). Considering the differences in perspectives, backgrounds and objectives between the different experts, there may be a divergence in the way they analyse risks. Accordingly, there is a need to bridge the gap between those who are experts in the domain and its requirements (i.e. application domain experts), those who are experts in the design and construction of the artifacts that together satisfy the domain requirements (i.e. GeoIT experts), and those who are experts in software and database design (i.e. IT experts), in order to reach a common understanding of the risks related to the geospatial data to be used (Grira et al., 2010).
In this context, Delphi has been identified as a method that makes explicit a set of requirements and produces a collective estimation of the cross impacts of risky issues (Grira et al., 2012). Delphi helps end-users and experts exchange alternative design solutions and arguments until a compromise, or a consensus, is reached. It provides the design team with an agile mechanism that helps incorporate new risks "on-the-fly" into the project risk management scope (Grira et al., 2013).
As illustrated in Figure 1, the Delphi method is not used as a standalone approach. Several studies contend that using the Delphi method as only part of a wider process may well prove a means to enhance its utility (Rowe and Wright, 2011). Following these studies, we used the Delphi method to involve experts in providing their judgment about the risks, but we also involved end-users in a second collaborative step using a web 2.0 collaborative platform in order to collect non-functional requirements (i.e. goals, objectives and constraints) and assess their cross impacts on risk analysis (Grira et al., 2012). The literature outlines that using the Delphi method with other techniques (e.g. collaborative workshops, Nominal Group Technique, focus groups and face-to-face meetings) usually produces more satisfactory results (Bañuls and Turoff, 2011; Landeta et al., 2011) and makes them more coherent (Nowack et al. 2011).

Interpretation and Usages
In our context, Delphi is used in order to reach a consensus about an expert-defined threshold. In relation to the ontology definition given in (1), a threshold is an axiom corresponding to a constraint as expressed in (3). Using the Delphi collaborative process to decide on the value of the threshold, the resulting set of acceptable risks may be expressed as follows:
where
raccepted: the set of instances of accepted risks
RKR: the Risk Knowledge Repository defined in (2)
r: an instance of risk in the RKR defined in (2)
a: the axiom defining the expert-defined constraint in (3)
A typical usage of the proposed approach consists of enriching an existing system used to support decision making: the decision aid system for coastal erosion risk assessment is an example. In such a system, each risky zone is identified according to its risk probability. The latter is calculated according to many factors, such as the distance separating the road from the riverside, the soil nature, the slope, etc. If a threshold of 90%, for example, defines a risky road portion in winter, this probability should not remain the same in summer. In fact, the threshold should be configurable because of the changing impact of seasons on water, soil and also on the road itself. Accordingly, experts could change the probability threshold (i.e. the value of "a" in the RKR defined in (4)) of risky road portions as soon as they judge that the climatic conditions have changed significantly enough to impact decision making. The decision about the effective date of the threshold change is determined through the Delphi collaborative method (Figure 1). Once the threshold parameter has been changed in the RKR, the result given by an ETL operation on the risky road portions will differ according to the newly contributed parameter.
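The effect of a configurable threshold on the set of accepted risks can be illustrated with a short Python sketch. It reads the set of accepted risks as the instances r of the RKR that satisfy the axiom a, which is an interpretation of the definitions listed above rather than a reproduction of the original expression. The road portions, their probabilities and the summer value of 80% are invented for the example; only the 90% winter threshold comes from the text.

```python
def accepted_risks(rkr, axiom):
    """Return the instances r of the RKR that satisfy the axiom a."""
    return [r for r in rkr if axiom(r)]


def below(threshold):
    """Axiom: a road portion is acceptable if its risk probability is
    strictly below the expert-defined threshold (assumed semantics)."""
    return lambda portion: portion["probability"] < threshold


# Hypothetical repository content: road portions with a risk probability.
road_portions = [
    {"id": "A", "probability": 0.95},
    {"id": "B", "probability": 0.85},
    {"id": "C", "probability": 0.40},
]

# 0.90 is the winter example from the text; 0.80 is an invented summer value.
winter_accepted = accepted_risks(road_portions, below(0.90))
summer_accepted = accepted_risks(road_portions, below(0.80))

print([p["id"] for p in winter_accepted])  # ['B', 'C']
print([p["id"] for p in summer_accepted])  # ['C']
```

Because the threshold is a parameter of the axiom and not a constant in the query, changing the value agreed on by the experts changes the result of the same selection without modifying the repository content.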

ARCHITECTURE
Figure 4 illustrates an application architecture for implementing the ontology-based risk knowledge discovery system. This architecture consists of three main layers: a presentation layer, a control layer and a core layer.

Presentation Layer
The presentation layer consists of all the front-ends that manage user interaction and display results to the user. For example, the dashboard is part of a Web 2.0 dynamic application whose role is to display indicators relevant for decision making. As illustrated in Figure 1, collaborative indicators and risk indicators are displayed on the dashboard: such relevant information may influence the decision-making process.
The interface of a CASE tool is also considered part of the presentation layer: once the tool is connected to the risk repository through web services, relevant information about the model to be designed is displayed. As illustrated in Figure 1, a properties tab displays up-to-date information coming from the repositories: when an entity of the model is selected (e.g. a class, a table or an attribute), the content of the tab is dynamically refreshed.
The typical users of CASE tools are IT or geo-IT experts. Knowing that some design entities are subject to risky usages may be valuable information that leads designers to take important design decisions. For example, if users express their intention to use the data for emergency operations, the required positional and temporal accuracy becomes critical. Besides, designers may have to change their specifications according to the newly identified intentions of use of the geospatial data.

Control Layer
The Control Layer corresponds to the layer where business logic is implemented. In the context of our knowledge discovery system, the business logic is split into (1) a web service component, (2) a workflow engine and (3) a reasoner engine.
Web services are exposed by the Core Layer for other systems: the CASE tool and the dashboard are typical consumers of these web services. For example, the dashboard displays risk information retrieved from the repositories: this information is displayed at the Presentation Layer level.
The workflow engine implements the Delphi collaborative process. It consists of a configurable system where surveys are built around a risky issue raised during the database design stage. The different steps of the workflows determine how many iterations users and experts should go through in order to settle on a design decision about the risky issue. Next, the result of the collaborative process becomes available and is provided by the web services of the same layer.
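The iterative behaviour of such a workflow can be sketched as a loop over survey rounds that stops once the panel's answers are close enough. This is a simplified illustration, not the engine described here: the consensus criterion (interquartile range of the estimates below a tolerance), the use of the median as feedback between rounds, and the simulated panel whose members move halfway towards that feedback are all assumptions.

```python
from statistics import median, quantiles


def has_consensus(estimates, tolerance):
    """Assumed criterion: the interquartile range is within the tolerance."""
    q1, _, q3 = quantiles(estimates, n=4)
    return (q3 - q1) <= tolerance


def run_delphi(collect_round, max_rounds, tolerance):
    """Iterate survey rounds; return (group value, rounds used, reached)."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        estimates = collect_round(round_no, feedback)
        feedback = median(estimates)  # summary sent back to participants
        if has_consensus(estimates, tolerance):
            return feedback, round_no, True
    return feedback, max_rounds, False


def simulated_panel(initial_estimates):
    """Hypothetical panel: each member moves halfway towards the feedback."""
    state = list(initial_estimates)

    def collect(round_no, feedback):
        nonlocal state
        if feedback is not None:
            state = [e + 0.5 * (feedback - e) for e in state]
        return list(state)

    return collect


# Four invented expert estimates of a probability threshold.
value, rounds, reached = run_delphi(
    simulated_panel([0.6, 0.7, 0.9, 0.95]), max_rounds=10, tolerance=0.05
)
```

Passing `collect_round` as a function keeps the loop independent of how answers are gathered, so the same logic would apply whether estimates come from a web survey or from a simulation such as this one.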
Finally, the reasoner engine consists of the components that take expert-defined rules and constraints and infer new knowledge based on the repository information and the experts' opinions. Many other techniques for discovering knowledge (e.g. based on similarity, rules, patterns, etc.) may be used in order to improve the knowledge discovery process; however, the experts' opinion remains important in our context, as illustrated in Figure 3.
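As a rough illustration of rule-based inference over repository content, the Python sketch below applies rules repeatedly to a set of (subject, relation, object) facts until nothing new is derived. It is a generic forward-chaining loop written for this example, not the reasoner used in the system; the function names, the single transitivity rule on the cause-of relation and the sample facts are hypothetical.

```python
def infer(facts, rules):
    """Apply every rule until no new fact is produced (forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(known) - known
            if new_facts:
                known |= new_facts
                changed = True
    return known


def transitive(relation):
    """Rule factory: if x REL y and y REL z hold, derive x REL z."""
    def rule(facts):
        return {
            (x, relation, z)
            for (x, r1, y1) in facts if r1 == relation
            for (y2, r2, z) in facts if r2 == relation and y1 == y2
        }
    return rule


# Invented repository facts using the associative relation "cause-of".
facts = {
    ("coarse positional accuracy", "cause-of", "wrong road location"),
    ("wrong road location", "cause-of", "delayed emergency response"),
}
enriched = infer(facts, [transitive("cause-of")])
```

In this sketch an expert contributes by supplying the rules (here, declaring cause-of as transitive), while the loop derives the implicit link between data accuracy and emergency response that no contributor stated directly.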
Metamodels represent the ontologies used in this work, i.e. upper-level and domain ontologies. Domain models represent domain-specific data models. The instances of the considered ontologies and the domain models correspond to the records of the repositories: in fact, the aim of the repositories is to structure and store knowledge and requirements. The structure of the knowledge repository is derived from the risk ontology and the application domain ontologies.

CONCLUSION
This paper has described the design and the architecture for the implementation of an ontology-based knowledge discovery system that facilitates, using a web 2.0 collaborative approach, the identification of new risks of geospatial data misuse. The risk identification is based on a collaborative process that involves application domain experts. The collaborative process is based on the Delphi method to collect experts' opinions regarding the considered risky issues.
We then proposed an ontology-based knowledge discovery system in order to support the collaborative process and provide facilities to improve the knowledge of experts about the risks of geospatial data usages.
Finally, we designed an architecture that helps bring end-users and experts into the design process so as to attenuate the risk related to potentially inappropriate usages of geospatial data.
Nevertheless, a considerable practical challenge lies in moving from a "one size fits all" risk analysis process to a collaborative approach based on the consideration of the different potential usages of the data. The adoption of our approach relies on the motivation of CASE tool providers to offer interfaces that interact with external collaborative platforms: our approach, based on ISO standards and widely accepted principles of software architecture, makes the integration more feasible.

Figure 1. Enriching Semantic Information about Geospatial Data Misuse: a Collaborative, Ontology-Based Approach

Figure 2. ISO-Compliant Standard Risk Ontology

Figure 3. Semantic Knowledge Discovery Based on Expert-Defined Rules

Figure 4. Architecture of the Ontology-Based Risk Knowledge Discovery System