HANDLING IMPRECISION IN QUALITATIVE DATA WAREHOUSE: URBAN BUILDING SITES ANNOYANCE ANALYSIS USE CASE

A data warehouse is a decision support database that allows the integration, organization, historization, and management of data from heterogeneous sources, with the aim of exploiting them for decision-making. Data warehouses are essentially based on the multidimensional model, which organizes data into facts (subjects of analysis) and dimensions (axes of analysis). In classical data warehouses, facts are composed of numerical measures and of the dimensions that characterize them. Dimensions are organized into hierarchical levels of detail. Using the navigation and aggregation mechanisms offered by OLAP (On-Line Analytical Processing) tools, facts can be analyzed at the desired level of detail. In real-world applications, facts are not always numerical and can be qualitative in nature. In addition, a human expert or a learned model such as a decision tree sometimes provides a qualitative evaluation of a phenomenon based on its different parameters, i.e. its dimensions. Conventional data warehouses are not suited to qualitative reasoning and cannot deal with qualitative data. In previous work, we proposed an original approach to qualitative data warehouse modeling that permits the integration of qualitative measures. Based on the computing with words methodology, we extended the classical multidimensional data model to allow the aggregation and analysis of qualitative data in an OLAP environment. We implemented this model in a Spatial Decision Support System that helps managers of public spaces reduce annoyances and improve citizens' quality of life. In this paper, we focus on the representation and management of imprecision in the annoyance analysis process. The main objective of this process is to determine the least harmful scenario for urban building sites, particularly in dense urban environments.


INTRODUCTION
Data warehouses and on-line analytical processing (OLAP) constitute the main elements of decision support systems. A data warehouse is a decision support database that allows the integration, organization, historization, and management of data from heterogeneous sources, with the aim of exploiting them for decision-making (Kimball, 2002; Inmon, 2005). OLAP refers to the technology that allows users to efficiently retrieve the information stored in a data warehouse. The multidimensional model is used to conceptualize the data in a data warehouse. This model organizes data into facts (subjects of analysis) and dimensions (perspectives of analysis). A fact is composed of numerical measures and of the dimensions that characterize it. A dimension is organized into hierarchical levels of detail. Using the navigation and aggregation mechanisms offered by OLAP tools, facts can be analysed at the desired level of detail. In some real-world applications, the subject of analysis may be subjective, and its measures are consequently provided in a qualitative fashion. In addition, a human expert or a prediction model such as a decision tree can sometimes be used to provide a qualitative evaluation of a phenomenon based on its different parameters. This arises in many applications such as customer satisfaction, process control, consumer products, and annoyance evaluation. Conventional data warehouses are not suited to human reasoning and cannot deal with qualitative data. In previous work, we presented an original approach that makes it possible to handle raw qualitative measures and provides a more flexible method for multidimensional analysis over that type of data. Based on the computing with words methodology, we introduced qualitative measures and aggregates as an extension of the multidimensional data model of a data warehouse. Using these measures and aggregates, OLAP queries allow the decision maker to manipulate data in a qualitative fashion using linguistic terms.
In this paper, we extend this model to deal with both qualitative and quantitative measures, which leads us to handle imprecise data in a data warehouse. In the state of the art, several research works address aggregation over imprecise and uncertain data, among them those proposed in (Laurent, 2001; Molina, 2006; Burdick, 2007; Delgado, 2007). Our study focuses on the fuzzy fusion of qualitative and quantitative measures in the context of data warehouses. To illustrate the problem and our proposal, we consider throughout this paper the case of urban building sites annoyance. This paper is structured as follows. In the second section, we present the motivation for our work and the use case related to urban building sites annoyance evaluation and analysis. In Section 3, we propose the data model that allows the combination of qualitative and quantitative measures in the context of imprecise multidimensional databases. In Section 4, we present the experimentation framework, which consists of a Spatial Decision Support System (SDSS) designed for annoyance analysis. Finally, in the last section, we conclude and present some perspectives.

MOTIVATION AND USE CASE: URBAN BUILDING SITES ANNOYANCE
Although indispensable for the development and renovation of cities, urban building sites are often a source of various kinds of nuisance. These nuisances have a non-negligible impact on the quality of life of urban citizens. This issue is crucial and becomes more complex in cities with high population density.
The main objective of our work is to develop a spatial decision support system (SDSS) dedicated to reducing the annoyance generated by urban building sites. We observe that, in human reasoning, annoyance is evaluated subjectively and qualitatively using an ordinal scale of linguistic degrees. Therefore, to match the reasoning of the human expert, we propose in this paper a qualitative model of annoyance evaluation. In our previous studies (Amanzougarene, 2012), we presented a quantitative model for evaluating the annoyance of urban residents due to noise. By comparison, in the present work we generalize our previous model by favouring a qualitative handling of annoyance data. We also extend our previous model of annoyance evaluation to types of nuisance other than noise, which strengthens the interest of multidimensional analysis. Indeed, an urban building site is generally likely to cause many nuisances.

Notion of Annoyance:
As several studies show, annoyance is an unpleasant sensation experienced by an individual facing a deterioration in the quality of her/his environment (Guski, 1999; Nordin, 2006; Moser, 2006; Robin, 2007). The annoyance may be caused by various nuisances (noise, odour, vibration, traffic congestion, air pollution, etc.).
Depending on various factors (intensity, moment, type, etc.), a nuisance is likely to cause a more or less significant annoyance to an individual. Note that the level of annoyance caused by one or more nuisances can differ from one individual to another, depending on various factors (sensitivity, age, acceptability, etc.). This means that a phenomenon which is not at all annoying for one individual can be extremely annoying for another. This reflects the subjective character of the notion of annoyance. Thus, for the rest of our study, we propose the following definition.
Definition 1. In a spatiotemporal environment, annoyance is a subjective relationship between an individual and a harmful phenomenon.
In other words, an individual can only be annoyed in the presence of one or more phenomena that are harmful to this individual. Thus, a human expert can subjectively evaluate the degree of annoyance according to various factors (Amanzougarene et al. 2011), which fall into three categories: factors related to the individual, to the nuisance, and to the environment. These three categories of factors will be used to evaluate the annoyance and to determine the building site scenario that produces the least annoyance.

Dimensions of Annoyance:
In practice, the choice of factors to be considered for the annoyance evaluation depends on the judgement of the human experts. In our case study, the experts have retained some factors related to the individual, the nuisance and the environment. The latter is actually a combination of the space and time dimensions. This leads to the multidimensional representation described in Figure 1, which includes the dimensions: (1) category of population, grouping the factors related to the individual, (2) nuisance, grouping the factors related to the nuisance, (3) space, and (4) time. Note that the choice of dimensions is application-dependent: another application could add or ignore some factors, such as the building type or gender. Our model adapts to other schemas as well.

Categories of Population:
In our case study, each category of population exposed to nuisances is represented by a typical individual. These categories are determined by a combination of the factors presented in Table 1. For instance, a category of population could be "healthy senior manager", which implicitly means an adult male whose socio-professional category is manager and who is in good health. Another category could be "housewife mother", meaning an unemployed adult female. A third could be "child breathing problems", meaning a young individual who is not in good health.

Nuisances:
The nuisances to which the different categories of population affected by an urban building site are exposed can be classified into three categories: (1) nuisances related to the living environment, characterizing unwanted changes in the habits of the affected population; (2) nuisances related to the landscape insertion of urban building sites, describing changes in the visual environment; and (3) sensorial nuisances, such as noise, dust, odour and vibration.

Time:
People are not annoyed in the same way at all moments of the day and periods of the year. For example, in a given residential area, a high noise level may be accepted during the day but not at all at night. In our case study, we define a hierarchy of time. This hierarchy divides the year into two periods: (1) rainy period and (2) non-rainy period. Weekdays are divided into three moments: morning, evening, and night.

Space:
The annoyance of an individual may vary depending on his or her distance to the source of nuisance. Indeed, the nuisances generated by urban building sites are not uniformly present inside the influence area. It is thus important to decompose this area into several sub-areas, for example: the immediate vicinity, the influence area, and the boundary of the influence area. For this dimension, we use an a priori geographical zoning.

Annoyance Evaluation
In human reasoning, the subjective evaluation of annoyance is done qualitatively, using a finite scale of linguistic degrees such as "low" or "high". Generally, the human subject uses ordered scales with 5 or 7 linguistic degrees (Yager, 2007). In our case study, the evaluation process is as follows (for the sake of space, we describe it only briefly):
1. Define four combinations of dimensions: (Category of population-Time), (Time-Nuisance), (Category of population-Nuisance) and (Intensity-Nuisance).
2. For each combination, a scale of 1 to 4 is used.
3. The value of a given evaluation is the product of the values corresponding to the preceding combinations.
4. The value of an evaluation is therefore at most 4 × 4 × 4 × 4 = 256. The evaluation interval is divided into five subintervals: [0-10], ]10-30], ]30-60], ]60-100] and ]100-256]. A linguistic term is associated with each of these subintervals.
Note: hereafter, to simplify notation, these linguistic degrees are represented by shorthand symbols.
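The evaluation steps above can be illustrated with a minimal Python sketch. The function names and the use of a rank from 1 to 5 in place of the linguistic terms are assumptions made for illustration; only the 1-to-4 scale, the product rule and the subinterval bounds come from the text.

```python
from bisect import bisect_left

# Upper bounds of the five subintervals given in the text:
# [0-10], ]10-30], ]30-60], ]60-100], ]100-256]
UPPER_BOUNDS = [10, 30, 60, 100, 256]


def evaluation_value(pop_time, time_nuisance, pop_nuisance, intensity_nuisance):
    """Return the product of the four combination scores (each on the 1-to-4 scale)."""
    scores = (pop_time, time_nuisance, pop_nuisance, intensity_nuisance)
    value = 1
    for score in scores:
        if score not in (1, 2, 3, 4):
            raise ValueError(f"score {score} is outside the 1-to-4 scale")
        value *= score
    return value


def degree_rank(value):
    """Return the rank (1 = lowest, 5 = highest) of the subinterval containing value.

    The rank is a hypothetical stand-in for the linguistic term of the subinterval.
    """
    if not 0 <= value <= UPPER_BOUNDS[-1]:
        raise ValueError(f"value {value} is outside the evaluation interval")
    # Each subinterval is closed on the right, so a value equal to an
    # upper bound belongs to the subinterval that ends at that bound.
    return bisect_left(UPPER_BOUNDS, value) + 1


value = evaluation_value(2, 3, 4, 1)   # 2 * 3 * 4 * 1 = 24
rank = degree_rank(value)              # 24 lies in ]10-30], the second subinterval
```

With the scores (2, 3, 4, 1), the product is 24, which falls in ]10-30], the second of the five subintervals. The maximum scores (4, 4, 4, 4) give 256, which falls in the last one.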

Example of Annoyance Evaluation:
Let us consider a given location L1 where three nuisances are present: noise, odour, and dust. An extract of the annoyance evaluation carried out by the human experts is shown in Table 2. Note that this evaluation takes into account only the following factors: (1) socio-professional category (SPC), (2) age, (3) type, (4) intensity, (5) time of day, and (6) period of year. In this evaluation, 5 levels of nuisance intensity are considered. Level 1 corresponds to the absence of nuisance, which means that the degree of annoyance is "not at all annoyed". This table is an extract of the decision matrix built by the experts from the different dimensions of annoyance. This matrix serves as the knowledge base used to populate the data warehouse designed to contain the data related to annoyances. This warehouse constitutes the core of our SDSS.
In this section, we first describe the multidimensional model of annoyance that is used as a running example for the rest of the paper. Then, we describe our proposed model to represent and manage imprecision in the context of a data warehouse.

Multidimensional Data Model of Annoyance
In our case study concerning urban building sites, the subject of analysis is the annoyance. This subject is analyzed according to the dimensions presented in Section 2.2, namely: nuisances, categories of population, time, and space.
To model the data of urban building sites, we have used the star schema represented in Figure 2. It is actually the schema of a spatiotemporal data warehouse, since it includes space and time dimensions with hierarchies (for the sake of simplicity, we omit the details of the dimension attributes in the figure). To represent it, we use the graphical formalisms proposed by (Malinovski, 2008).
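As an illustration of how such a star schema can be held in memory, here is a minimal Python sketch. The class name, field names, sample rows and the encoding of the degree of annoyance as a rank from 1 to 5 are all assumptions made for illustration. Taking the maximum rank per group is only a placeholder aggregate, not the computing with words aggregation of the model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnoyanceFact:
    # Keys to the four dimensions named in the text (hypothetical field names)
    nuisance: str
    population_category: str
    time: str
    space: str
    # Measures: degree of annoyance as an assumed rank (1..5) and a numerical density
    degree_rank: int
    population_density: float


def roll_up_max(facts, dimension):
    """Group facts by one dimension and keep the highest degree rank per group.

    The maximum is an illustrative choice of aggregate only.
    """
    groups = defaultdict(int)
    for fact in facts:
        key = getattr(fact, dimension)
        groups[key] = max(groups[key], fact.degree_rank)
    return dict(groups)


# Hypothetical sample rows, not taken from the experts' decision matrix
facts = [
    AnnoyanceFact("noise", "healthy senior manager", "night", "immediate vicinity", 4, 120.0),
    AnnoyanceFact("dust", "healthy senior manager", "morning", "immediate vicinity", 2, 120.0),
    AnnoyanceFact("noise", "child breathing problems", "night", "influence area", 5, 40.0),
]

by_nuisance = roll_up_max(facts, "nuisance")
by_space = roll_up_max(facts, "space")
```

Each fact row carries one key per dimension plus its measures, so rolling up along a dimension amounts to grouping the rows on the corresponding key.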
We have defined a fact for annoyance, whose measures include the degree of annoyance and the population density. In the current model, we face two problems:
1. The expert model used to evaluate the degree of annoyance provides crisp qualitative values. Thus, this model does not capture the imprecision inherent in this measure.
2. The fusion of measures: the managers of public spaces are interested in analyzing the impact of annoyance. This measure is derived from the degree of annoyance, which is a qualitative measure, and the population density, which is a numerical one. We recall that the degree of annoyance is the annoyance level of a typical individual representing a given category of population. We therefore define the impact of annoyance as the overall level of annoyance of a given category of population, taking into account the density of this category.
For the first problem, we propose to use fuzzy sets to represent the imprecision inherent in the data.
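To make the idea of a fuzzy representation concrete, here is a minimal Python sketch. It assumes trapezoidal membership functions built on the five subintervals of the evaluation scale, with an assumed overlap half-width of 4 around each inner boundary. The shapes, the overlap width and all names are hypothetical and are not taken from the model described in this paper.

```python
# Boundaries of the five subintervals of the evaluation scale
BOUNDS = [0, 10, 30, 60, 100, 256]
SPREAD = 4  # assumed half-width of the overlap around each inner boundary


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def fuzzy_sets(bounds=BOUNDS, spread=SPREAD):
    """Build one (a, b, c, d) trapezoid per subinterval, with shoulders at both ends."""
    last = len(bounds) - 2
    sets = []
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        a = lo if i == 0 else lo - spread
        b = lo if i == 0 else lo + spread
        c = hi if i == last else hi - spread
        d = hi if i == last else hi + spread
        sets.append((a, b, c, d))
    return sets


def memberships(x):
    """Membership of x in each of the five fuzzy degrees, lowest degree first."""
    return [trapezoid(x, *params) for params in fuzzy_sets()]


at_boundary = memberships(10)   # on the crisp boundary between the first two degrees
inside = memberships(24)        # well inside the second subinterval
near_boundary = memberships(28) # close to the boundary between degrees 2 and 3
```

Under these assumptions, a value lying exactly on a crisp boundary belongs equally to the two neighbouring degrees, while a value well inside a subinterval belongs fully to a single degree. This is the kind of graded transition that a crisp partition cannot express.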

Figure 2. Multidimensional data model of annoyance

Table 1. Main factors of annoyance. The most relevant factors can be classified into three categories: (1) factors related to the individual, (2) factors related to the nuisance, (3) factors related to the environment. The table shows these three categories with the main factors.