DEVELOPMENT AND COMPARISON OF UNCERTAINTY MEASURES IN THE FRAMEWORK OF A DATA CLASSIFICATION

In the analysis and visualization of spatial information, quite often a data classification is applied. The choice of different methods, together with the choice of a different number of classes, the consideration of open classes and the treatment of outliers, can produce very different results. Hence, it is desirable to quantify the uncertainties that inevitably arise in this process. So far, almost only non-spatial properties have been considered. In addition to an extension of this set of statistical measures, this article also aims to define those which are concerned with the preservation of spatial patterns (e.g., local extreme values) as well as with visual perception. An empirical study will investigate the behavior of all these measures, for example depending on the classification method used or the number of classes. Also, correlations between the uncertainty measures and between the measures and statistical properties of the input data are examined. Finally, it will be shown that the uncertainty measures can be used not only individually or combined for pure evaluation purposes, but also for an a-posteriori improvement of classification methods.


MOTIVATION
In the analysis and visualization of spatial phenomena, data classification can be helpful in many regards, for example, to reduce the volume of data, to emphasize spatial differences or to facilitate the reading of values. A typical example of data classification arises in the course of the production of choropleth maps for the graphical representation of areal quantities. Various classification methods are available that group the original values according to equidistance, quantiles, natural breaks or other criteria. It is well known that the choice of different methods, together with the choice of a different number of classes, the consideration of open classes and the treatment of outliers, can produce very different results. In the case of choropleth maps, this leads to very different visualizations and thus to different cognitive impressions and decisions.
The overall aim of this paper is to quantify the uncertainties that inevitably arise in the generalization process of spatial data classification.
Based on a brief literature review (chapter 2), existing as well as supplementary measures are presented that cover the various tasks occurring in the analysis and visualization workflows (chapter 3). Firstly, an extended set of non-spatial measures will be presented. However, these statistical parameters do not satisfy all requirements. Therefore, additional measures are presented that assess the preservation of certain spatial characteristics (e.g., preservation of local extreme values). Furthermore, key figures are presented that describe the visual perception uncertainty. Most of these measures give not only a global rating for a specific classification (i.e., with a predefined method and number of classes), but also specific measures for each individual class.
In chapter 4, an empirical investigation is performed in order to describe the behavior of the uncertainty measures depending on the number of classes and the classification method used, the correlations between uncertainty measures, as well as differences between uncertainty measures based on various input data characteristics. Chapter 5 demonstrates possible further uses of these measures, either in the context of a multi-criteria analysis or as a starting point for a class-specific a-posteriori optimization of a given classification result.

PREVIOUS WORK
In the literature, the topic of data classification for cartographic purposes is comprehensively treated; here, reference is made to the textbook by Slocum et al. (2009) as well as the reviews by Cromley & Cromley (1996) or Coulsen (1987).
A number of empirical studies are also concerned with the comparison of methods for addressing typical map-use tasks (e.g., Goldsberry & Battersby, 2009; Brewer & Pickle, 2002; Mersey, 1990). In addition, interactive tools have been developed to find the "optimal" configuration for a given application; for example, the use of linked views between the data histogram and the choropleth map (Andrienko et al., 2001). While the methods implemented in common software work on a data-driven basis, relatively little has been published on the neglect of the spatial context or the use of a task-oriented approach; Armstrong et al. (2003) give an overview.
The description of the uncertainty of a data classification refers almost exclusively to non-spatial properties (e.g., Jenks & Caspall, 1971; Armstrong et al., 2003; Andrienko et al., 2001). Examples of this (e.g., Tabular Accuracy Index or Goodness of Variance Fit) are picked up in section 3.1. Bregt & Wopereis (1990) investigate the visual assessment in comparison to various complexity measures.

UNCERTAINTY MEASURES
There are a number of proposals for statistical measures related to data classification in the literature. In most cases they describe global statistical behavior, neglecting uncertainties related to spatial patterns or visual perception. In the following, an extended set of measures will be presented that consists of within-class as well as global descriptions of a classification.

Non-spatial measures
First, the within-class homogeneity can be described, i.e., the variation of values belonging to one specific class. The underlying idea is to keep the variance as low as possible so that all values with one class value are really as similar or identical as the association to one class value implies. Examples of homogeneity measures are the Tabular Accuracy Index (TAI), taking the absolute deviations from the class mean value into account (Jenks & Caspall, 1971); the Goodness of Variance Fit (GVF), considering the squared deviations from the class mean (Dent, 1999); and the entropy approach, applying the logarithmic function to the absolute deviation from the mean (Andrienko et al., 2001).
For all measures, the deviation from the class mean value can also be replaced with the deviation from the median in order to reduce the influence of outliers. Our empirical testing (see chapter 4) showed significant correlations between these measures, so that a reduction to one indicator appears reasonable. For the remainder of this paper we use the Goodness of Variance Fit according to Dent (1999) as class homogeneity measure, calculated as GVFc for a specific class c and as GVFg for the total data set.
In the literature, less attention is paid to the deviation of the original values from the corresponding class value (i.e., the average of upper and lower class limits). Even if the within-class variation is very low, it can happen that the values are very close to one or the other limit value, so that the representation by the class value is rather weak. For this purpose, the within-class matching values MATc for a specific class c and MATg for the total data set are introduced; they are based on the class value (the average of the upper and lower class limit values). Again, a MAT value towards +1 corresponds to a good matching.
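The within-class homogeneity idea can be illustrated with a minimal Python sketch. It computes a global Goodness of Variance Fit in its common textbook form, i.e., 1 minus the ratio of the squared deviations from the class means to the squared deviations from the overall mean. This form, the function names `assign_classes` and `gvf_global`, and the representation of a classification as a sorted list of upper class limits are assumptions made for illustration; the normalization used for GVFc and GVFg in this paper may differ.

```python
from bisect import bisect_left


def assign_classes(values, upper_limits):
    """Map each value to a class index.

    Assumption: upper_limits is sorted ascending and a value v belongs to
    the first class whose upper limit is >= v.
    """
    last = len(upper_limits) - 1
    return [min(bisect_left(upper_limits, v), last) for v in values]


def gvf_global(values, upper_limits):
    """Textbook GVF: 1 - (within-class squared deviations / total squared deviations)."""
    classes = assign_classes(values, upper_limits)
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # all values identical: any classification is homogeneous

    within_ss = 0.0
    for c in set(classes):
        members = [v for v, k in zip(values, classes) if k == c]
        class_mean = sum(members) / len(members)
        within_ss += sum((v - class_mean) ** 2 for v in members)
    return 1.0 - within_ss / total_ss


# Hypothetical toy data: two clearly separated groups, two classes.
print(gvf_global([1, 2, 3, 10, 11, 12], [3, 12]))
```

A value close to +1 indicates that the classes capture most of the variation in the data; placing all values in a single class yields 0 in this form.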
In many applications, special attention is paid to the design and expressiveness of border classes. One goal can be the isolation of global extreme values in order to make these uniquely visible in choropleth maps. Hence, the Global Extreme Value index (GEX), which is a global measure, is based on the number of data values in the lowest class and the number of data values in the highest class. A large GEX value corresponds to a small number of values in the border classes; the maximum value of +1 indicates that each of the two border classes contains exactly one value. Conversely, a quantile classification produces a (nearly) equal distribution of values among the classes, so that GEX will be (close to) zero.

Spatial measures
The above-mentioned classification methods are data-driven, i.e., the intervals are determined solely on the basis of the frequency distribution of the original values. The spatial context of the underlying data, which is relevant for many applications, is completely neglected when using such divisions. Accordingly, there are so far hardly any measures that describe the uncertainty of a data classification with regard to the preservation of spatial properties. One exception is the Boundary Accuracy Index (BAI), which evaluates values that are separated by common class boundaries (Armstrong et al., 2003). Edges are defined as polygon boundaries that show a significant value difference. Because the BAI measure shows some disadvantages, the alternative Edge Preserving Index (EPI) has been developed (Schiewe, 2016). It compares the differences between neighboring values with the differences between the resulting class values in order to assess whether edges are preserved (or even enhanced).

Visual perception measures
When it comes to the visualization of classified data, visual perception is strongly influenced by dominant colors. This can be desired in the case of an actual imbalance of class occupation, or undesired if large regions are perceived as more prominent than small regions although the task requires that regions be perceived equally. Hence, the global visual balance over all classes should be quantified. Armstrong et al. (2003) propose a Gini coefficient to express this. Alternatively, for each class the area fraction is calculated (i.e., the total area of all class members divided by the total area of the data set) and compared to an equal area fraction (i.e., 1/m for m classes). For normalization purposes, the sum of all class-related differences is related to the worst case, i.e., the situation in which only one class covers the entire area (its absolute difference amounts to 1-1/m) and all remaining classes are empty (their differences sum to (m-1) · 1/m), leading to the factor 1/(2-2/m). With that, the Global Visual Balance (GVB) is calculated as follows:

$$\mathrm{GVB} = 1 - \frac{1}{2 - 2/m} \sum_{c=1}^{m} \left| \frac{A_c}{A} - \frac{1}{m} \right|$$

where $A$ is the total area of the data set, $A_c$ is the total area of class $c$, and $m$ is the number of classes. GVB values towards +1 correspond to an equal area distribution within the visualized classification result.
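The verbal description of the Global Visual Balance can be turned into a short Python sketch: class area fractions are compared with the equal fraction 1/m, the absolute differences are summed, and the sum is normalized by the worst case 2-2/m. The function name `global_visual_balance` and the input format (one total area per class) are assumptions for illustration, and the subtraction from 1 is inferred from the statement that values towards +1 indicate an equal area distribution.

```python
def global_visual_balance(class_areas):
    """Global Visual Balance from the total area covered by each class.

    class_areas: one non-negative number per class (hypothetical input
    format); empty classes are passed as 0.
    """
    m = len(class_areas)
    if m < 2:
        raise ValueError("at least two classes are required")
    total_area = sum(class_areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")

    # Sum of absolute differences between actual and equal area fractions.
    deviation = sum(abs(a / total_area - 1.0 / m) for a in class_areas)
    # Worst case: one class covers everything, all others are empty.
    worst_case = 2.0 - 2.0 / m
    return 1.0 - deviation / worst_case


print(global_visual_balance([10, 10, 10, 10]))  # perfectly balanced areas
print(global_visual_balance([40, 0, 0, 0]))     # one class covers everything
```

The two calls cover the extreme cases described in the text: equal class areas and a single class covering the entire area.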
A typical problem with choropleth maps is the within-class visual imbalance that is caused by huge area differences within one class ("Russia vs Andorra effect"). For this purpose, only the largest and smallest areas within a class have to be considered. Since this effect is evident especially in border classes (represented by the most intense hues), a weighting towards these classes is performed in the course of determining a global measure. Here, a distinction can still be made between the use of a sequential color scheme (increasing the weight from the lowest to the highest class) and a bi-polar color scheme (increasing the weight from the middle to the border classes). For the Class Visual Balance measure for each single class (CVBc) and for the overall data set (CVBg), a sequential color scheme as well as an exponential increase of the weights is modeled. The formulas use the maximum and minimum value within class c and the maximum and minimum value within the entire data set. CVB values towards +1 correspond to low area differences within classes.

Methodology and Data
Based on the aforementioned compilation of the different uncertainty measures, an empirical investigation is performed in order to describe the behavior of the measures depending on the number of classes and the classification method used, as well as the correlations between measures and the differences between measures based on various input data characteristics.
In addition to the typical classification methods equidistance, quantiles and natural breaks, a custom method designed for the preservation of local extreme values is used as a representative of methods that consider spatial properties. This aChor method is described in detail in Schiewe (2017).
For each method classifications are calculated for 4, 6, 8, 10 and 12 classes, which cover the typical range for choropleth maps taking the visual perception of human users into account (Slocum et al., 2009).
Three data sets with different (geo-)statistical properties are used for the analysis. Table 1 summarizes the respective characteristic values. For illustration purposes, figure 1 shows classifications for the "Rainfall" data set using the four different methods and two selected class numbers.

Results and Discussion
Figure 2 presents a graphical summary of results, showing the various uncertainty measures for the data sets under investigation.
All in all, most of the measures show improved properties with an increasing number of classes (i.e., measures increasing towards +1). In practice, however, this contradicts the requirement of a rather small number of classes that is needed to guarantee sufficient visual differentiability.
Secondly, the behavior of each of the uncertainty measures depending on the data classification method is analyzed. Within-class homogeneity (GVFg) shows rather similar trends and values (generally, deviations smaller than 0.1) for all methods, with Jenks always being the best method and quantiles always the worst one. In the latter case, this can be explained by the absence of an equal distribution of values in all data sets (see histograms in table 1).
Summarizing this set of tests, it can be concluded that there is no "perfect" classification method. Instead, a task-oriented selection is needed. With the data sets used here, non-spatial measures are satisfied well by equidistance, while for preserving spatial patterns (like local extreme values) specific algorithms (like aChor) should be chosen. With respect to visual perception, contradictory results can be observed: by definition, quantiles deliver the best results for global balance and the worst results for within-class balance.
For practical purposes, it is desirable to reduce the large set of uncertainty measures and to avoid dependencies between them. Hence, investigations are also conducted to evaluate the correlation between different uncertainty measures. Within-class homogeneity (GVFg) and within-class matching (MATg) show correlation coefficients of (close to or exactly) +1. This is understandable because the mean value of a class and the class value have a constant offset.


High correlations (larger than +0.9) can also be found between GVFg and LEX, MATg and LEX, GVFg and CVBg, MATg and CVBg.


On the other hand, the global visual balance shows variable (and mostly low) correlations with other measures. From this it can be concluded that general evaluations can reduce the set of non-spatial measures to one or two measures (e.g., GVFg and NN). However, if a more detailed and class-specific evaluation and post-processing of the classification is desired (see chapter 5), the whole set of parameters is still of interest.

Also for practical purposes, it is desirable to predict uncertainty properties directly from the input data set. For this reason, the differences between uncertainty measures for the three data sets are calculated and related to statistical parameters of the input data (see table 1).


Large differences (larger than |-0.5| on average) can be observed between the data set "Social Index" and the other two for within-class matching (MATg). The significantly lower value for "Social Index" can be explained by the smaller RMSE/span width ratio (0.07 against 0.17 or 0.18, resp.). This leads to smaller class intervals in the middle of the data set and broader classes towards the boundaries. Due to the large total number of polygons (n) in this data set, these border classes are still well occupied and show very large internal variations that affect the overall measure. A similar effect (but with smaller differences in the order of 0.1 to 0.3 on average) can be observed for within-class homogeneity (GVFg).


As these examples show, there are significant and logical dependencies between data set parameters and uncertainty measures. A thorough analysis of input data statistics (as given in table 1) can also speed up the analysis process and help to some extent to find appropriate classification methods (see chapter 5).

Multi-criteria analysis
So far, the various uncertainty measures have been treated independently. Apparently, the analysis and selection of classification parameters (especially, method and number of classes) is a multi-criteria process. Using the measures, a linear combination with user-defined weights can be applied as target function. If the weights are normalized (i.e., their sum is set to 1), the overall measure still has a maximum ("best") value of +1. Obviously, setting the weights requires prior knowledge that depends on the application requirements.
The problem of the reduction to an overall criterion is that the individual target values are, strictly speaking, not interconvertible and that the weighting factors are subjective. Therefore, evolutionary or, more specifically, genetic algorithms serve as alternative approaches. For example, Pareto optimization performs a separate optimization for all combinations of weighting factors.
The idea is to improve one criterion only as long as no other criterion is worsened. The uncertainty measures presented here can also be introduced into such a Pareto optimization.
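The linear combination described above can be sketched in a few lines of Python. The function name `combined_score`, the dictionary-based interface and the numeric values in the usage line are hypothetical and serve only as illustration; the weights are normalized inside the function so that the combined value keeps +1 as its maximum.

```python
def combined_score(measures, weights):
    """Weighted linear combination of uncertainty measures.

    measures: dict mapping a measure name to its value (each at most +1).
    weights:  dict mapping the same names to non-negative user weights.
    The weights are normalized to sum to 1 before they are applied.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    missing = set(weights) - set(measures)
    if missing:
        raise KeyError(f"no measure given for: {sorted(missing)}")
    return sum(weights[name] / weight_sum * measures[name] for name in weights)


# Made-up values for one classification variant, homogeneity weighted 3:1.
print(combined_score({"GVF": 0.9, "GVB": 0.5}, {"GVF": 3, "GVB": 1}))
```

Because the weights encode application requirements, the same classification can rank differently under different weightings, which is the subjectivity discussed in the following paragraph.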
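A basic building block of such an approach is the selection of non-dominated variants, i.e., variants for which no other variant is at least as good in every measure and strictly better in at least one. The following Python sketch shows this filter only; it is not the evolutionary algorithm itself, and the function names as well as the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if measure tuple a is at least as good as b everywhere and
    strictly better somewhere (larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of all non-dominated candidates.

    candidates: dict mapping a variant name to a tuple of uncertainty
    measures, all oriented so that +1 is the best value.
    """
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Made-up measure pairs for three classification variants.
variants = {"A": (0.9, 0.2), "B": (0.5, 0.8), "C": (0.4, 0.7)}
print(pareto_front(variants))
```

In this toy example, variant C is worse than B in both measures and is therefore removed, while A and B remain as incomparable trade-offs that no weighting-free argument can rank.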

Optimization of classified data sets
The presented uncertainty measures can serve not only for the a-posteriori evaluation of classifications, but also for their improvement. A simple, but brute-force, method is to compute multiple classifications and to select the best variant based on the derived uncertainty measures.
Alternatively, a calculated variant can also be optimized by considering the class-specific uncertainty measures in an a-posteriori manner.
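The brute-force variant can be sketched as follows in Python. The sketch generates equidistant and quantile class limits for several class numbers and keeps the variant with the highest score. All function names are hypothetical, the quantile rule is one simple convention among several, and the score used in the example is a textbook form of the Goodness of Variance Fit rather than the exact measure of this paper; any other uncertainty measure or combination could be passed instead.

```python
from bisect import bisect_left


def equidistant_limits(values, m):
    """Upper class limits of m classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / m
    return [lo + width * (k + 1) for k in range(m - 1)] + [hi]


def quantile_limits(values, m):
    """Upper class limits so that classes hold (nearly) equal counts."""
    s = sorted(values)
    n = len(s)
    return [s[max(0, n * (k + 1) // m - 1)] for k in range(m)]


def gvf(values, limits):
    """Textbook GVF used as example score (assumption, see lead-in)."""
    groups = {}
    for v in values:
        c = min(bisect_left(limits, v), len(limits) - 1)
        groups.setdefault(c, []).append(v)
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values()
    )
    return 1.0 - within / total if total > 0 else 1.0


def best_variant(values, methods, class_numbers, score):
    """Compute every method/class-number combination and keep the best one."""
    variants = {
        (name, m): make_limits(values, m)
        for name, make_limits in methods.items()
        for m in class_numbers
    }
    best = max(variants, key=lambda key: score(values, variants[key]))
    return best, variants[best]


data = [1, 2, 3, 4, 5, 50, 51, 52, 53, 100]  # made-up values
methods = {"equidistant": equidistant_limits, "quantile": quantile_limits}
print(best_variant(data, methods, [2, 3], gvf))
```

The number of variants grows with the number of methods and class numbers, which is why this approach is only practical for a small search space.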

SUMMARY AND FUTURE WORK
This paper aimed to define an extended and task-oriented set of uncertainty measures to quantify generalization effects as part of a data classification process. Not only non-spatial statistical properties were considered, but also the preservation of spatial patterns as well as uncertainties in the course of visual perception. The presented set of measures is not complete; especially for the latter two aspects, extensions and further empirical investigations are possible and necessary.
A recommendation for a standardized procedure or selection of uncertainty measures based on the tests carried out here is not possible. It can be expected that, due to the variable uses of the classified data, an adapted application-specific selection must take place. For this purpose, user-friendly instructions or interfaces are still missing.
It was also pointed out that uncertainty measures can be used not only for pure evaluation purposes but also for the optimization of existing classification results. Here, the class-specific uncertainty measures are very helpful. Future work will address the design and implementation of appropriate evolutionary algorithms that address relevant cases (e.g., optimization of between-class heterogeneity).
The formulas for GVF use the i-th data value, the mean value within class c, the number of classes, the number of values in class c, the total number of values in the data set, and the standard deviation of the values of the total data set. An increasing GVF value (with a maximum of +1) corresponds to increasing within-class homogeneity.
Instead of considering the within-class homogeneity, it is also possible to evaluate the between-class heterogeneity, i.e., the goal is to create well distinguishable classes. Sun et al. (2013) propose a probability-based separability measure for units (e.g., polygons) that are associated with different classes. In the following, a simpler approach is pursued: it is based on the demand that each value should have a smaller difference to the class value of its current class (c) than to the class values of the two adjacent classes (c+1; c-1). The respective differences of a value x are taken to the class value of the current class (the average of its upper and lower class limit values), to the class value of the subsequent class, and to the class value of the preceding class. From this, a comparison is made between the allocation to the current class and the allocation to the closer adjacent class, expressed as a difference d. If d is larger than zero, the value is placed best in the current class. Finally, the ratio of values that are placed best in the current class to the total number of values can be computed for each class (NNc) and for the entire data set (NNg). A large value of NN (with a maximum of +1) points to a desired large number of values that are best placed in their current class, which corresponds to a good between-class heterogeneity. By definition, the equidistance method always shows NN values of +1.
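The nearest-class comparison described above can be sketched in Python as follows. Since the formulas are not reproduced in the text, the sketch assumes that the differences are absolute differences and that d is the distance to the closer adjacent class value minus the distance to the current class value; the function names and the input format (class index per value, class limits as a list) are likewise hypothetical.

```python
def class_values(lowest_limit, upper_limits):
    """Class value = average of lower and upper limit of each class."""
    bounds = [lowest_limit] + list(upper_limits)
    return [(bounds[k] + bounds[k + 1]) / 2 for k in range(len(upper_limits))]


def nn_measures(values, classes, class_vals):
    """Return (per-class NN, global NN).

    A value counts as 'placed best' if d > 0, where d is the distance to
    the closer adjacent class value minus the distance to its own class
    value (assumed form, see lead-in).
    """
    best_per_class = {}
    count_per_class = {}
    for x, c in zip(values, classes):
        own = abs(x - class_vals[c])
        adjacent = [
            abs(x - class_vals[k])
            for k in (c - 1, c + 1)
            if 0 <= k < len(class_vals)
        ]
        # With a single class there is no adjacent class to compete with.
        placed_best = not adjacent or min(adjacent) - own > 0
        count_per_class[c] = count_per_class.get(c, 0) + 1
        best_per_class[c] = best_per_class.get(c, 0) + int(placed_best)

    nn_c = {c: best_per_class[c] / count_per_class[c] for c in count_per_class}
    nn_g = sum(best_per_class.values()) / len(values)
    return nn_c, nn_g


# Made-up example: two classes with limits 0..10 and 10..100.
vals = class_values(0, [10, 100])
print(nn_measures([2, 9, 12, 60], [0, 0, 1, 1], vals))
```

In the example, the value 12 lies just above the class break but is much closer to the class value of the lower class than to that of its own class, so it lowers NN for the upper class.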
Hence, further spatial patterns of interest should be considered during the data classification stage. Within the project aChor (Schiewe & Chang, 2018), new classification methods are developed that also use measures describing the success rate for preserving specific spatial patterns.
Local extreme values are defined as polygons that show a larger (smaller) value compared to all neighboring polygons. Consequently, the data classification should not aggregate neighboring polygons into the same class. The Local extreme value preservation rate (LEX) calculates the ratio of preserved local extreme values after classification compared to the total number of local extreme values (Schiewe, 2017).
Hot or cold spots are defined as polygons with high (low) values that are surrounded in a certain neighborhood by polygons also showing high (low) values. The hot / cold spot polygons are determined by standard methods such as the Getis-Ord index (and derived z-values in conjunction with the application of a corresponding threshold value; Getis & Ord, 1992). From this, a binary classification (and labeling) into hot / cold spots and other polygons is possible. Similar to the LEX measure, the Hot/Cold Spot preservation rate (HCS) is defined by the ratio of preserved hot/cold spots in comparison to all detected spots.
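The preservation rate for local extreme values can be illustrated with a small Python sketch. It assumes that a local extreme value counts as preserved if, after classification, the class of the polygon is still strictly higher (or strictly lower) than the classes of all its neighbors; this criterion, the function names and the adjacency-list input are assumptions for illustration and not necessarily the definition used in Schiewe (2017).

```python
def extreme_kind(values, index, neighbours):
    """Return 'max', 'min' or None for one polygon."""
    if not neighbours:
        return None
    if all(values[index] > values[j] for j in neighbours):
        return "max"
    if all(values[index] < values[j] for j in neighbours):
        return "min"
    return None


def lex_rate(values, classes, adjacency):
    """Ratio of local extremes that remain extremes in the class indices.

    values:    original value per polygon
    classes:   class index per polygon (higher index = higher values)
    adjacency: dict mapping a polygon index to its neighbour indices
    """
    total = 0
    preserved = 0
    for i, nbs in adjacency.items():
        kind = extreme_kind(values, i, nbs)
        if kind is None:
            continue
        total += 1
        if extreme_kind(classes, i, nbs) == kind:
            preserved += 1
    # Convention assumed here: without any local extreme nothing can be lost.
    return preserved / total if total else 1.0


# Made-up chain of four polygons 0-1-2-3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(lex_rate([5, 1, 8, 7], [1, 0, 2, 2], adjacency))
```

In the example, polygons 2 and 3 fall into the same class, so the local maximum at polygon 2 and the local minimum at polygon 3 are no longer distinguishable from their neighbor.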

Figure 1. Example choropleth maps for one and the same data set ("Rainfall"), using different classification methods (from left to right) and class numbers (from top to bottom)

Figure 2. Graphical summary of empirical investigation results: uncertainty measures (from top to bottom) using different data sets (from left to right). Right axis of each diagram shows number of classes (ranging from 4 to 12, using an interval of 2), upper axis shows uncertainty measure (always with maximum value of +1, representing the "best" case)

Figure 3. Improvement of given solution (here: aChor, 6 classes) through shifting class breaks (two on the right hand side) and redistribution of values

The aforementioned histogram appearance of "Social Index" (accumulation in the middle, strong reduction towards boundaries) also leads to a better preservation of global extreme values compared to the other two data sets.
Differences up to |-0.5| are detected for the local extreme value preservation (LEX) between "Rainfall" and the other two data sets. This reduced preservation rate is based on the large RMSE in this data set, which leads to a stronger variation of values around local extreme value polygons and thus to a more difficult definition of class breaks that are able to separate all neighboring polygons.