MINING CO-LOCATION PATTERNS WITH CLUSTERING ITEMS FROM SPATIAL DATA SETS

The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two together: where A and B appear together, C often appears. This kind of pattern differs from the traditional co-location pattern. This paper therefore introduces a new concept called clustering items and refers to such patterns as co-location patterns with clustering items. Because traditional algorithms cannot mine these patterns, we introduce the related concepts in detail and propose a novel algorithm that extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.


INTRODUCTION
As an important research direction in the field of spatial data mining, co-location pattern discovery has a wide range of applications, including ecology, Earth science, biology, public health, and transportation. Co-location patterns with clustering items are likewise of great significance to these application fields. Traditional co-location mining algorithms cannot be used directly to mine co-location patterns with clustering items. We therefore conduct a detailed study of this novel kind of co-location pattern and present an algorithm for mining it.

Related Works
Previous work on co-location pattern discovery has not discussed the concept of co-location patterns with clustering items. For traditional co-location patterns, the literature has proposed a range of mining algorithms. Huang, Shekhar and Xiong (2004) proposed a general approach, the join-based approach. They also defined the participation index, which has an anti-monotone property, and showed the relationship between the participation index and a spatial statistics interest measure, the cross-K function. Yoo and Shekhar developed the partial-join (2004) and the joinless (2005) approaches to mining co-location patterns; both algorithms greatly reduce the computational cost. Huang, Pei and Xiong (2006) addressed the problem of mining co-location patterns with rare spatial events. Their paper introduced a new measure called the maximal participation ratio (maxPR) and identified a weak monotonicity property of the maxPR measure. Xiao et al. (2008) introduced density-based co-location pattern discovery. Jiang et al. (2010) defined the concept of negative co-location patterns. Based on an analysis of the relationship between the negative and positive participation indexes, they proposed methods for calculating the negative participation index and strategies for pruning negative patterns. Zhou et al. (2012) applied co-location patterns to decision trees and developed a method called the co-location decision tree (CL-DT).

Our Contributions
In this paper, we define clustering items and present a novel kind of co-location pattern, i.e. co-location patterns with clustering items. First, we define the basic concepts of co-location patterns with clustering items and of the corresponding rules. Second, we systematically study the problem of efficiently mining co-location patterns with clustering items. After reviewing previous approaches, we propose a novel approach for mining co-location patterns with clustering items based on the join-based approach. Finally, we conduct an experimental evaluation using a synthetic dataset. The results show that our algorithm is correct and efficient.

BASIC CONCEPTS
Definition 1 (clustering items) X = {x_1, x_2} is a clustering item if x_1 and x_2 satisfy the neighbor relationship. We also write this clustering item as x_1x_2.
(2) PI(T) ≥ min_prev.
Definition 9 (rules of co-locations with clustering items) X → Y is a rule of co-locations with clustering items if X ∪ Y is a prevalent co-location pattern with clustering items and the conditional probability of X → Y exceeds a user-defined conditional probability threshold (min_conf).

Review of Join-based Approach
Huang, Shekhar and Xiong (2004) proposed an instance join-based co-location mining algorithm. After finding all neighbor pair objects (size-2 co-location instances) using a geometric method, the method finds the instances of size-k (k > 2) co-locations by joining the instances of their size-(k−1) subset co-locations whose first k−2 objects are common.
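The instance join described above can be sketched as follows. This is a minimal illustration rather than the implementation of Huang et al.: the function name `join_row_instances`, the representation of a row instance as a tuple of instance identifiers, and the precomputed `neighbors` set are all assumptions made for the example.

```python
def join_row_instances(rows_a, rows_b, neighbors):
    """Join two size-(k-1) table instances into size-k row instances.

    rows_a, rows_b: lists of tuples of instance ids (hypothetical layout),
                    e.g. rows of {A, B} and rows of {A, C}.
    neighbors:      set of frozensets, one per neighboring instance pair.
    """
    joined = []
    for ra in rows_a:
        for rb in rows_b:
            # The first k-2 objects must be common to both rows.
            same_prefix = ra[:-1] == rb[:-1]
            # The two differing last objects must themselves be neighbors.
            if same_prefix and frozenset((ra[-1], rb[-1])) in neighbors:
                joined.append(ra + (rb[-1],))
    return joined


# Hypothetical neighbor pairs; B.2 and C.2 are deliberately NOT neighbors.
neighbors = {
    frozenset(p)
    for p in [("A.1", "B.1"), ("A.1", "C.1"), ("B.1", "C.1"),
              ("A.2", "B.2"), ("A.2", "C.2")]
}
rows_ab = [("A.1", "B.1"), ("A.2", "B.2")]
rows_ac = [("A.1", "C.1"), ("A.2", "C.2")]
rows_abc = join_row_instances(rows_ab, rows_ac, neighbors)
```

In this toy input only the row starting with A.1 survives the join, because B.2 and C.2 do not satisfy the neighbor relationship.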

Clustering Items
Traditional co-location mining algorithms cannot be used directly to mine co-location patterns with clustering items, because some definitions have been redefined for co-location patterns with clustering items (such as the spatial neighbor relationship between X and Y) and some methods must be redesigned (such as how to calculate the PI).
Our approach for mining co-location patterns with clustering items extends the join-based approach and still uses the principle of instance join. It has five phases. The first phase finds all clustering items. The second phase computes the neighbor relationships between spatial instances and clustering items. The third phase generates size-k candidate co-locations with clustering items. The fourth phase is pruning. The fifth phase generates the prevalent co-locations with clustering items and the rules of co-locations with clustering items.

Discover All Clustering Items in Spatial Database
This is the basic step of the algorithm. As in Definition 1, we use the Euclidean distance to decide whether two instances satisfy the neighbor relationship. Once two instances satisfy the neighbor relationship, we call them a clustering item. In this step we need to find all the clustering items.
For co-locations with clustering items in this paper, T = X ∪ Y is a co-location of size k if |X| = 2 and |Y| = k − 1. Note that co-locations with clustering items of size 1 are different from traditional co-locations of size 1. All the clustering items that we discover are co-locations with clustering items of size 1. All co-locations with clustering items of size 1 are also prevalent, and we need not calculate their prevalence measures, because the value of the participation index is 1 for all co-locations with clustering items of size 1.
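The discovery step can be illustrated with a brute-force sketch. It assumes that a clustering item pairs instances of two different features, as in the A.1B.1 example, and that "neighbor" means Euclidean distance at most d; the function name `find_clustering_items` and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def find_clustering_items(instances, d):
    """Return clustering items grouped by feature pair.

    instances: dict mapping an instance id to (feature, x, y) (assumed layout).
    d:         spatial neighbor distance threshold.
    """
    items = defaultdict(list)
    for a, b in combinations(sorted(instances), 2):
        fa, xa, ya = instances[a]
        fb, xb, yb = instances[b]
        # Assumption: only instances of different features form a clustering item.
        if fa != fb and dist((xa, ya), (xb, yb)) <= d:
            items[tuple(sorted((fa, fb)))].append((a, b))
    return dict(items)


instances = {"A.1": ("A", 0, 0), "B.1": ("B", 1, 0), "C.1": ("C", 5, 5)}
clustering_items = find_clustering_items(instances, d=1.5)
# A.1 and B.1 are 1.0 apart, so they form the only clustering item here.
```

The pairwise scan is quadratic in the number of instances; a grid or other spatial index would be the usual way to reduce that cost, but it is left out to keep the sketch short.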

Generation of Candidate Co-locations
As in the traditional algorithm, we can rely on a combinatorial approach to generate size-(k+1) candidate co-locations with clustering items from size-k prevalent co-locations. Specifically, a clustering item and an instance are combined to generate a size-2 candidate co-location with clustering items.
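One way to picture the candidate generation is the sketch below. It works at the feature level and assumes a candidate is stored as a pair (X, Y), where X is a two-feature clustering item and Y is a sorted tuple of further features; the prefix join in `next_candidates` is the usual Apriori-style combination and is an assumption here, not a detail taken from the paper. The names `size2_candidates` and `next_candidates` are hypothetical.

```python
from itertools import combinations


def size2_candidates(clustering_items, features):
    """Combine each clustering item X with one further feature not already in X."""
    return [(x, (f,))
            for x in sorted(clustering_items)
            for f in sorted(features)
            if f not in x]


def next_candidates(prevalent):
    """Generate size-(k+1) candidates from size-k prevalent patterns.

    Two patterns are joined when they share the clustering item X and all but
    the last feature of Y (assumed Apriori-style prefix join).
    """
    out = []
    for (xa, ya), (xb, yb) in combinations(sorted(prevalent), 2):
        if xa == xb and ya[:-1] == yb[:-1]:
            out.append((xa, ya + (yb[-1],)))
    return out


cands2 = size2_candidates({("A", "B")}, {"A", "B", "C", "D"})
cands3 = next_candidates(cands2)  # pretend every size-2 candidate was prevalent
```

Here `cands2` pairs the clustering item AB with C and with D, and `cands3` contains the single size-3 candidate built from AB, C and D.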

Pruning
In the traditional algorithm, candidate co-locations can be pruned using the given threshold min_prev on the prevalence measure.
The threshold min_prev can also be used in our algorithm to prune candidate co-locations with clustering items. This kind of pruning is called prevalence-based pruning by Huang [2].
For a candidate co-location with clustering items T = X ∪ Y (X = {x_1, x_2}), we have to calculate not only PI(T) but also PI(x_1, Y) and PI(x_2, Y), because a prevalent co-location with clustering items must meet condition (1) and condition (2) of Definition 7.
In addition, we present a novel pruning method: due to condition (1), a clustering item X that contains x_i cannot be combined with y_j into a prevalent co-location with clustering items if {x_i, y_j} is a prevalent co-location, i.e. PI({x_i, y_j}) ≥ min_prev. This method can greatly reduce unnecessary calculation time.
For example, in Fig. 1, {AC, B} and {BC, A} cannot be prevalent co-locations with clustering items, since {A, B} is a prevalent co-location. We can prune them directly, and it is not necessary to calculate PI({AC, B}) and PI({BC, A}).
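The pruning rule in this example can be expressed as a simple membership test. The sketch assumes that the prevalent traditional size-2 co-locations are already known and stored as a set of frozensets; the function name `can_skip_candidate` is hypothetical.

```python
def can_skip_candidate(clustering_item, feature, prevalent_pairs):
    """Return True if the candidate {clustering_item, feature} can be pruned.

    The candidate is pruned when some member of the clustering item already
    forms a prevalent traditional co-location with the feature, so that
    condition (1) cannot hold and no PI needs to be computed.
    """
    return any(frozenset((member, feature)) in prevalent_pairs
               for member in clustering_item)


# As in the Fig. 1 example, {A, B} is taken to be a prevalent co-location.
prevalent_pairs = {frozenset(("A", "B"))}

skip_ac_b = can_skip_candidate(("A", "C"), "B", prevalent_pairs)  # True
skip_bc_a = can_skip_candidate(("B", "C"), "A", prevalent_pairs)  # True
skip_ab_c = can_skip_candidate(("A", "B"), "C", prevalent_pairs)  # False
```

Only {AB, C} remains as a candidate whose participation index still has to be calculated.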

EXPERIMENTAL EVALUATION
We evaluate the algorithm using synthetic datasets, which were generated with a spatial data generator similar to the one used by Shekhar and Huang (2001). The number of spatial feature types is 20. Three parameters, namely the number of spatial instances (n), the prevalence threshold (min_prev), and the spatial neighbor distance threshold (d), were varied during the experiments to verify the effects of the parameters and the performance of the algorithm.

Effect of Number of Spatial Instances
We examined how the performance of the algorithm varies with the number of spatial instances. We used a spatial frame of 1000*1000, so the density of the data changes whenever the number of spatial instances changes. As shown in Fig. 4, the execution time of the algorithm increases significantly as the number of spatial instances grows. This behavior is very similar to that of the join-based algorithm, because a large number of joins are required as the number of spatial instances increases.

The following experiment examined the effect of the parameter min_prev on the running time. In this experiment, the number of spatial instances is 10K and the parameter d is set to 20. As shown in Fig. 5, when the prevalence threshold changes from 0.4 to 0.8, the execution time does not change much. However, when the prevalence threshold decreases below 0.4, the running time starts to increase rapidly.

CONCLUSION AND FUTURE WORK
In this paper, we discuss the concept of co-location patterns with clustering items and design an algorithm for mining them. We evaluate the correctness and performance of the algorithm through experiments. In future work, we will continue to study how to mine co-location patterns with clustering items more efficiently.

Figure 1. An example data set

Example 2 In Figure 1, C.3 and the clustering item A.1B.1 satisfy the neighbor relationship, because C.3 satisfies the neighbor relationship with A.1 and with B.1, and A.1B.1 is a clustering item.

Definition 3 (row instance) Let T be a co-location pattern with clustering items. A neighborhood instance I of T is a row instance of T if I contains instances of all events in T and no proper subset of I does so. The table instance of T is the collection of all row instances of T.

Definition 4 (co-location patterns with clustering items) T is a co-location pattern with clustering items if T = X ∪ Y, where X is a clustering item, Y is a set of spatial features, |X| = 2, |Y| ≥ 1, and X ∩ Y = ∅.

Definition 5 (the PR of co-location patterns with clustering items) The participation ratio PR(T, f_i) in a co-location pattern with clustering items T = {f_1, ⋯, f_k} is the fraction of instances of feature f_i that participate in any row instance of T.
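Definition 5 can be made concrete with a small calculation. The sketch below assumes a table instance is a list of row tuples and that the total number of instances per position is known. Taking the participation index as the minimum participation ratio follows the participation index of Huang, Shekhar and Xiong (2004) cited in the related work; applying it position by position to patterns with clustering items is an assumption of this example. The names `participation_ratio` and `participation_index` are hypothetical.

```python
def participation_ratio(table_instance, position, total_instances):
    """Fraction of the instances at `position` that occur in some row instance."""
    participating = {row[position] for row in table_instance}
    return len(participating) / total_instances


def participation_index(table_instance, totals):
    """Minimum participation ratio over all positions of the pattern (assumed)."""
    return min(participation_ratio(table_instance, i, n)
               for i, n in enumerate(totals))


# Hypothetical table instance of a pattern over features A, B, C.
rows = [("A.1", "B.1", "C.1"), ("A.1", "B.1", "C.2")]
totals = (2, 1, 4)  # assumed instance counts of A, B and C in the whole data set

pr_a = participation_ratio(rows, 0, totals[0])  # 1 of 2 A instances -> 0.5
pr_b = participation_ratio(rows, 1, totals[1])  # 1 of 1 B instances -> 1.0
pr_c = participation_ratio(rows, 2, totals[2])  # 2 of 4 C instances -> 0.5
pi = participation_index(rows, totals)          # min(0.5, 1.0, 0.5) -> 0.5
```

With a threshold such as min_prev = 0.4, this hypothetical pattern would pass the prevalence check, while min_prev = 0.6 would prune it.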

Figure 3. The algorithm for mining co-location patterns with clustering items

Figure 4. Effect of number of spatial instances