3D Building Model Segmentation Based on K-means++ Cluster Analysis

3D mesh model segmentation has drawn increasing attention in the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain criteria to support reconstruction, compression, texture mapping, model retrieval, and so on. Segmentation is therefore a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance depends heavily on the randomized initial seed points (i.e., centroids): different random centroids can produce quite different results. We therefore improve the existing method by using the K-means++ clustering algorithm to address this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.

* Corresponding author: Bo Mao, email maoboo@gmail.com


INTRODUCTION
Multimedia has gone through three waves so far: sound, image and video. In recent years, with the development of 3D scanning and related technologies, 3D digital geometry models have become a new type of multimedia (Sun, 2005a) and have been used intensively in many fields such as industrial manufacturing, entertainment, biomedicine, architectural design, and visualization in scientific computing. How to reuse and modify existing models according to local shape features, namely the segmentation of 3D models, has become a significant research subject (Sun et al., 2005b). Specifically, meaningful segmentation of 3D building models helps the analysis of building structure, which is essential for building generalization operations such as typification (Li et al., 2004). By replacing the mesh data with parameterized CSG components, the data volume of 3D city models can be reduced substantially and rendering efficiency improved.
However, it is difficult to segment 3D buildings in mesh form automatically. A mesh model by itself lacks structural features and semantic information, so understanding the model becomes an urgent problem to be solved. In this paper, we deal with Collada files, a type of triangular mesh model. Segmentation of a 3D mesh model is usually based on certain division criteria, so that the original 3D model can be divided into a set of meaningful simple shapes. For example, a building can be divided into roof, tower and wall. A decomposition into parts not only provides a way to abstract the semantic information of the underlying object, but can also guide several types of mesh processing algorithms, including reconstruction, compression, texture mapping, parameterization, mesh deformation, model retrieval, and generalization (Mangan et al., 1999; Qin, 2013).
In fact, most 3D mesh segmentation algorithms are inspired by 2D image segmentation and then extended to 3D mesh space. In the past two decades, researchers have proposed a large number of segmentation algorithms for different purposes. Originally, Vincent et al. (1991) extended watersheds from image processing to the segmentation of 3D models with arbitrary topology, which opened the way for mesh segmentation. Xiao et al. (2003) propose an approach for the segmentation of human body scans based on the discrete Reeb graph. Shamir (2008) classifies segmentation algorithms into two categories. One is part-type segmentation, which aims to segment mesh models into meaningful or semantic parts (Mortara et al., 2006); the other is surface-type segmentation, which generates surface patches based on the planarity or curvature of the mesh. Cao et al. (2008) segment and simplify triangular mesh models, demonstrating the practical value of the approach. More detailed related work can be found in the literature (Sun et al., 2005b).
Mesh models are also widely used for 3D building models. The concept of the Cyber City was proposed more than ten years ago, and city models have since progressed from 2D to 3D (Li et al., 2011). 3D city building models have been applied in many fields, especially mapping and urban planning. Being able to segment, simplify, generalize and rebuild 3D building models would have a significant effect on mapping and urban planning. In this paper, we propose a method to segment Collada 3D building models into different parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, as illustrated in Fig. 1, whose performance depends heavily on the randomized initial seed points (i.e., centroids): different random centroids can produce quite different results.
To overcome this, we use the K-means++ clustering algorithm (Arthur et al., 2007) and achieve good and meaningful results. The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces our solution and algorithms. Experimental results are reported in Section 4. Finally, Section 5 concludes the paper.

RELATED WORK
There are three main families of clustering-based segmentation algorithms for 3D mesh models.
Hierarchical Clustering Algorithm. Every surface patch is initialized as a cluster consisting of itself. During clustering, a merging cost is computed for every pair of clusters, and the pair with the lowest cost is merged. Repeating these steps yields the final segmentation. Different merging cost measures lead to different hierarchical clustering algorithms, such as fitting-primitive measures (Attene et al., 2006), compactness measures (Sander et al., 2001), and slippage-analysis feature measures (Gelfand et al., 2004). Although they can all produce satisfactory segmentation results, the number of final segments cannot be determined in advance.
Iterative Clustering Algorithm. This family is based on the K-means algorithm, and its key issue is convergence. Both when classifying elements and when calculating cluster centres, the distance measure must be chosen carefully so that the iteration eventually converges. Shlafman et al. (2002) use a weighted distance measure to represent the distance between any two surface patches. To generate compatible segmentations of two mesh models easily, Mortara et al. (2009) propose a K-means algorithm based on face clustering. As mentioned in Section 1, the quality of model segmentation by K-means depends heavily on the randomized initial seed points (i.e., centroids). Because K-means is the basis of iterative clustering algorithms, they all share this disadvantage.
Spectral Analysis Clustering Algorithm. The idea is derived from the theory of spectral graph partitioning. Liu et al. (2007) adopt a top-down binary cutting method to segment mesh models, using spectral embedding to project the meshes onto a 2D plane and extract the contour line. Shapira et al. (2010) fit Gaussian Mixture Models to obtain the fitting probability of patches according to their shape diameter function; a spectral graph cut algorithm is then used to determine the segmentation boundaries. Segmentation based on spectral analysis is easy to implement, and its advantage is efficiency. However, constructing the affinity matrix requires computing the angular and geodesic distances between faces.

Because of the limitations of these three families of clustering algorithms, we introduce the K-means++ clustering algorithm to segment 3D building mesh models. It allows the number of final segments to be fixed in advance and also solves the problem of selecting randomized initial seed points. It also simplifies the extraction of feature values. Moreover, we compare its performance with that of K-means.

METHODOLOGY
In this section, we first introduce our solution for the segmentation of 3D building mesh models and then explain the algorithm in detail.

Solution overview
K-means is the classic iterative clustering algorithm. It is a cluster analysis algorithm that iteratively classifies N objects into K clusters based on their attributes or features, where K is a positive integer (K=2, 3, 4, …). A series of studies have adopted it and achieved excellent segmentation results. For example, Sun et al. (2006) propose a method to segment point clouds of 3D models using the K-means clustering algorithm, as illustrated in Fig. 2. However, the performance of K-means depends heavily on the randomized initial seed points (i.e., centroids), and different random centroids can produce quite different results. We therefore introduce K-means++, as illustrated in Fig. 3, a simple, linear-time randomized initialization for clustering, to solve this problem. We implement the K-means++ algorithm, whose details are introduced in Section 3.2, and evaluate the quality of the clustering it produces on Collada 3D mesh building models. Collada is a type of triangular mesh model in which every polygon is composed of several triangles, which together constitute the entire model.
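As a rough illustration of the triangular structure described above, the following sketch reads vertex positions and triangle indices from a COLLADA document using only the Python standard library. It is not the implementation used in this paper. The function name `read_triangle_meshes` and the embedded `SAMPLE` document are hypothetical, and the sketch assumes COLLADA 1.4 namespacing, `<triangles>` primitives only, and positions stored as consecutive x, y, z floats (it ignores scene transforms and other primitive types).

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://www.collada.org/2005/11/COLLADASchema"}

# Hypothetical minimal document: a unit square made of two triangles.
SAMPLE = """
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_geometries>
    <geometry id="square">
      <mesh>
        <source id="pos">
          <float_array id="pos-array" count="12">0 0 0 1 0 0 1 1 0 0 1 0</float_array>
        </source>
        <source id="nrm">
          <float_array id="nrm-array" count="3">0 0 1</float_array>
        </source>
        <vertices id="verts">
          <input semantic="POSITION" source="#pos"/>
        </vertices>
        <triangles count="2">
          <input semantic="VERTEX" source="#verts" offset="0"/>
          <input semantic="NORMAL" source="#nrm" offset="1"/>
          <p>0 0 1 0 2 0 0 0 2 0 3 0</p>
        </triangles>
      </mesh>
    </geometry>
  </library_geometries>
</COLLADA>
"""


def read_triangle_meshes(xml_text):
    """Return a list of (vertices, faces) pairs, one per <triangles> element."""
    root = ET.fromstring(xml_text.strip())
    meshes = []
    for mesh in root.iterfind(".//c:mesh", NS):
        sources = {s.get("id"): s for s in mesh.findall("c:source", NS)}
        vertex_nodes = {v.get("id"): v for v in mesh.findall("c:vertices", NS)}
        for tris in mesh.findall("c:triangles", NS):
            inputs = tris.findall("c:input", NS)
            # Indices of all inputs are interleaved in <p>; stride = slots per corner.
            stride = max(int(i.get("offset", "0")) for i in inputs) + 1
            vertex_input = next(i for i in inputs if i.get("semantic") == "VERTEX")
            offset = int(vertex_input.get("offset", "0"))
            # VERTEX -> <vertices> -> POSITION -> <source> holding the coordinates.
            vnode = vertex_nodes[vertex_input.get("source")[1:]]
            pos_input = next(i for i in vnode.findall("c:input", NS)
                             if i.get("semantic") == "POSITION")
            source = sources[pos_input.get("source")[1:]]
            floats = [float(t) for t in source.find("c:float_array", NS).text.split()]
            # Assumption: three floats (x, y, z) per vertex.
            vertices = [tuple(floats[k:k + 3]) for k in range(0, len(floats), 3)]
            indices = [int(t) for t in tris.find("c:p", NS).text.split()]
            corners = indices[offset::stride]
            faces = [tuple(corners[k:k + 3]) for k in range(0, len(corners), 3)]
            meshes.append((vertices, faces))
    return meshes


meshes = read_triangle_meshes(SAMPLE)
```

Each returned pair holds the vertex list and the triangles as index triples, which is the form assumed by the later sketches in this section.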

Algorithms
We first define some notation. In a 3D building model $M$, each 3D point $p \in P_M$ is a 3D location. After model normalization, we obtain a new model $M_2$. Considering that Collada 3D building models are triangular mesh models, we choose each triangle's centre of gravity $g \in G_{M_2}$ as our feature, a choice made after extensive experiments. The algorithms, which combine normalization and K-means++ clustering, are summarized in Alg. 1.
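The feature extraction described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `triangle_centroids` and the toy mesh are hypothetical, and a mesh is assumed to be given as a vertex list plus triangles stored as index triples.

```python
def triangle_centroids(vertices, faces):
    """Return the centre of gravity of every triangle (mean of its three vertices)."""
    centroids = []
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        centroids.append(tuple((a[k] + b[k] + c[k]) / 3.0 for k in range(3)))
    return centroids


# Hypothetical toy mesh: a unit square in the z = 0 plane split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
features = triangle_centroids(vertices, faces)
```

The resulting list plays the role of the feature set: one 3D point per triangle, so the clustering operates on faces rather than on raw vertices.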

Model normalization:
In general, 3D models need to be pre-processed before clustering (i.e., model normalization) so that the result is invariant to translation and scaling. These two steps are analogous to the centring and scaling stages of PCA (Principal Component Analysis)-based normalization.
Translation Invariance. The purpose of translating a model is to make its centre of gravity coincide with the origin of the coordinate system. This method is robust with respect to translation invariance (Xing, 2006) and is given by

$$M_1 = M - G \tag{1}$$

where $M_1$ is the model after translation, $M$ is the original model, and $G$ is the model's centre of gravity. The formula shows that $G$ is the key quantity and strongly influences the final result of the translation. $G$ is computed as

$$G = \frac{1}{S}\sum_{i=1}^{n} s_i G_i, \qquad S = \sum_{i=1}^{n} s_i \tag{2}$$

where $G_i$ is the centre of gravity of the $i$th triangular patch, $s_i$ is the area of the $i$th triangular patch, and $S$ is the total area of all triangles of the 3D model, with

$$G_i = \frac{A_i + B_i + C_i}{3} \tag{3}$$

where $A_i$, $B_i$, $C_i$ are the three vertex coordinates of the triangle.
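The translation step can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the mesh is a vertex list plus index triples, and the triangle area is taken as half the norm of the cross product of two edge vectors, a standard formula that the text does not spell out.

```python
import math


def triangle_area(a, b, c):
    """Area s_i of a triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = (b[k] - a[k] for k in range(3))
    vx, vy, vz = (c[k] - a[k] for k in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def centre_of_gravity(vertices, faces):
    """Area-weighted mean G of the triangle centres of gravity G_i."""
    total_area = 0.0
    weighted = [0.0, 0.0, 0.0]
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        s = triangle_area(a, b, c)
        total_area += s
        for k in range(3):
            weighted[k] += s * (a[k] + b[k] + c[k]) / 3.0
    if total_area == 0.0:
        raise ValueError("mesh has zero surface area")
    return tuple(w / total_area for w in weighted)


def translate_to_origin(vertices, faces):
    """Return the vertices of M1 = M - G."""
    g = centre_of_gravity(vertices, faces)
    return [tuple(v[k] - g[k] for k in range(3)) for v in vertices]


# Hypothetical toy mesh: a unit square split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
g = centre_of_gravity(vertices, faces)
translated = translate_to_origin(vertices, faces)
```

Weighting by area rather than averaging the vertices keeps the result stable when one part of a building is tessellated much more finely than another.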
Scaling Invariance. The aim of scaling is to fit each model into a unit space. Our scaling formula is

$$M_2 = C \cdot M_1 \tag{4}$$

where $C$ is the scaling coefficient, whose choice is not unique. In this paper, we search for the point of the building model that is furthest from the centre of gravity and take the reciprocal of this maximum distance as the scaling coefficient $C$:

$$C = \frac{1}{\max_{p \in P_M} \lVert p - G \rVert} \tag{5}$$

A large number of applications show that this method is robust.
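A minimal sketch of the scaling step follows. The function name `scale_to_unit` is hypothetical, and the input vertices are assumed to have been translated already, so that the centre of gravity lies at the origin and the distance to it is simply the vector norm.

```python
import math


def scale_to_unit(translated_vertices):
    """Scale so that the vertex furthest from the origin ends up at distance 1.

    Returns the scaled vertices (model M2) and the scaling coefficient C.
    """
    max_dist = max(math.sqrt(x * x + y * y + z * z) for x, y, z in translated_vertices)
    if max_dist == 0.0:
        raise ValueError("all vertices coincide with the centre of gravity")
    c = 1.0 / max_dist
    return [(c * x, c * y, c * z) for x, y, z in translated_vertices], c


# Hypothetical input: the unit square from before, already centred on the origin.
m1 = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
m2, c = scale_to_unit(m1)
```

After this step every model fits inside the unit sphere, so distances computed during clustering are comparable across buildings of different sizes.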

K-means++:
At the beginning of Section 3.2, we defined a set of feature vectors $G_{M_2} = \{g_1, g_2, g_3, \dots, g_n\}$, which we now use to segment 3D building models. The final number K of clusters is fixed in advance. For a general 3D model, the number of clusters is less than 10. We therefore let K=5 because, generally speaking, each building model has one roof and four walls. The specific steps of the K-means++ clustering algorithm are as follows.
Step1. Randomly select a point $z \in G_{M_2}$ as the first seed point (i.e., centroid).
Step2. For each point $g_i$, compute the distance $D(g_i, z_j)$ from $g_i$ to its nearest centroid $z_j$, and add these distances up to obtain $\mathrm{Sum}(D(g_i, z_j))$. We use the Euclidean distance for $D(g_i, z_j)$:

$$D(g_i, z_j) = \lVert g_i - z_j \rVert_2 = \sqrt{\sum_{d=1}^{3} (g_{i,d} - z_{j,d})^2} \tag{6}$$

Step3. Draw a random value $V$ between 0 and $\mathrm{Sum}(D(g_i, z_j))$, then iterate over the points, updating $V = V - D(g_i, z_j)$ until $V \le 0$. The point $g_i$ at which this happens is the next centroid $z$.
Step4. Repeat Step2 and Step3 until K centroids have been selected.
Step1 to Step4 address the problem that the performance of K-means depends heavily on the randomized initial centroids. We next segment the 3D building model.
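Step1 to Step4 can be sketched as follows. This is an illustrative sketch, not the paper's implementation. The names `euclidean`, `seed_centroids` and the `power` parameter are hypothetical. With `power=1` candidates are weighted by the distance D as the steps above describe; the original K-means++ formulation of Arthur et al. weights by the squared distance, which corresponds to `power=2`.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance D between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def seed_centroids(points, k, rng, power=1):
    """Pick k initial centroids from points (tuples) following Step1 to Step4.

    power=1 weights candidates by D; power=2 weights them by D squared.
    """
    if not 1 <= k <= len(set(points)):
        raise ValueError("k must be between 1 and the number of distinct points")
    centroids = [rng.choice(points)]                      # Step1
    while len(centroids) < k:                             # Step4
        # Step2: weight of each point from its distance to the nearest centroid.
        weights = [min(euclidean(p, z) for z in centroids) ** power for p in points]
        total = sum(weights)
        # Step3: draw V in [0, total) and subtract weights until V <= 0.
        v = rng.random() * total
        chosen = None
        for p, w in zip(points, weights):
            if w == 0.0:          # coincides with an existing centroid; skip it
                continue
            chosen = p            # also the fallback if round-off keeps V > 0
            v -= w
            if v <= 0.0:
                break
        centroids.append(chosen)
    return centroids


# Hypothetical feature points (triangle centres of gravity).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 0.0),
          (5.1, 5.0, 0.0), (0.0, 9.0, 3.0)]
seeds = seed_centroids(points, k=3, rng=random.Random(0))
```

Because points far from every existing centroid carry more weight, the seeds tend to be spread across the model instead of clumping in one region, which is the behaviour the seeding is meant to provide.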
Step5. For each point $g_i$, compute the cluster $c_i$ that it belongs to:

$$c_i = \arg\min_{j} \lVert g_i - u_j \rVert^2 \tag{7}$$

where $u_j$ is the mean value of the $j$th cluster.
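The assignment in Step5 can be sketched together with the usual K-means update loop. The text here stops at the assignment step, so the centroid update and the stopping rule below are assumptions based on standard K-means, and all function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Step5: label each point with the index of its nearest cluster mean."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Assumed update step: move each centroid to the mean of its members."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == j]
        if not members:               # empty cluster: keep the previous centroid
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(3)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate update and assignment until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical example: two well-separated groups and one seed in each.
points = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
labels, means = kmeans(points, [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
```

In a full pipeline the initial centroids would come from the seeding steps rather than being written by hand, and each triangle would inherit the label of its centre of gravity.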

EXPERIMENTS

Experimental setup
To evaluate the proposed method, we test it on a large-scale dataset of 2000 Collada 3D building triangular mesh models downloaded from the Internet by a web crawler. For each building model, we keep only its structure, without texture, as shown in Fig. 4.
All experiments were run on an Intel Core i7-3770 3.40 GHz machine.

Evaluation of experimental results
The 3D models are segmented by the K-means and K-means++ clustering algorithms, both based on the centres of gravity of the triangular patches. Every model is divided into five meaningful parts (as shown in Fig. 5). Fig. 5 indicates that the result of K-means++ clustering is much better than that of K-means clustering, particularly with regard to edges. K-means clustering does not handle edge details well and produces many jagged edges, as in pictures (b3) and (b4). In addition, we compare the results of the K-means++ and K-means clustering algorithms on 3D point clouds (as shown in Fig. 6). The left column is clearly much better than the right. For instance, in picture (a), the pointed roof of the left building model is divided into two parts, a black part and a yellow part. The pointed roof of the right model is also divided into two parts, but the red part is surrounded by black and the black part is hardly recognizable. This comparison again indicates that K-means++ clustering segments better than K-means.
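The comparison above is visual. One possible way to complement it numerically, offered here only as a suggestion and not as part of the reported experiments, is the within-cluster sum of squared distances, the objective that K-means minimizes: for the same model and the same K, a lower value indicates tighter clusters. The function name `within_cluster_ss` and the toy data are hypothetical.

```python
def within_cluster_ss(points, labels, centroids):
    """Sum over all points of the squared distance to their assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[c]))
        for p, c in zip(points, labels)
    )


# Hypothetical clustering result for four feature points and two clusters.
points = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.5, 0.0), (10.0, 0.5, 0.0)]
score = within_cluster_ss(points, labels, centroids)
```

Such a score measures compactness only; it does not capture whether the parts are meaningful, so it would supplement rather than replace the visual inspection.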

CONCLUSION AND FUTURE WORK
We have presented an efficient approach to 3D mesh model segmentation based on cluster analysis. On the one hand, the proposed method uses the centres of gravity of triangular patches to achieve meaningful segmentation results; on the other hand, the experimental results demonstrate that segmenting 3D mesh models by dynamically updating the centroids and the classification of feature vectors is feasible.
Although 3D model segmentation has been studied for many different applications, no segmentation algorithm is suitable for all of them, and most were put forward to solve specific problems. Moreover, semantically meaningful segmentation and recognition of 3D models remains the most challenging task, and introducing pattern recognition and artificial intelligence is a necessary next step. Last but not least, we segment the 3D models one by one in this paper; efficiency could be improved by cosegmentation (Zhang, 2015). Cosegmentation is a new line of research in 3D model segmentation, which segments a set of 3D models as a whole into consistent parts and infers more knowledge than can be obtained from an individual model.

Acknowledgements: Special Fund for Grain-scientific Research in the Public Interest (201513004) and the project of the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Nanjing University of Finance and Economics.

Figure 3. K-means++ algorithm process diagram
Figure 4. Original building models without texture
Figure 6. The left column shows the results of K-means++. The right column shows the results of K-means.