RECONSTRUCTION OF CONSISTENT3D CADMODELS FROMPOINTCLOUD DATA USINGAPRIORI CADMODELS

Abstract. We address the reconstruction of 3D CAD models from point cloud data acquired in industrial environments, using a pre-existing 3D model as an initial estimate of the scene to be processed. Indeed, this prior knowledge can be used to drive the reconstruction so as to generate an accurate 3D model matching the point cloud. We more particularly focus our work on the cylindrical parts of the 3D models. We propose to state the problem in a probabilistic framework: we have to search for the 3D model which maximizes some probability taking several constraints into account, such as the relevancy with respect to the point cloud and the a priori 3D model, and the consistency of the reconstructed model. The resulting optimization problem can then be handled using a stochastic exploration of the solution space, based on the random insertion of elements in the conﬁguration under construction, coupled with a greedy management of the conﬂicts which efﬁciently improves the conﬁguration at each step. We show that this approach provides reliable reconstructed 3D models by presenting some results on industrial data sets.


Context
3D CAD models of industrial environments such as power plants, oil and gas refineries or pharmaceutical installations are used to serve several purposes: simulation of maintenance operations, training, etc. Some of these applications require the 3D models to match precisely (up to one centimeter) the actual geometry of the facility they represent. Such models are called "as built" models and are usually constructed from 3D point clouds. In our case, the point cloud P that we use come from terrestrial LI-DAR acquisitions. The 3D CAD models which are considered here are CSG models made of simple primitive shapes such as planes, cones, tori and cylinders, that are truncated and assembled. Each primitive shape is a parametric entity defined by its position, orientation and some other geometric parameters inherent to its type.
Besides the point cloud P, we consider that an a priori 3D CAD model M0 is also available. While the point cloud describes the actual state of the scene to be reconstructed, the a priori 3D model stands for a theoretical state of the facility : it is a rough estimate of the solution we are searching for (cf. Figure 1a). However, there may be some significant differences between M0 and the actual scene represented by P : some elements of M0 may be present in the facility at a different place, with a different orientation or with a slightly different geometry. Moreover, the number of primitive shapes required to describe the whole actual scene may be different from the number of shapes in M0. Indeed, some elements of M0 may be missing or duplicated in the facility. Inversely, some components in the facility may have no match in M0. The a priori model M0 can be an existing generic 3D representation of the site (a 3D model that describes the structure of a typical facility), but it can also come from the "as built" reconstruction of another facility which looks like the one to be processed.
We focus our work on the reconstruction of right circular cylinders parts, because many shapes in industrial environments are cylindrical and such components are very frequently used in piping and equipment design. Moreover, recognized cylinders can be subsequently used as key constraints for the detection of other shapes (especially tori and cones).
Hence given P and M0, we aim at constructing a reliable 3D model X made of cylinders, such that : • X fits well to P : each cylinder is close to its points, • each cylinder of X matches a cylinder from M0 : we consider that the cylinders which are to be reconstructed must appear in M0 up to some specified change tolerances, so as to provide reliable results, • X is consistent : the connections between its cylinders are relevant with respect to the constraints observed in industrial facilities. These connections constraints are also some kind of a priori knowledge about the scene to be reconstructed, besides the model M0.
Our contributions to the reverse engineering field are the expression of a new probabilistic framework for this reconstruction problem, as well as a new approach to perform the optimization stated in this framework. Our method introduces the ideas of checking the reconstructed model consistency and guiding the reconstruction by using the a priori 3D model. Hence, the reconstructed model does not only fit well to the point cloud, but is also relevant with respect to the industrial environments constraints, which is usually not guaranteed by existing approaches, and its similarity with the a priori 3D model can also be ensured.

Related work
To our knowledge, the work of (Bosché, 2010) is the only reverse engineering approach that uses a priori CAD models as an initial estimate of the point cloud to be reconstructed. The 3D model is fitted to the point cloud using a registration approach based on the ICP algorithm, which is only supposed to converge when the data to be registered are close enough. So this approach cannot handle significant changes.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXVIII-5/W12, 2011ISPRS Calgary 2011 August 2011, Calgary, Canada each segment is displayed with its own random color. These results can be compared with the segmentation provided by the algorithm of (Schnabel et al., 2007) shown in Figure (d). (Schmittwilken and Plümer, 2010) also use strong prior informations such as the size of windows to classify distinct parts of building facade, but their work is very specific to the facade parts classification problem, although interesting ideas could also be used in another context. There also exists some researches that drive the surface reconstruction from point clouds by using prior knowledge (Pauly et al., 2005, Jenke et al., 2006, Gal et al., 2007. These works use 3D meshes to model the surfaces whereas we would like to build the scenes with primitive shapes, which gives a level of representation more adapted to the targeted applications. Two approaches are widely used to handle the shapes recognition problem : the Generalized Hough Transform (GHT) and the RANdom SAmple Consensus (RANSAC) algorithm. The GHT is a vote based-approach: each point votes for a set of shapes in a discretized parametric space, and the shapes whose associated parametric description correspond to a peak of votes can be considered as present in the point cloud. The main drawback of this method, when naively used, is its prohibitive complexity since the whole shape parametric space must be scanned (5 dimensions for the cylinder). The complexity can be lowered in the case of the cylinder by resolving some parts of the problem in lower dimensions spaces (Rabbani and Van Den Heuvel, 2005). This method provides nice results on part of industrial scenes, but the detection of accurate cylinders with different sizes (especially the smallest ones) in large scenes remains challenging. The RANSAC algorithm (Fischler and Bolles, 1987) is a stochastic approach which consists in randomly picking minimal points sets from which candidate shapes are built. Without improvement, most of the generated candidates are not relevant and the number of trials that is required to find a correct shape can be huge. (Schnabel et al., 2007) propose an efficient algorithm that decreases the minimal number of trials that ensures shapes recognition. The authors also provide statistical tools that speed up the score computation of candidate shapes. This approach has shown good results on the data sets we have tested, although there are still some precision issues that lead to irrelevant shapes.
We finally would like to mention the work of (Fisher, 2004) that points out the interest of using a priori knowledge in the reverse engineering process, so as to provide high quality results. The authors notably propose to take shapes relationships into account, by adding constraints in the localized shape fitting problem.

Overview
The use of a prior knowledge lends itself quite well to the Bayesian formulation (Diebel et al., 2006) of the reconstruction problem which states this task as the research of the most probable configuration (cf. Section 2.2). The underlying probability must be defined to set the requirement that we want the reconstructed model to fulfill. Therefore we propose a probability that embeds several kinds of constraints (cf. Section 2.3), such as the quality of the fitting to the point cloud, the similarity with the a priori 3D model and the quality of the connectivity between shapes in the configuration.
Once the probability has been defined, we have to find an efficient way to compute the optimal configuration among the huge and complex set of possible solutions. For that purpose, we propose a new iterative optimization approach based on the stochastic exploration of the configuration space (cf. Section 3.1). A candidate cylinder is randomly generated at each step, in order to insert it into the configuration. As its insertion may decrease the probability of the current configuration, we propose a greedy method to handle conflicts with the cylinders already included in the configuration: the candidate cylinder is accepted if it increases the probability once the conflicts have been resolved. We also present a cylinder generation algorithm that favors the relevant cylinders appearing, by using both P and M0 to guide the random generation (cf. Section 3.2).

Introduction about data and notations
Let P = (p 0 , n0) , . . . , p n−1 , nn−1 be a point cloud that stands for the actual state of a facility. Each pair p j , nj ∈ P consists of a 3D point p j and its associated normal vector nj, which can either be provided with LIDAR data or computed using a local plane fitting approach (Mitra and Nguyen, 2004). The a priori 3D CAD model M0 = C 0 , . . . , C m−1 is a set of primitive shapes, among which we only consider right circular cylinders. Each cylinder C considered in this paper is truncated and defined by its center of mass cC, its axis direction aC, its radius rC and its length lC. We aim at finding the most probable configuration X = {C0, . . . , Cx−1}, in which each Cj is a cylinder.

Reconstruction as an optimization problem
Considering the reconstruction problem from the probabilistic point of view, we are searching for the configuration that maximizes some posterior probability π(.) which is proportional, up to some constant normalization factor, to the product of a Data likelihood term PD and a Prior term PP according to the Bayes rule: "∝" stands for the proportionality relation : the constant normalization term is not involved in the optimization problem. PD(P|X , M0) is in fact independent of M0 since P does not depend on M0. It quantifies how well P fits to X , whereas PP defines some prior density over the whole space of possible configurations. We have to ensure that the prior term PP favors consistent configurations and takes the similarity with M0 into account. Using the densities: we have to find the configuration X that minimizes the energy: where λD specifies the relative importance of the data fitting energy ED with respect to the prior energy EP . From equations 2 and 3, we can see that the Data fitting energy ED measures the quality of fitting of P to X , while the Prior energy EP measures the compliance of X with the a priori expectations.

Energies formulation
2.3.1 Data fitting term Given a threshold ǫ which corresponds to an estimate of the noise in P, we define the data fitting energy ED as follows: remind n is the number of elements in P.
is the distance between p and Ci, then we define eD as follows: Any point p that falls into a cylinder medium-range neighborhood [ǫ, 3ǫ] brings a constant positive penalty ωD. Otherwise, if the point is close enough to a cylinder, it brings a negative energy which tends to its minimum (−1) as the distance between the point and its closest cylinder decreases. The points that do not meet one of these two cases do not contribute to the energy. This measurement encourages cylinders to be close to their matching points, but it also penalizes those that do not fulfill an emptiness constraint in their medium-range neighborhood.

Prior term
The prior density quantifies the relevancy of X regardless of P. It induces a probability over the whole configuration space, and thus defines the relevancy of any configuration based on a priori criteria. The two main criteria we introduce in the reconstruction process are the following: a configuration is likely if its elements connect well one to each other, and if it "looks like" the initial estimate M0 up to some specified tolerances σA, σR and σC . Therefore it is quite natural to split the prior energy EP into two separated terms: a Topological term ET that handles objects connectivity, and a Geometrical term EG that handles the similarity with M0: where the factors λT and λG modulate the relative importance of each term.
We define EG element-wise: each cylinder of X is compared with its most likely match in M0 and brings the corresponding energy: where eG splits into three terms (one for each of the main cylinder parameters): The angular term e A G decreases as the axes directions align: The radius term e R G decreases as the radii tend one to the other: The position term e C G decreases as Ci position gets closer to C j axis: In the above expressions, "·", "×" and " " respectively denote the usual scalar product, cross product and Euclidean norm in R 3 .
The topological energy ET is much more complex because the set of connections that should be encouraged and those that should be penalized is harder to define. We make the following assumptions based on expert knowledge: • the axes of two connected cylinders should "intersect", • connected cylinders should have a negligible overlapping volume, • the angle between two connected cylinders should be either right (90 • ) or flat (0 • ), • the connections involving two cylinders with the same radius are relevant, although connections between two cylinders with different radii are not forbidden (e.g. T-junctions).
Thus the energy ET does not depend on M0, and is defined as With: is the maximum between the proportion of Ci volume intersecting Cj and the proportion of Cj volume intersecting Ci. It can be grossly approximated using a Monte Carlo approach: a few hundreds of points are randomly picked in each cylinder, and we count the proportion of these points that falls into both cylinders simultaneously (the maximum of the two resulting proportions is kept). This way, overlapping cylinders can be strongly penalized using the positive constant ωT . The cylinders that do not intersect are not taken into account for the topological energy computation. When the cylinders connection seems to be relevant, we define an energy that favors connections which fulfill the remaining requirements mentioned above (axes intersections, right angles, similar radii): Again, we have three distinct terms handling several constraints. As the cylinder center cC i (resp. cC j ) and its axis direction aC i (resp. aC j ) define a 3D line, and provided a distance function d (cC i , aC i ) , cC j , aC j between two 3D lines, the axis intersection constraint is quantified by the following energy: a being an angular tolerance specified by the user.
Eventually, the radius term g R τ favors connections where the involved cylinders have the same radius, and is quite neutral when radii are different. For that purpose, we consider that the radii difference should not be much greater than the ǫ parameter introduced in section 2.3.1: The terms g L τ and g A τ define hardcore constraints: when one is not fulfilled, the corresponding energy tends to its upper bound (3), and the sum gτ is turned into a positive penalty because each of the two other terms is greater than -1. This explains why we use the values 3 and 4 in Equations 16 and 17, although they could be substituted for an other pair (γ, γ + 1), as long as γ > 2.

Greedy optimization
We have to generate some configuration X that minimizes the energy E (cf. Section 2.2), but we do not precisely know beforehand how many cylinders there are in X . Usual numerical optimization tools can hardly handle such a task. In (Descombes et al., 2008, Lafarge et al., 2010, the authors solve similar problems by using approaches based on time reversible Markov chains in a simulated annealing scheme. Indeed, the configurations space can be explored by randomly jumping from a configuration to a new one, each jump being the insertion or the removal of a random element ("birth and death" approach), and randomly accepting these transitions depending on the energy evolution. This process stands for a reversible Markov chain whose stationary distribution corresponds to the energy to be minimized, which ensures the convergence of the process to the global optimum.
As far as we are concerned, we think that the generation of an element (cylinder) should be driven by the available knowledge M0 (cf. Section 3.2). The blind generation of a cylinder by uniformly sampling its parameters would cause the resolution to take a prohibitive time, since such a cylinder is unlikely to optimize the energy. But by introducing a bias in the birth kernel, we do not ensure the time reversibility of the resulting Markov chain. Hence there is no guarantee about the process convergence. Moreover, the simulated annealing algorithms rely on a cooling schedule whose role is critical in the convergence, and which requires the user to specify additional parameters, which is really difficult for someone who has not a deep knowledge of it.
Algorithm 1 Greedy energy minimization algorithm Require: P: point cloud to be reconstructed M0: 3D model, gross estimate of P Ensure: X fits to P, "looks like" M0 and is consistent 1: X ← ∅ 2: repeat 3: for all C i ∈ M0 do 4: Randomly build a cylinder C from P using C i 5: compute the set K of cylinders in X that intersect C 7: sort K in decreasing eT (C, Cj ∈ K) order 8: for all Cj ∈ K do 10: end for 19: until X has not been changed Therefore instead of using a probabilistic minimization algorithm, we propose a greedy minimization method (Algorithm 1) based on the stochastic exploration of the solution space : the decision whether a transition should be accepted or not is taken in a deterministic way, so as to minimize E at each step. When a new cylinder C is generated (cf. Section 3.2), we try to insert it in the configuration X . This insertion may increase the energy although C may be a perfect candidate, due to conflicts with some cylinders in X . Therefore before taking any decision, we have International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXVIII-5/W12, 2011ISPRS Calgary 2011Workshop, 29-31 August 2011 to resolve these conflicts. Let K ⊂ X be the set of cylinders intersecting C. We have to find the subset K * ⊂ K that minimizes E(X − K * ∪ {C} , P, M0), i.e. the optimal removal of conflicting cylinders. Theoretically, we should test all the parts of K to find K * . To avoid combinatorial explosion, we use a greedy heuristic: the cylinders in K whose topological energy eT against C is high are removed first. By removing the conflicting cylinders in this order, we can keep track of the set which gets to the smallest energy in a time which is proportional to |K * | instead of 2 |K * | . Finally, once the conflicts have been resolved, the cylinder is accepted only if it decreases the energy.

Cylinders generation
In our case, an estimate of the solution we are searching for is available: this a priori knowledge M0 can be used to generate relevant cylinders from the point cloud, and thus increase the convergence rate of the optimization process. Given a cylinder C i ∈ M0, we can build a candidate cylinder using a RANSAClike algorithm: 1. use the a priori cylinder C i to locate the area where the candidate is going to be generated: (a) take a first point q 0 close to C i , using a rejection test: randomly pick points until the current one is close enough to C i and its normal is relevant with respect to C i , (b) shift q 0 along its normal vector n: c ← q 0 ± r C i .n.
Then take the spherical neighborhood of c with a radius proportional to r C i . We use KD-Trees to handle such queries, 2. build a candidate from a non-minimal points set: (a) among the selected neighbors, choose κ that are likely to lie on the same surface as q 0 . This likelihood can be estimated using a computation of the local cylinder axis based on the Gaussian map (Rabbani and Van Den Heuvel, 2005), then projecting the points into a plane orthogonal to this axis, and performing a circle detection among projected neighbors (using a RANSAC algorithm for instance). The probability for each neighbor to be picked is then computed based on the distance of its projection to the detected circle with respect to the expected noise ǫ. (b) build the cylinder C that minimizes the mean square distance to the κ selected points. The length of this cylinder is computed using its ǫ-inliers in P (points whose distance to C is smaller than ǫ).
This algorithm randomly generates relevant cylinders that tend to "look like" the input cylinder C i (thanks to step 1), and fit well to P (step 2). Indeed, we use C i parameters to drive the points selection step. And since the point selection process is reliable, we can build cylinders from many points (we use κ = 20), while usual RANSAC algorithms use minimal sets of points (2 to 7). It makes the generation step more precise and robust against noise and normals uncertainties. Thus, the probability for a cylinder to be generated increases as its energies ED and EG decrease. Moreover, the set of cylinders that can be built with this method is finite and countable, and its cardinality is bounded by the number of parts of P having κ elements. In particular, the cylinders that do not minimize the mean square distance to κ points of P have a zero probability to be generated. So the proposed method defines a probability triple over the cylinders space, with a finite countable set of events having a non-zero probability, and it can be used for the efficient exploration of the solution space in order to solve the Bayesian problem stated in this paper.

A few words about the parameters setting
We have introduced several parameters in the previous sections. Some of them have to be specified by the user, since they bring knowledge about the problem that could hardly be computed. Thus ǫ specifying the noise in P, σA, σR and σC specifying uncertainties about the initial estimate M0, and the weighting parameters λD, λG and λT have to be set by the user at the beginning of the algorithm. However, it is difficult to set the energies weights λD, λG and λT , and some approach that automatically estimates these parameters would be helpful.
Some other parameters can either be guessed based on assumptions, or can be set regardless of the input data P and M0. For instance, a topological tolerance τ = 0.1 can be used : in any case, the distance between connected cylinders axes should not exceed a tenth of the smaller cylinder radius (cf. equation 16), and a = 5 • should also be convenient for most cases (cf. equation 17). Assuming that at most 10% of any cylinder inliers may lie into the medium-range neighborhood, we propose to set ωD = 12. Finally, setting ωT = 4 enables the removal of overlapping cylinders even if they have up to 4 perfect connections with some other cylinders, which should be sufficient in many scenes (otherwise the greater ωT , the better).  Figure 2: Evolution of the energies during the reconstruction process, for the dataset presented in Figure 1. Here, we use the weighting coefficients λD = 0.1, λT = 0.7 and λG = 0.2.

Results on industrial scenes
We present the results that we got on an industrial dataset shown in Figure 1, in order to prove the ability of our method to generate consistent model from LIDAR data. We use an a priori 3D model which is roughly registered with the point cloud (the greatest registration errors are about 10 cm), and which is different from the 3D point cloud: some pipes are missing, and other ones are significantly modified (reoriented and shifted). Therefore we use permissive change tolerances: σA = 30 • , σR = 0.2 and σC = 3.5 m. The reconstructed configuration contains pipes that were initially missing, as well as those that were displaced. Moreover, we can see that it is consistent: its cylinders are well connected, and there are no undesirable overlaps. It also fits quite well to the point cloud. Another interesting aspect of this algorithm is that it enables the reconstruction of a large variety of cylinders: we observe that it detects cylinders of radius 10 mm as well as a cylinder of radius 2300 mm, although the detection of the smallest elements is not very precise because their radii are smaller than the specified noise (here, we use ǫ=30 mm).
The evolution of the energies during the resolution process is shown in Figure 2. It takes about 140000 iterations (by iteration, We also compare our approach with a reference RANSAC algorithm (Schnabel et al., 2007) which detects tori, cylinders, planes and spheres without using any prior knowledge, and which is faster than ours (about 30 seconds to perform the detection). The segmented cylinders are shown in Figure 1d, but we do not consider the other shapes handled by the RANSAC algorithm. We can see by comparing the segmentation results that our approach manages to correctly detect the biggest cylinders, whereas the RANSAC algorithm detects several pieces of overlapping shapes (many colours standing for a single cylinder), especially for the cylinder at the center of the scene. The following table shows a few comparisons between the RANSAC method (we only consider cylinders) and our approach. We can see that our method works better in fitting the cylinders to their points (second line), as well as in avoiding conflicts between shapes (fourth line) and optimizing the distribution of points on the shapes surfaces (third line). Finally, we have run our point cloud reconstruction approach with a part of the a priori 3D model ( Figure 3). As expected, the algorithm manages to find only the parts that match the incomplete input CAD model. Figure 3: Reconstruction from a partial 3D model. As it searches for a model that is similar to the a priori (green), our approach only retrieves the appropriated parts (grey) in the point cloud (red). We use the same parameters as for the scene in Figure 1.

CONCLUSION
We have stated the reconstruction problem as the search of a 3D CAD model that fulfills several requirements, among which some are based on a priori knowledge. We have more particularly focused our approach on the use of an existing 3D CAD model which roughly describes the scene we aim at reconstructing. This a priori knowledge is used to improve the reliability of reconstructed models. Indeed, the existing reconstruction techniques mostly deal with the quality of fitting to the input point cloud, whereas the Bayesian formulation that we propose allows us to embed higher level constraints, such as the consistency of the resulting model or the compliance with the a priori CAD model. Moreover, we use the a priori CAD model to focus the reconstruction on parts of the scene that we know to be relevant.
The posterior probability that we have proposed define the three requirements we want the reconstructed model to fulfill. Besides, we have proposed a new approach which searches for the optimal model with respect to this probability. This approach has shown good results on the industrial datasets that we have tested so far: the resolution method actually provides solutions that have high probabilities (low energy), and the resulting CAD models fit well to the point clouds, their shapes are well connected and each shape actually matches one from the a priori CAD models.
To conclude, we think that our approach could be extended to other kind of shapes, such as planes. But handling the components interactions gets much more complex as the number of shape types increases.