LARGE SCALE 3D POINT CLOUD MODELING FROM CAD DATABASE IN COMPLEX INDUSTRIAL ENVIRONMENTS

Abstract. In this paper, we present a method to build Computer Aided Design (CAD) representations of dense 3D point cloud scenes by queries in a large CAD model database. This method is applied to real-world industrial scenes for infrastructure modeling. The proposed method first relies on a region growing algorithm based on a novel edge detection method. This algorithm is able to produce geometrically coherent regions which can be agglomerated in order to extract the objects of interest of an industrial environment. Each segment is then processed to compute relevant keypoints and multi-scale features in order to be compared to all CAD models from the database. The best-fitting model is estimated together with the rigid six degree of freedom (6 DOF) transformation positioning the CAD model in the 3D scene. The proposed novel keypoint extractor achieves robust and repeatable results, capturing both thin geometrical details and the global shape of objects. Our new multi-scale descriptor stacks geometrical information around each keypoint at short and long range, allowing non-ambiguous matching for object recognition and positioning. We illustrate the efficiency of our method in a real-world application on the 3D segmentation and modeling of electrical substations.



Introduction
Nowadays, 3D LiDAR technology is widely used for mapping complex industrial environments such as power stations, electrical substations, manufacturing plants and pipeline networks. LiDAR produces massive, high-precision data which can be exploited for offsite inspection, measurement, asset management and virtual visits. LiDAR acquisition systems are constantly evolving, allowing faster, more accurate and denser scans at increasingly lower prices. However, the processing of such massive streams of data is the bottleneck that prevents the expansion of this technology. Automatic processing of 3D data is still an open problem in the scientific community; in fact, several current industrial applications are based on manual or semi-automatic processing. A typical application consists in creating a 3D model of an industrial plant from 3D LiDAR point clouds [17,7].
3D models of complex industrial environments are great information sources to improve site security and optimize industrial operation. The creation of 3D models is generally carried out by specialized designers based on 2D plans, photos and 3D measurements of the industrial environment. In the case of electrical substations, access is dangerous due to the risk of electrical shock; thus, a remote sensing technology such as LiDAR is an appropriate candidate to perform those 3D measurements. Moreover, several manufacturers provide precise 3D CAD models of their industrial equipment (insulators, electrical transformers, etc.), so the modeling process can be defined as the creation of a 3D virtual environment where CAD objects from a catalogue are positioned exactly as in the real world. In this context, 3D data offers high-precision landmarks on which the modeling process can be based.
The main drawback of this pipeline is that 3D data manipulation and interpretation may not be trivial. Visual object identification and manual positioning can be painfully time-consuming and expensive. The use of automatic methods to process 3D point clouds makes the creation of 3D models possible in large-scale applications.
This paper focuses on the automatic modeling of electrical substations based on a twofold framework: i) electrical elements are automatically segmented from the 3D point cloud (see Figure 1); ii) each segmented object is compared with every CAD model in a catalogue in order to determine its corresponding 3D model and its location in the environment (see Figure 2). The automatic processing of 3D data from electrical substations has been previously studied in the literature [2,1,3,21]. These approaches are based on segmentation followed by model retrieval, use prior knowledge and exploit the repetitive patterns of the environment.
Modeling this kind of industrial environment is very interesting from both the application and the technical point of view. On the one hand, from an application point of view, 3D modeling of electrical substations is fundamental to tasks such as asset management, inventory, virtual training, emergency planning, preventive maintenance and renovation planning, among others. On the other hand, from a technical point of view, this use case presents very interesting challenges: i) contrary to simulated environments, 3D point clouds from complex industrial environments present irregular local shapes, anisotropic density, occlusions and noise [11]; ii) since electrical substations are composed of a small number of element categories (e.g. disconnectors, transformers, circuit breakers, insulators and wires) with low intra-category variability, object matching may be ambiguous; iii) since many electrical elements are nearly symmetric with respect to a vertical plane, correct positioning is a challenging task.
Our proposed method handles all these issues. Even if we chose to focus on the specific use case of electrical substations, the proposed method is highly generic and could be applied to a wide range of applications. Our contribution is two-fold:
• We design a novel segmentation algorithm based on a region growing process and edge detection. This method proves robust on 3D point clouds of complex environments;
• We propose a matching method for recognizing and positioning CAD models onto 3D point clouds. This method is based on 3D feature estimation on both segmented objects and CAD models.
The pipeline of our method is illustrated in Figure 3.

The rest of the paper is organized as follows. Section 2 presents a revision of the state of the art. Section 3 explains our two-step method for modeling an industrial environment from a 3D point cloud and a catalogue of CAD models. Section 4 shows our experiments and results. Finally, Section 5 presents conclusions and perspectives.

Related work
Manual segmentation of 3D LiDAR data is not only expensive and time-consuming but also non-repeatable in terms of quality, since it depends on operator fatigue. Therefore, developing automatic segmentation methods is an active field in the scientific community. Several segmentation methods have been proposed using graph-cuts [6], image processing techniques [15], multi-scale and dimensionality features [8,20] and deep learning [18], among others. In the case of electrical substations, [2] proposed a segmentation method based on the principal direction of the points' distribution. This is done by forming and analyzing 9 different directions in 3D space. The assumption is that electrical components of a substation environment are made of vertical or horizontal parts and are mostly straight. While this observation holds for most electrical components, it does not for cables (catenary shape) and very small parts (insulator rings). [21] proposed a reconstruction method for an electrical substation based on the detection of regularities in the scene. An electrical substation can indeed be considered as a collection of multiple instances of a few classes of vertical objects on a regular grid. While demonstrating promising results on the reconstruction of these environments, this method can fail when disparities appear, such as occlusions or unexpected variations (additional cables, boxes, open/closed insulators). It also fails on objects of arbitrary orientation. In this work, we make the simpler assumption that objects are made of geometrically simple parts, and disregard the orientation of those parts in space. We believe that this approach is not only more robust to variations of the entities of the electrical substation, but can also be more easily adapted to a variety of industrial environments.
The goal of this context-agnostic method is to provide a collection of geometrically simple objects which can easily be clustered with some prior knowledge of the nature of the scene.
In the literature, we find keypoint extractors that implicitly assume that the processed object is generally smooth and presents some protuberant areas, characteristic of the object, that should be detected as key zones. Such processes are efficient for most common objects but fail as soon as the query object is irregular and salient everywhere, as industrial pieces are. [5] and [13] introduced local descriptors that are used both for matching and for feature point extraction. In these methods, feature points are selected as points whose feature is rare with respect to all the features computed on the object. Consequently, they are not necessarily repeatable over various partial views of the same object, nor robust to segmentation errors or incomplete data. Besides, the extraction method of [5] keeps only a handful of keypoints, which reduces ambiguity in the matching step but is not suitable for detecting keypoints on irregular and noisy point clouds from real industrial acquisitions. [16] proposes a generalization of Harris corner detection to 3D, which can perform relevant keypoint extraction on objects whose angles are rare, but is not adapted to crenellated objects such as ours. In [22], the authors extract keypoints in salient zones based on a criterion of maximum ratio between the eigenvalues of the scatter matrix of spherical neighbourhoods around each point. This method is efficient for extracting feature points on objects that are almost everywhere smooth, but is not reliable for objects that are volumetric at every scale, such as complex objects in industrial environments.
Feature-based matching often relies on the local surface description of keypoints. The main assumption is that keypoints are located on regions which are locally unique over the objects. In the industrial context, key zones are rarely unique, since industrial objects often have planar or axial symmetries. [19] proposes the SHOT descriptor, based on histogram features, for local surface description. [13] introduced the FPFH descriptor for gathering angular information within a spherical neighbourhood; but even if keypoints are well positioned and descriptors are powerful, a local description does not reliably solve the orientation problems inherent to the geometry of industrial pieces. A multi-scale solution is considered by [10] for object recognition but is still limited to local description without sufficient contextual information. All these considerations led us to present two contributions: first, a keypoint extractor able to capture key areas on industrial objects that are detailed everywhere; second, a multi-scale feature describing these key zones both locally and globally, as chunks of an object rather than local surfaces without context.

Segmentation
In this section, we propose a generic segmentation approach based on shape analysis and region growing. Since this segmentation process produces geometrically coherent regions, usually corresponding to over-segmented objects, an agglomeration process is also proposed in order to get objects of interest in 3D point clouds of electrical substations. This agglomeration process relies on context-based knowledge.

Shape analysis in the eigenvalues domain
PCA (principal component analysis) aims at transforming a data set of a given dimension into a set of smaller dimension. PCA can be computed by the covariance method: eigenvectors are extracted from the matrix V that diagonalizes the covariance matrix C, so that V^{-1} C V = D, D being a diagonal matrix that encodes the eigenvalues. While eigenvectors give us information about the spatial orientation of a set of points (tangents and normals), eigenvalues describe its overall shape (length, width and thickness). In this work, we propose edge detection and segmentation methods based on the analysis of eigenvalues. Our main assumption is that the intersection between two surfaces produces an edge which can be located by analyzing the eigenvalue variation at a local scale. Let us denote by P a set of 3D points and λ1, λ2, λ3 the eigenvalues of the covariance matrix of P, sorted in descending order. Let us consider λ̄(P), the normalized version of the vector (λ1, λ2, λ3). The magnitude of each eigenvalue depends on the scale; thus, the normalization of this vector gives the overall proportions in each dimension. As λ̄(P) is normalized, the point (λ1, λ2, λ3) belongs to the unit sphere. It can be transformed into polar coordinates (φ, θ, r), which can be projected into a 2D space since r = 1. Thus, φ and θ for a given normalized set of eigenvalues (λ1, λ2, λ3) are:

φ = arctan(λ2/λ1),  θ = arccos(λ3).

The 2D domain G of λ̄ is bounded by the triangle given by the three points L, P, V, as shown in Figure 4b. One can notice that λ̄(P) = L would be obtained if P is a set of points evenly distributed along a line. Respectively, P would be obtained from a set of points evenly distributed on a plane, and V from an evenly distributed set of 3D points. As λ1, λ2, λ3 are sorted and their vector is normalized, the extremities of the triangle can intuitively be assessed as (1, 0, 0), (1/√2, 1/√2, 0) and (1/√3, 1/√3, 1/√3). Thus, their respective projections in G are (0, π/2), (π/4, π/2) and (π/4, arctan √2).
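The projection above can be sketched numerically. The following minimal illustration (not the paper's implementation) computes the descending eigenvalues of a point set, normalizes them onto the unit sphere and maps them to (φ, θ); the three triangle vertices L, P, V are recovered from a line, a plane and a volume respectively:

```python
import numpy as np

def eigen_to_polar(points):
    """Project the descending, normalized eigenvalues of a point set
    onto the (phi, theta) domain G: a line maps to L = (0, pi/2), a plane
    to P = (pi/4, pi/2), a volume to V = (pi/4, arctan(sqrt(2)))."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda1 >= lambda2 >= lambda3
    lam = lam / np.linalg.norm(lam)                # point on the unit sphere
    phi = np.arctan2(lam[1], lam[0])
    theta = np.arccos(np.clip(lam[2], -1.0, 1.0))
    return phi, theta
```

Feeding it a sampled line yields (0, π/2) and a sampled square plane yields (π/4, π/2), matching the vertex projections given in the text.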
In this context, we can define the shape instability of a point with respect to its location inside the triangle LPV. A point with low stability will be located far from the vertices L, P or V, the most unstable point being located at the circumcenter of the triangle. Such points are very interesting since they represent transition zones between different surfaces. Let us denote by N_p the set of points neighbouring the point p given some neighbourhood relationship; in this work, we only consider radial neighbourhoods. For each point p, we compute λ̄(N_p). We define the shape instability δ(p) as the minimal distance in G from λ̄(N_p) to L, P or V:

δ(p) = min( d(λ̄(N_p), L), d(λ̄(N_p), P), d(λ̄(N_p), V) ).

Now, let us define the most stable point s_p and the least stable point i_p in the neighbourhood N_p respectively as:

s_p = argmin_{q ∈ N_p} δ(q),  i_p = argmax_{q ∈ N_p} δ(q).

The span ρ(p) of a point p is then defined as the difference between the instabilities of i_p and s_p:

ρ(p) = δ(i_p) − δ(s_p).

The span is correlated with the presence of a transition zone in 3D space. On the one hand, Figure 4a illustrates a cylinder slightly bent at its middle at a 10° angle; this slight transition is confirmed by a low ρ value (low dispersion of blue points) in Figure 4b. On the other hand, Figure 4c presents two cylinders intersecting at a 70° angle; this transition is observed in the G domain in Figure 4d, where the ρ value is higher. In Figure 5, the span ρ of a point cloud with various objects (steel beams, cables, boxes, ground) is shown. We observe a high value of ρ where different surfaces intersect and a low value otherwise.
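Instability and span can be sketched directly from these definitions. The following is a brute-force illustration (the neighbourhood queries and the `radius` parameter are assumptions for the sketch, not the paper's implementation):

```python
import numpy as np

# Vertices of the triangle bounding G (see text): line, plane, volume.
L = np.array([0.0, np.pi / 2])
P = np.array([np.pi / 4, np.pi / 2])
V = np.array([np.pi / 4, np.arctan(np.sqrt(2.0))])

def _polar(points):
    """(phi, theta) of the normalized, descending eigenvalues of a neighbourhood."""
    c = points - points.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(c.T @ c / len(points)))[::-1]
    lam = lam / np.linalg.norm(lam)
    return np.array([np.arctan2(lam[1], lam[0]),
                     np.arccos(np.clip(lam[2], -1.0, 1.0))])

def instability(neigh_points):
    """delta(p): minimal distance in G from lambda-bar(N_p) to L, P or V."""
    g = _polar(neigh_points)
    return min(np.linalg.norm(g - L), np.linalg.norm(g - P), np.linalg.norm(g - V))

def span(cloud, idx, radius):
    """rho(p): difference between the instabilities of the least and most
    stable points of the radial neighbourhood of `idx` (brute-force search)."""
    d = np.linalg.norm(cloud - cloud[idx], axis=1)
    deltas = []
    for j in np.where(d <= radius)[0]:
        dj = np.linalg.norm(cloud - cloud[j], axis=1)
        nj = cloud[dj <= radius]
        if len(nj) >= 4:
            deltas.append(instability(nj))
    return max(deltas) - min(deltas) if deltas else 0.0
```

On a flat patch the span stays near zero, whereas at the fold between two perpendicular planes the straddling neighbourhoods drift away from the triangle vertices and the span rises, mirroring the behaviour shown in Figures 4 and 5.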

Region growing
We propose a region growing process based on the previously introduced definitions of shape instability and span. The region growing process starts from carefully selected seeds and propagates to adjacent points given some region membership criterion. In our work, these seeds are points whose shape instability is low. The membership of two adjacent points to the same region is constrained by the presence of instability in their neighbourhood. We define this region membership criterion based on a set of points called edges, whose shape instabilities are high. Thus, we give the following definitions of seeds and edges in our region growing process. We define the set of seeds S as the set of stable points whose span is lower than a given threshold t0:

S = { p | ρ(p) < t0 }.

Then, we define the set of edges E as the set of unstable points whose shape instability is greater than a given threshold t1:

E = { p | δ(p) > t1 }.

Our region membership criterion is based on the presence of edges: we say that two adjacent points belong to the same region if they are closer to each other than to any edge point. Our proposed method is similar to image segmentation methods based on region growing. It is an iterative process whose goal is to partition the point cloud into regions {R}. At a given iteration, a seed s is randomly selected and a new region R = {s} is grown: R is extended with each neighbour of points from R that satisfies the proposed region membership criterion. When no more points can be added to R, a new seed is selected and a new region is grown.
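The growing loop can be sketched as follows. This is a simplified illustration, not the authors' code: seeds and edges are passed in as index lists rather than derived from t0 and t1, neighbour queries are brute-force, and points that are not strictly closer to a neighbour than to an edge remain unassigned:

```python
import numpy as np
from collections import deque

def region_growing(cloud, seed_idx, edge_idx, radius):
    """Grow one region per unvisited seed, never crossing edge points.
    Two adjacent points join the same region only if they are closer to
    each other than either of them is to any edge point (text criterion)."""
    labels = np.full(len(cloud), -1, dtype=int)
    edges = cloud[edge_idx] if len(edge_idx) else None
    region = 0
    for s in seed_idx:
        if labels[s] != -1:
            continue
        labels[s] = region
        queue = deque([s])
        while queue:
            p = queue.popleft()
            d = np.linalg.norm(cloud - cloud[p], axis=1)
            for q in np.where((d <= radius) & (labels == -1))[0]:
                if edges is not None:
                    # closest edge point seen from either endpoint
                    d_edge = min(np.linalg.norm(edges - cloud[p], axis=1).min(),
                                 np.linalg.norm(edges - cloud[q], axis=1).min())
                    if d[q] >= d_edge:
                        continue  # an edge separates p and q
                labels[q] = region
                queue.append(q)
        region += 1
    return labels
```

On a line of points with a single edge point in the middle, the sketch grows two regions that never cross the edge; points inside the edge's influence zone stay unlabelled.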
The results of this region growing method are illustrated in Figure 6. In Figure 6a, one can observe that our process selects seeds from regions with no prominent features; conversely, edge points are all picked from regions of intersecting objects. In Figure 6b, the result of the region growing is shown, and one can see that the resulting regions are geometrically coherent, despite exhibiting slight curvature (as with cables) or small shape disturbances (due to miscalibration of the acquisition).
Figure 6. Points selected by the process described in Section 3.1.2 in (a): seeds are shown in green, while edges are shown in red. Over-segmentation generated by the proposed region growing algorithm in (b), colourized with random colours. Objects correspond to geometrically coherent regions. One can see that cables, ground patches, poles and objects of primitive form are correctly labelled. More complex structures such as steel beams are grouped into one object.

Over-segmentation correction
The previous region growing process produces geometrically coherent regions. In general, these regions correspond to over-segmented objects, since the process is only based on shape criteria. Depending on the application, an agglomeration process is needed in order to obtain the objects of interest. In the case of electrical substations, this agglomeration process is straightforward. First, the ground is usually flat and can be extracted as the largest connected component of the region growing result. Second, cables are the most elongated objects of the over-segmentation; they can be detected by the local linearity measure averaged over the object, with a linearity threshold set by visual inspection. Finally, we suppose that the remaining points belong to electrical elements, as substations are rarely occupied by any other type of object. In a substation environment, electrical elements are always a few meters apart in order to prevent short circuits. Once the ground and cables are removed, a simple neighbourhood search with a carefully selected radius is able to reconstruct whole electrical elements.
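These heuristics can be sketched as a small classifier over the regions. This toy version differs from the text in two respects: linearity is computed once per region by a global PCA rather than averaged over local estimates, and the threshold value is purely illustrative:

```python
import numpy as np

def classify_regions(cloud, labels, linearity_threshold=0.9):
    """Toy agglomeration step: the largest region is taken as ground,
    highly linear regions as cables, the rest as candidate electrical
    elements. `linearity_threshold` is illustrative (set by visual
    inspection in the text)."""
    def linearity(pts):
        c = pts - pts.mean(axis=0)
        lam = np.sort(np.linalg.eigvalsh(c.T @ c / len(pts)))[::-1]
        return (lam[0] - lam[1]) / lam[0] if lam[0] > 0 else 0.0

    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    ground = ids[np.argmax(counts)]
    classes = {}
    for r in ids:
        pts = cloud[labels == r]
        if r == ground:
            classes[r] = "ground"
        elif linearity(pts) > linearity_threshold:
            classes[r] = "cable"
        else:
            classes[r] = "element"
    return classes
```

A real pipeline would also merge nearby "element" regions with the fixed-radius neighbourhood search described above.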

Feature based matching
The segmentation method above provides object hypotheses that may have a CAD counterpart within a catalogue of modeled objects expected to be found in the scene. Our work brings a twofold contribution for achieving feature-based matching between these segments extracted from real acquisition data and reconstructed models. First, we designed a generic keypoint extractor suited to dealing with shape auto-similarity, which precisely captures the structural details that are characteristic of the object's orientation. Second, we introduce a novel multi-scale descriptor, aimed at describing the local neighbourhood of each keypoint as well as gathering information about its global localization over the whole object in an efficient way. In order to have a unified method for extracting keypoints and features on both CAD models and 3D point clouds, we work on point cloud versions of each CAD model in the catalogue, obtained by uniform sub-sampling over the faces of the mesh; the sampling resolution is roughly the same as that of the 3D scene. Once features are computed on both CAD models and 3D scene segments, we perform a brute-force matching, estimate the 6 DOF transformation that best fits the CAD models on each segment, and compute a score identifying the best corresponding CAD model.

Keypoints extraction
For our use case, keypoints must be located in structural zones of the object that are uncommon and salient. Industrial pieces have details everywhere, so we designed a keypoint extractor based on a filtered dimensionality analysis. We compute the PCA of a spherical neighbourhood N_i around each point p_i of the query object at a radius r_σ characteristic of the object's geometry, and compute the a3D coefficient introduced in [4], which measures the saliency σ_i of the query point p_i, where λ1 ≥ λ2 ≥ λ3 ≥ 0 are the eigenvalues of the covariance matrix. Note that the eigenvalues of the neighbourhood characterize the expansion of the point cloud along its principal axes; thus, the greater σ_i, the more volumetric the neighbourhood. Since key zones on objects are mainly volumetric (such as corners, junctions and protuberances), relevant keypoints lie in zones of high saliency. In Figure 7, we show a specific part of an industrial piece with the saliency as scalar field. In that figure, areas 1 and 2 would be relevant to extract as they are structural and characteristic of the object's orientation, whereas area 3 is fully auto-similar and would lead to ambiguity about the object's orientation in later processes. On such a complex object with milling and bars, the most salient points are not necessarily in key areas, so applying a rigid threshold for extracting relevant zones would fail. This is shown in Figure 7, where no threshold trade-off allows keeping only areas 1 and 2. However, points that are locally much more salient than their neighbours are structural, even if they are not salient in absolute terms.
Therefore, for each point p_i, we build the distribution of the saliency of neighbouring points and estimate the average saliency σ̄_i and its standard deviation s_i in N_i. We extract the query point p_i as a keypoint if s_i/σ̄_i > ρ_σ, where ρ_σ is a rigid threshold (typically 5% is sufficient) for avoiding points in homogeneously salient zones such as zone 3 in Figure 7, and if σ_i > σ̄_i + s_i. Thus, points in structural zones of objects such as junctions, corners and protuberances are captured in patches. Rather than searching for isolated and very accurately located keypoints as in [5], we keep dense patches of key zones, which is much more reliable for real and noisy data. A qualitative comparison with other keypoint extractors is presented in Section 4.2. The second step of the keypoint extraction consists in selecting points sparsely and regularly on the shape. This step is not compulsory, but it can help to coarsely align the point clouds for the subsequent RANSAC registration, while the key patches extracted with our method contribute to orienting the point clouds correctly.
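A toy version of this extractor follows. Two assumptions are made for the sketch: the volumetric saliency is taken as λ3/(λ1+λ2+λ3), a stand-in for the a3D coefficient of [4], and a small numerical floor discards perfectly flat zones:

```python
import numpy as np

def saliency(cloud, radius):
    """Per-point volumetric saliency from the local PCA eigenvalues.
    Stand-in for a3D of [4]: lambda3 / (lambda1 + lambda2 + lambda3)."""
    sal = np.zeros(len(cloud))
    for i, p in enumerate(cloud):
        neigh = cloud[np.linalg.norm(cloud - p, axis=1) <= radius]
        if len(neigh) < 4:
            continue
        c = neigh - neigh.mean(axis=0)
        lam = np.sort(np.linalg.eigvalsh(c.T @ c / len(neigh)))[::-1]
        if lam.sum() > 0:
            sal[i] = max(lam[2], 0.0) / lam.sum()
    return sal

def extract_keypoints(cloud, radius, rho_sigma=0.05, floor=1e-6):
    """Keep locally excessively salient points: sigma_i > mean + std of the
    neighbourhood saliency, in zones that are not homogeneously salient
    (std/mean > rho_sigma). `floor` skips numerically flat zones."""
    sal = saliency(cloud, radius)
    keep = []
    for i, p in enumerate(cloud):
        mask = np.linalg.norm(cloud - p, axis=1) <= radius
        mu, sd = sal[mask].mean(), sal[mask].std()
        if mu > floor and sd / mu > rho_sigma and sal[i] > mu + sd:
            keep.append(i)
    return np.array(keep, dtype=int)
```

On a flat plane with a small volumetric bump, the selected points concentrate around the bump, the only structural zone, in a patch rather than as a single isolated point.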

Features extraction
The next step of our method consists in describing the keypoints through 3D features. The SHOT and PFH descriptors, introduced by [19] and [12] respectively, are efficient for shape description, but they do not capture any contextual information and are therefore not reliable for registering auto-similar objects such as those found in industrial environments. We designed a hybrid descriptor, inspired by the SPFH descriptor introduced in [13], that gathers local and contextual information. To capture the information present in neighbourhoods of radii {r_i}, 1 ≤ i ≤ p, of a keypoint, we first generate p subsampled versions {P_i} of the input cloud with spatial resolution β_i × r_i (β_i ∈ [0, 1]). The parameter β_i controls the density of points in a neighbourhood of radius r_i searched in P_i. For instance, setting β_i = 0.1 for a search radius r_i = 0.5 m yields a neighbourhood where the distance between points is at least β_i × r_i = 5 cm, which is sufficiently dense to describe the local geometry at radius r_i. At scale k ∈ {1, ..., p}, we compute for each keypoint the Local Reference Frame introduced in [19], to circumvent the normal orientation issue and obtain rotation-invariant descriptors, and then compute the SPFH descriptor of a spherical neighbourhood of radius r_k searched in P_k. Unlike the original authors of this feature, we do not decorrelate the three computed angles to simplify the feature's expression, because doing so actually brings ambiguity and significantly reduces the descriptive power of the SPFH descriptor. In Figure 8, we have connected by coloured lines the points used to compute the pair features. The green lines, for instance, connect the query keypoint to the points of its spherical neighbourhood of radius r_3 = 2.50 m, showing that a third of the feature's information is exclusively dedicated to gathering contextual information.
Repeating the operation at every desired scale, we finally stack the SPFH descriptors computed at each scale into one single feature vector. The resulting features describe the local geometry around the keypoints thanks to the SPFH with short radius, and also contain contextual information gathered in the histograms of long search radius, allowing a highly discriminative power and non-ambiguous matching: in our use case, a local description of the keypoints, as structural as they are, would not be completely reliable because of the auto-similarity of the objects.
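The stacked descriptor can be sketched as follows. This is a simplified illustration, not the paper's implementation: the pair angles are the standard PFH angular features computed against raw PCA normals (the Local Reference Frame of [19] and the β_i subsampling are omitted for brevity), with a joint, non-decorrelated 5³-bin histogram per scale as described in the text:

```python
import numpy as np

def estimate_normals(cloud, radius):
    """Normal per point: eigenvector of the smallest local PCA eigenvalue."""
    normals = np.zeros_like(cloud)
    for i, p in enumerate(cloud):
        neigh = cloud[np.linalg.norm(cloud - p, axis=1) <= radius]
        c = neigh - neigh.mean(axis=0)
        _, v = np.linalg.eigh(c.T @ c)
        normals[i] = v[:, 0]
    return normals

def pair_angles(p, n_p, q, n_q):
    """PFH-style angular pair features (alpha, phi, theta) in a Darboux frame."""
    d = q - p
    dist = np.linalg.norm(d)
    if dist == 0:
        return None
    u = n_p
    v = np.cross(u, d / dist)
    nv = np.linalg.norm(v)
    if nv == 0:
        return None
    v /= nv
    w = np.cross(u, v)
    return (np.dot(v, n_q),                              # alpha in [-1, 1]
            np.dot(u, d / dist),                         # phi   in [-1, 1]
            np.arctan2(np.dot(w, n_q), np.dot(u, n_q)))  # theta in [-pi, pi]

def spfh(cloud, normals, idx, radius, bins=5):
    """Joint (non-decorrelated) 3D histogram of the pair angles between a
    keypoint and its spherical neighbourhood, flattened to bins**3 values."""
    p, n_p = cloud[idx], normals[idx]
    neigh = np.where(np.linalg.norm(cloud - p, axis=1) <= radius)[0]
    feats = [pair_angles(p, n_p, cloud[j], normals[j]) for j in neigh if j != idx]
    feats = [f for f in feats if f is not None]
    if not feats:
        return np.zeros(bins ** 3)
    hist, _ = np.histogramdd(np.array(feats), bins=bins,
                             range=[(-1, 1), (-1, 1), (-np.pi, np.pi)])
    h = hist.ravel()
    return h / h.sum() if h.sum() else h

def multiscale_feature(cloud, normals, idx, radii, bins=5):
    """Stack the per-scale histograms into one feature vector."""
    return np.concatenate([spfh(cloud, normals, idx, r, bins) for r in radii])
```

With two radii and 5 bins per angle, the stacked vector has 2 × 5³ = 250 dimensions, matching the configuration reported in the experiments.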

Feature matching and pose estimation
Going through each segment, we match its features with those of each CAD model by brute-force matching using the χ² distance, better suited to histogram comparison than the L2 distance (see Figure 9a), with a weighting of the different scales. If the feature f_i = (f_i^1, ..., f_i^N) was computed over N scales, we compute the weighted distance to f_j as

d(f_i, f_j) = Σ_{k=1..N} w_k χ²(f_i^k, f_j^k),

where the weights w_k decrease with the scale, giving priority to local description while adding long-range matching constraints. As we keep patches of keypoints, we cannot filter the matches as in [9], which excludes matches that are not significantly better than the second best match in the feature space, under the assumption that features are unique. Instead, we only keep matches whose distance to the nearest neighbour is lower than μ_nn + σ_nn, where μ_nn and σ_nn are the average distance to the nearest neighbour and its standard deviation over the object. We estimate the 6 DOF transformation that best fits the query CAD model on the segment by RANSAC (Figure 9b) on the pairs from the matching step, and finally apply a point-to-plane ICP registration (Figure 9c). The overall rigid transformation T_CAD is applied to the CAD object. The object shown is fully symmetric except in the area zoomed in Figure 9d: our keypoint extractor caught this very specific zone, allowing the right pose estimation.
Figure 9. Steps for estimating the rigid transformation between a CAD's point cloud (red) and a 3D scene segment (blue) by feature matching and RANSAC. In this example, the keypoints were extracted only with our saliency analysis.
We compute the intersection over union (IoU) as our overlap measure: for each point of the 3D segment P_TLS, we search for its nearest neighbour in the CAD cloud P_CAD within a distance r_overlap. The intersection I is defined as the number of points having at least one neighbour, the union is defined by U = |P_CAD| + |P_TLS| − I, and the overlap is given by IoU = I/U. The parameter r_overlap defines the tolerance of the IoU. We set a rigid threshold on the minimum IoU for accepting a match, because some segments may not have their CAD counterpart in the catalogue and we do not want to assign them to any 3D model. The CAD model that maximizes the IoU is considered to be the right model and is positioned in the scene by applying the corresponding T_CAD.
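The overlap measure can be written directly from this definition. A brute-force sketch (a real implementation would use a k-d tree for the nearest-neighbour search):

```python
import numpy as np

def overlap_iou(segment, cad, r_overlap):
    """IoU between a scene segment and a transformed CAD cloud:
    I = segment points with a CAD neighbour within r_overlap,
    U = |CAD| + |segment| - I, IoU = I / U."""
    d = np.linalg.norm(segment[:, None, :] - cad[None, :, :], axis=2)
    I = int((d.min(axis=1) <= r_overlap).sum())
    U = len(cad) + len(segment) - I
    return I / U
```

Identical clouds give an IoU of 1, disjoint clouds an IoU of 0; the minimum-IoU threshold then rejects segments with no counterpart in the catalogue.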

Evaluation of the segmentation
We compare our method with two region growing algorithms. We aim at demonstrating the advantages of our approach based on eigenvalues over approaches based on eigenvectors and dimensionality features as defined in [4]. In both cases, a PCA is computed at a local scale on each point using a radial neighbourhood. The first region growing method exploits the tangent, the main eigenvector of this PCA. Seeds are selected from the most linear points, and two adjacent points belong to the same region if the angle between their tangents is smaller than a given threshold. The second region growing method is similar to the first, but linearity is replaced by planarity for the seed selection, and the tangent by the normal. By visual inspection of the results, we set the thresholds of these two region growing methods so as to obtain the best possible examples. As for our method, we set the seed and edge thresholds defined in Section 3.1.2 to t0 = 0.25 and t1 = 0.48. We obtain the results of Figure 10. The linear region growing in Figure 10a is able to extract every cable and two insulator chains but performs poorly on electrical elements; it is also unable to extract the ground. The planar region growing in Figure 10b performs poorly overall, as there is no plane except the ground. The ground and the bigger structure are not split as expected, as the edge between the two features a low slope. Selecting a different radius for the radial neighbourhoods may yield different results. Only our method in Figure 10c is able to extract all cables. Electrical structures are over-segmented, but their parts are geometrically coherent, as expected. The chains of rings at the top are slightly over-segmented, but they form a complex structure and the two top chains are badly acquired in this particular case.
In Figure 10d, the result of the over-segmentation correction is shown, each object colourized with a random colour. It can be compared to the ground truth in Figure 10e. The ground is correctly extracted, along with all the cables. Each electrical object is correctly extracted, along with the three bigger structures. The scanner positions have been extracted as circular planar regions on the ground. While this might be problematic in other applications, it does not disrupt the object matching process, as these regions do not resemble any entity in the catalogue.

Qualitative keypoint extractors comparison
To compare our keypoint extraction method to well-known methods, we used the PCL [14] implementations of the ISS [22] and Harris3D [16] keypoint extractors with the most suitable parameters, found qualitatively by trial and error. We also implemented the persistent feature point extraction of [12] based on prior FPFH estimation. Results are shown in Figure 11. For each experiment, the same parameters were applied for extracting keypoints on all point clouds. In our use case, the Harris3D extractor detects keypoints on different zones depending on the noise level and the cloud density. The ISS extractor catches salient points everywhere on the structure, so it is nearly equivalent to uniform sampling for such objects. The persistent FPFH points are relevant on the CAD object but are strongly disturbed by real measurement noise, as on object TLS 1, and the keypoints are not repeatable on the "crowned" object, which was sub-segmented in TLS 2. Consequently, they are not necessarily located on the same areas on TLS segments and CAD objects, leading to inaccurate matching. Finally, our keypoint extractor is robust to noise and to under- and over-segmentation, and reliably provides the best-located keypoints: we capture each area that is relevant for feature matching with high repeatability, whether the piece is extracted from the 3D scene or from the CAD model.

Feature-based matching evaluation
We have a ground truth for an electrical substation near Castellane, France: each CAD model from a catalogue was manually positioned on the 3D scene. Thus, we can evaluate the recognition rate as well as the accuracy of the 6 DOF positioning. For these experiments, we chose to extract salient keypoints with a radius r_σ = 0.30 m, completed with a coarse sampling of radius 15 cm. We describe keypoints with two radii r ∈ {0.30 m, 2.0 m} to cover short and long range, given that the objects' dimensions are roughly 1 m × 1 m × 5 m. Each SPFH descriptor is built with 5³ bins, resulting in a 250-dimensional stacked feature vector. In our experiments, the catalogue includes 3 CAD objects, and 81 segments were identified on the scene. The threshold on the IoU is set to 40%. To evaluate quantitatively the accuracy of our positioning method, we compare the computed transformations {T̂_i} to the ground truth {T_i} by evaluating log(T_i^{-1} T̂_i) = (dω, du)^T ∈ R^6 for each object, where dω and du are respectively the error vectors in rotation and translation, and we compute the translation error ζ_t = ||du||_2 and the rotation error ζ_ω = ||dω||_2. To deal with axial symmetry, we subtract π from angular errors greater than 90°, but we count the number of flipped pieces. We compared our method on this use case to the well-known local methods FPFH [13] and SHOT [19], and to a single-scale SPFH feature. We show the benefit of our keypoint extraction method by carrying out the matching experiments both with uniform sampling (denoted by u in Table 1) of radius r = 15 cm and with our keypoints, for the same description methods. We compare the recognition rate τ, the maximum errors in translation ζ_t^m and rotation ζ_ω^m, and the percentage of flipped pieces τ_flipped. Runtimes are between 10 and 20 minutes for all experiments, which is not limiting for this use case. In particular, the multi-scale approach does not induce a significant runtime increase.
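The two error measures can be sketched as follows. As a simplification, the rotation magnitude is recovered from the trace of the relative rotation and the translation of the relative transform is used as a simple proxy for du (the exact SE(3) log map couples the two when the rotation error is large):

```python
import numpy as np

def pose_errors(T_est, T_gt):
    """Translation and rotation errors between two rigid 4x4 transforms,
    zeta_t = ||du|| and zeta_w = ||dw||, from the relative transform
    T_gt^-1 @ T_est. Rotation angle from trace(R) = 1 + 2 cos(theta)."""
    T_rel = np.linalg.inv(T_gt) @ T_est
    R, t = T_rel[:3, :3], T_rel[:3, 3]
    cos_w = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.linalg.norm(t), np.arccos(cos_w)
```

The symmetry handling of the text would then subtract π from any angular error above 90° while counting the piece as flipped.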
As shown in Table 1, our keypoint selection method unambiguously increases the number of correctly oriented pieces. Moreover, it improves the recognition rate for most description methods. Our SPFH implementation outperforms the PCL FPFH descriptor because, as mentioned in Section 3.2.2, we do not decorrelate the computed angles. Our method proves to be the most reliable for object recognition, as we reach a 100% recognition rate with the lowest translation errors. Indeed, our features are more efficient at describing keypoints as parts of an object than the local descriptor alone, so global shapes are better matched. The maximum rotation errors for SHOT and FPFH arise because the overlap reached was sufficient for matching segments left incomplete by data acquisition issues, but local description failed at handling near-axial symmetry. With our method, incomplete pieces are at worst flipped about their quasi-symmetry plane, and no significant rotation error occurs. Our experiments show that combining a simple feature such as SPFH with a multi-scale scheme and an accurate keypoint extraction can compete with state-of-the-art description methods such as SHOT.

Figure 12. Top: the 3D scene, with the ground (green), the wires (yellow) and other objects (purple) segmented. Bottom: objects having a CAD counterpart in the catalogue were replaced by our modeling method.
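The multi-scale stacking evaluated above can be sketched as follows. This is a simplified, hypothetical stand-in for the SPFH descriptor: each support radius is described here by a 3D histogram of normalised neighbour offsets (bins³ cells), and the per-scale histograms are concatenated, giving 2 × 5³ = 250 dimensions for two radii and 5 bins per axis, as in the experiments.

```python
import numpy as np

def stacked_descriptor(points, keypoint, radii=(0.30, 2.0), bins=5):
    """Multi-scale stacking illustration: one histogram per support radius,
    concatenated into a single feature vector.

    `points` is an (N, 3) array; `keypoint` a 3-vector. The per-scale
    histogram is a toy substitute for SPFH, kept only to show the stacking.
    """
    parts = []
    for r in radii:
        offsets = points - keypoint
        mask = np.linalg.norm(offsets, axis=1) <= r
        neigh = offsets[mask] / r  # normalise offsets to [-1, 1] per scale
        hist, _ = np.histogramdd(neigh, bins=bins, range=[(-1, 1)] * 3)
        h = hist.ravel()
        n = h.sum()
        parts.append(h / n if n > 0 else h)  # L1-normalise each scale
    return np.concatenate(parts)
```

Because each scale is normalised independently before concatenation, the short-range half captures fine local geometry while the long-range half encodes the object's overall shape, which is what disambiguates near-symmetric industrial pieces.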

Conclusions
In this work, we proposed a general framework for modeling complex industrial environments from 3D point clouds and a catalogue of 3D CAD models. Our contribution is two-fold. The first step consists of a robust segmentation of all objects within the 3D point cloud. This segmentation method is based on a shape analysis in the eigenvalues domain. The segmentation result is a set of geometrically coherent regions which can be agglomerated according to application-dependent prior knowledge. The second step queries the catalogue and evaluates the best correspondences between segmented objects and CAD models. For this purpose, salient key areas are determined and described by a multi-scale feature on both CAD models and 3D segments. This multi-scale descriptor gathers local and contextual information, which makes it possible to handle the high degree of self-similarity of industrial pieces. Extracted features are then matched and the best fitting transformation is estimated, automatically positioning the corresponding CAD model on the 3D scene, as shown in Figure 12.
Although we presented the specific use case of electrical substations, the proposed method is highly generic and could be applied to a wide range of applications. Our multi-scale method gathers information at short and long range from any geometric descriptor, ensuring robust registration for complex objects. In the future, we plan to use more efficient local descriptors and extend them into multi-scale descriptors to achieve even more robust results on larger datasets. Other use cases such as pipelines and indoor industrial environments will also be studied.