UNSUPERVISED OBJECT-BASED CLUSTERING IN SUPPORT OF SUPERVISED POINT-BASED 3D POINT CLOUD CLASSIFICATION

ABSTRACT: The number of approaches available for semantic segmentation of point clouds has grown exponentially in recent years. The availability of numerous annotated datasets has resulted in the emergence of deep learning approaches with increasingly promising outcomes. Even when successful, the implementation of such algorithms requires operators with a high level of expertise, large quantities of annotated data and high-performance computers. In contrast, the purpose of this study is to develop a fast, light and user-friendly classification approach valid from urban to indoor or heritage scenarios. To this aim, an unsupervised object-based clustering approach is used to assist and improve a feature-based classification approach built on a standard machine learning predictive model. Results achieved over four different large scenarios demonstrate the possibility of developing a reliable, accurate and flexible approach based on a limited number of features and very few annotated data.


INTRODUCTION
Current 3D research activities, which extend to various domains and applications, are dominated by classification endeavours. The semantic segmentation (or, more commonly, classification) task is a significant challenge for 3D unstructured datasets acquired with active or passive sensors (Xie et al., 2020). Recognising the elements composing a scene is a crucial step, especially for Digital Twins (Stojanovic et al., 2018), Smart Cities (Nys et al., 2020) and Building Information Modelling (Bassier et al., 2017). In this context, there is a great demand for automated processes that can speed up and improve the reliability of existing classification frameworks. However, we can safely say that there are no reliable and generalised methods for all the different scales and scenarios one can encounter. A single classification method can hardly serve all domains, since the semantic definitions attached to objects can differ depending on the domain. Most semantic segmentation methods aim to advance performance in one specific context, such as indoor structural recognition (ceiling, wall, floor, chair, etc.) (Dai et al., 2017; Stojanovic et al., 2019) or outdoor classification (street, building, car, vegetation, etc.) (Özdemir et al., 2019; Hu et al., 2020). This is also because modern approaches mainly rely on supervised methods based on neural networks (Guo et al., 2020), which necessitate annotated context-specific datasets such as the ones provided by Armeni et al. (2017) and Tan et al. (2020). These approaches are often fully supervised, rarely unsupervised, and require a high level of expertise on top of high computing resource demands.

Aim and structure of the paper
This paper aims at an easy-to-implement and user-friendly supervised method generalisable to several contexts and domains. To achieve this, we explore how unsupervised object-based features (Poux and Billen, 2019; Poux and Ponciano, 2020) can help a supervised point-based classification (Grilli et al., 2019). The goal is to combine two different classification approaches to maximise the accuracy of the results, minimise human effort and deliver a 3D classification method that is case- and context-independent while usable by non-experts. Experiments are conducted on heterogeneous datasets (Section 1.2), including multi-scale urban areas (aerial LiDAR and photogrammetry), indoor buildings (RGB-D sensor) and architectural scenarios (terrestrial photogrammetry). The achieved results demonstrate the method's reliability and replicability. In the age of deep learning, the suggested method relies on a standard machine learning algorithm (Random Forests) to achieve fast and accurate point cloud classification, using reduced annotated samples and a minimal number of automatically computed features. Thus, the primary purpose of this study is not a direct comparison with state-of-the-art methods, but rather an evaluation of how 3D classification results can be improved by merging features and methods from previous work (Grilli et al., 2019). Following a summary of related works in Section 2, we define our methodology in Section 3. Section 4 presents the results over four heterogeneous scenarios to demonstrate the efficiency of the presented approach. Finally, in Section 5, we give some closing remarks as well as suggestions for future work.

Considered scenarios
The presented method was evaluated on the following scenarios:
• Large scale urban point cloud (700 m x 700 m), derived with a hybrid aerial sensor over the city centre of Bordeaux (Toschi et al., 2021)

RELATED WORKS
The main methods for 3D classification reported in the most recent literature can be divided into two macro-categories: machine and deep learning approaches. The feature engineering phase is one of the primary distinctions between standard machine learning methods and advanced deep learning methods.
In the first case, the operator studies and selects the features, whereas in the second case, neural networks learn features after being fed large amounts of annotated data. In line with the purpose of this research, this section first focuses on point cloud feature selection, then on existing approaches based on a combination of clustering and point-based classification.
Feature selection. Establishing the features to be used in the model is a critical step in supervised classification analysis.
Most similar studies depend on geometric features to classify the points of a point cloud based on their local neighbourhood. The neighbourhood can be defined using either a fixed radius, which can be spherical (Lee and Schenk, 2002) or cylindrical (Filin and Pfeifer, 2005), or a number K of nearest neighbours (KNN) (Linsen and Prautzsch, 2001). The sampling rate resulting from data acquisition and the items of interest influence the choice of an acceptable value (radius or K). For this reason, all these neighbourhood types have been and are still broadly explored in the literature within single- or multi-scale approaches (Weinmann et al., 2015). Multi-scale approaches, in particular, have proved to be the most efficient, whether used for spherical/cylindrical or KNN neighbourhoods (Weinmann et al., 2014, 2015, 2017a).
Classification methods. The semantic segmentation methods (denoted classification in this article) can vastly differ depending on the feature set provided as input to the machine learning classifier. In the literature, we usually distinguish point-based classifiers (which reason on a per-point feature set) from segment-based classifiers (per-segment labelling). The latter usually rely on a segmentation step, where the point cloud is partitioned into subsets of points called 'segments'. In addition to the neighbourhood definitions used at the point level for point-based classification, as shown in Bremer et al. (2013), other characteristics can describe each segment to guide the process. The result is a set of internally homogeneous segments, i.e. groups of points representing the basic units for classification. In many cases, segmentation procedures aim to produce relatively small segments (over-segmentation), representing only object parts (sub-objects) rather than the final objects of interest directly. In Chehata et al. (2009), a supervoxel-based segmentation is first used to segment the point cloud data, then different machine learning algorithms are tested to label the point cloud. Luo et al. (2018) proposed a supervoxel-based classification; their method used Conditional Random Field matching to classify supervoxels. Sun et al. (2018) used a Random Forest classifier to classify point clouds based on supervoxels. Some authors rely on a region growing algorithm for the segmentation of the point cloud, followed by an object-based classifier such as an SVM (Yang et al., 2017) or a Bagged Tree classifier (Bassier et al., 2020). This article investigates the merging of clustering and point-based classification to develop a user-friendly approach. A small number of similar attempts have been proposed in the literature, such as Weinmann et al. (2017b), where segment-based shape analysis relies on semantic rules. This approach motivates the gains of a "higher level" understanding of the scene, translated into features that can help achieve better inference. Other works which rely on "segment features" in point-based classification frameworks can be found in Guinard et al. (2017), Landrieu et al. (2017) and Landrieu and Simonovsky (2018).
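The two point-level neighbourhood definitions discussed above can be sketched with a k-d tree; this is an illustrative snippet (not from the cited works), contrasting a spherical neighbourhood of fixed radius with a K-nearest-neighbours query:

```python
# Spherical (fixed-radius) vs. KNN neighbourhoods on a toy point cloud.
import numpy as np
from scipy.spatial import cKDTree

pts = np.array([[0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [5, 5, 5]], float)
tree = cKDTree(pts)

# spherical neighbourhood of radius 0.3 around the first point
sphere_idx = tree.query_ball_point(pts[0], r=0.3)

# K = 3 nearest neighbours (the query point itself is returned first)
_, knn_idx = tree.query(pts[0], k=3)

print(sorted(sphere_idx))  # indices of points 0, 1, 2
print(list(knn_idx))       # same points, ordered by increasing distance
```

Note how the distant point is excluded by both queries here; with a sparser sampling, however, a fixed radius may return very few points while KNN always returns exactly K, which is why the choice depends on the sampling rate.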

METHODOLOGY
This section first describes the different steps of our framework (Section 3.1) and then explains the features used in our combined classification approach (Section 3.2).

Framework
The combined approach presented in this study follows these main steps:
• Apply unsupervised clustering, following the approach presented in previous work, to segment the datasets.
• Extract a small set of geometric and covariance-based features that are effective for heterogeneous scenarios: the feature selection and their computation within heuristic neighbourhoods are performed automatically to bypass the otherwise laborious feature design process (Section 3.2).
• Manually annotate a reduced portion of the point cloud, facilitated and supported by the clustering results (Figure 1): although the datasets used in this paper contain fully annotated point clouds, the data were divided into training (30%) and test (70%) sets (Figure 2). Our idea is that, when training data are not available, the training set should be as limited as possible and the annotation step rapid and user-friendly.
• Assess the achieved point-wise classification outcomes through quality metrics extracted for the entire test set: among the several metrics existing in the literature (Goutte and Gaussier, 2005), the Overall Accuracy (OA) is used to evaluate the classifier's ability to predict labels based on all observations. In addition, the F1-score is considered, as it is a good measure of how well the classifier performs, being the harmonic mean of Precision and Recall.
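The training and evaluation steps above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the feature matrix `X` and labels `y` are placeholders standing in for the per-point features and annotations described in the framework, while the 30/70 split and the OA/F1 metrics follow the text.

```python
# Point-wise Random Forest classification with a 30% train / 70% test split,
# evaluated with Overall Accuracy (OA) and macro-averaged F1-score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.random((1000, 17))          # placeholder: 17 per-point features
y = rng.integers(0, 4, size=1000)   # placeholder: 4 class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

oa = accuracy_score(y_test, pred)             # Overall Accuracy
f1 = f1_score(y_test, pred, average="macro")  # mean per-class F1
print(f"OA={oa:.3f}  F1={f1:.3f}")
```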

Feature engineering
We aim to design a reduced number of meaningful features that can be used and adapted in different scenarios for point cloud classification. These features can then be fed into standard classifiers to train machine learning predictive models. Three main categories of features are combined: a) radiometric (RGB values), b) clustering and c) geometric features.
Clustering features (Figure 3). Clustering methods have two major advantages: (i) they do not use prior knowledge on discriminating variables and (ii) they find answers directly in the data. This allows exploring the input variables and highlighting unsuspected (or suspected) relationships. The clustering features are computed following an unsupervised scheme, where the point cloud is partitioned into subsets of neighbouring points called segments (Poux and Ponciano, 2020; Bassier et al., 2020). We aim at a set of internally homogeneous segments that will host the cluster features at four different aggregation levels. The procedure yields relatively small segments, representing only object parts (sub-objects) rather than the final objects of interest directly; these, sorted, constitute the Cluster feature Level 1. Then, adjacent segments with similar properties are merged into spatially contiguous objects (Cluster feature Level 2) by considering the results of the first clustering pass, instead of the initial point cloud, as input. Such a step-wise procedure, based on an initial over-segmentation, reduces the risk of combining multiple real-world objects in one segment (under-segmentation). The principle of the approach is to limit any domain knowledge and parameter tweaking, in order to provide a fully unsupervised clustering feature set. Finally, Level 3 is based on a k-means clustering using Hartigan's rule (Chiang and Mirkin, 2010), whereas Level 4 is a graph-based centrality measure of the clusters weighted over the first three levels.
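The Level-3 step can be illustrated with a minimal sketch of Hartigan's rule for choosing k in k-means (this is a generic illustration of the rule, not the authors' implementation): the number of clusters is increased while H(k) = (W_k / W_{k+1} - 1) * (n - k - 1) exceeds 10, where W_k is the within-cluster sum of squares with k clusters.

```python
# Select k for k-means via Hartigan's rule (stop when H(k) <= 10).
import numpy as np
from sklearn.cluster import KMeans

def hartigan_k(points, k_max=10, threshold=10.0, seed=0):
    n = len(points)
    # within-cluster sum of squares (inertia) for k = 1 .. k_max + 1
    wss = [KMeans(n_clusters=k, n_init=10, random_state=seed)
           .fit(points).inertia_ for k in range(1, k_max + 2)]
    for k in range(1, k_max + 1):
        h = (wss[k - 1] / wss[k] - 1.0) * (n - k - 1)
        if h <= threshold:  # adding a (k+1)-th cluster is no longer worth it
            return k
    return k_max

# two well-separated 3D blobs: the rule should settle on k = 2
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.1, (20, 3)),
                 rng.normal(5, 0.1, (20, 3))])
print(hartigan_k(pts))
```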
The initial parameters involved in the definition of these multi-level clustering features are extracted through an automatic heuristic determination of three RANSAC-inspired clustering parameters:
• a distance threshold for neighbourhood definition (ε);
• a threshold for the minimum number of points needed to form a valid planar region (τ);
• the decisive criterion for adding points to a region (α).
Covariance features (Figure 4). Covariance features (Blomley et al., 2014), or eigen-based features, are commonly used in segmentation and classification procedures due to their ability to provide in-depth information on the geometrical layout of the reconstructed scene. The most common covariance features are listed in Table 1.
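As an illustration, the two covariance features retained later in this work can be computed from the eigenvalues λ1 ≥ λ2 ≥ λ3 of the 3x3 covariance matrix of a point's neighbourhood, following the standard definitions (Weinmann et al., 2015): Omnivariance = (λ1·λ2·λ3)^(1/3) and Surface Variation = λ3/(λ1+λ2+λ3). This sketch assumes the neighbour coordinates are already gathered:

```python
# Eigen-based (covariance) features of one point's neighbourhood.
import numpy as np

def covariance_features(neigh):
    """neigh: (N, 3) array of neighbour coordinates around one point."""
    cov = np.cov(neigh.T)                        # 3x3 covariance matrix
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1] # l1 >= l2 >= l3 >= 0
    lam = np.clip(lam, 1e-12, None)              # numerical safety
    omnivariance = np.prod(lam) ** (1.0 / 3.0)
    surface_variation = lam[2] / lam.sum()
    return omnivariance, surface_variation

# a nearly planar patch: surface variation stays close to 0
rng = np.random.default_rng(0)
patch = rng.random((200, 3)) * [1.0, 1.0, 0.001]
omni, sv = covariance_features(patch)
print(omni, sv)
```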

Geometric features
In the same way, in this paper we aim to identify a reduced number of geometric features that can be used in any possible environment, while keeping computation fast.
The number of covariance features used in the presented approach is reduced to only two, Omnivariance and Surface Variation, chosen for their ability to distinguish macro-elements and entities (Teruggi et al., 2020). In order to demonstrate the effectiveness of this reduced selection, the classification experiments were also carried out using the entire set of features (Section 4). In addition, a height-based feature (Distance from Ground) and a normal-based one (Verticality) are considered. In particular, we have noticed that the Verticality feature is typically needed, independently of the scenario, to precisely differentiate horizontal and vertical artefacts. Directly related, the use of a height-based feature like the Distance from Ground (Δz component) becomes essential to distinguish the different horizontal elements (e.g., street and roof). It has to be underlined that the selected features are extracted within a spherical radius ε (provided by the clustering features) in a multi-scale approach. Experimentally, we observed that a maximum radius of 4ε was optimal for all scenarios.
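A minimal sketch of the remaining two features and of the multi-scale radius scheme follows. It assumes per-point normals are available; Verticality is taken as 1 - |n_z| (0 for horizontal surfaces, 1 for vertical ones) and Distance from Ground as a simple Δz against the lowest point, a deliberate simplification (a real pipeline would use a ground model). The helper names are illustrative, not from the paper.

```python
# Verticality, Distance from Ground, and eps..4*eps multi-scale neighbourhoods.
import numpy as np
from scipy.spatial import cKDTree

def verticality(normals):
    # 1 - |n_z|: 0 for a horizontal surface, 1 for a vertical one
    return 1.0 - np.abs(normals[:, 2])

def distance_from_ground(points):
    # simplified delta-z against the global minimum height
    return points[:, 2] - points[:, 2].min()

def multiscale_neighbours(points, eps):
    """Neighbour indices within radii eps, 2*eps, 3*eps, 4*eps per point."""
    tree = cKDTree(points)
    return {s: tree.query_ball_point(points, r=s * eps)
            for s in (1, 2, 3, 4)}

pts = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 2.5]], float)
nrm = np.array([[0, 0, 1], [1, 0, 0], [0, 0, -1], [0.6, 0.8, 0]])
print(verticality(nrm))           # [0, 1, 0, 1]
print(distance_from_ground(pts))  # [0, 1, 0, 2.5]
```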

EXPERIMENTS AND RESULTS
For each case study, the Random Forest classifier was trained with different combinations of features to provide an internal comparison between the proposed method and the ones we combined. Besides, all datasets were processed with and without their radiometric attributes to further test the approach's reliability in different conditions. Table 2 shows the results of the above-mentioned feature combinations in the four considered scenarios (Section 1.2). It can be seen that the proposed approach, which combines clustering features and a few selected geometric features, leads to an improvement in results. It is also noticeable that the combined approach performed better when radiometric features were included. Moreover, the greatest improvements were achieved for the urban scenarios; in fact, quite similar accuracy values were reached by the standard multi-scale approach (Approach B) and the combined approach for the indoor and architectural datasets. However, from a qualitative point of view, the classification results look much "cleaner" when cluster and geometric features are combined (Figures 5 and 6). In addition, it has to be considered that only 17 features were used for the proposed approach, against the 73 of the standard one. For more details on the quality of the results, see Figures 7-10, comparing hand-annotated and predicted point clouds. Finally, Tables 3-6 report all the per-class F1-scores.
Table 2. Classification metrics achieved in the four scenarios using different features.

CONCLUSIONS
This paper presented a combined approach, based on clustering and covariance features, for point cloud classification with a traditional machine learning predictor. Four heterogeneous datasets were considered, featuring different types of classes and scenarios. Experiments proved that unsupervised object-based features help supervised point-based classification. The combined method thus offers reduced labelling effort, speeds up classification processing, improves accuracy, requires low computational power and generalises to various scenarios, making it suitable for daily work in various fields.
As future work, we plan to compare the presented approach with other state-of-the-art methods for benchmarking purposes, including deep learning methods.