ENHANCING GEOMETRIC EDGE DETAILS IN MVS RECONSTRUCTION

ABSTRACT: Mesh models generated by multi-view stereo (MVS) algorithms often fail to adequately represent the sharp, natural edge details of the scene. The harsh depth discontinuities in edge regions are a challenging case for dense reconstruction, while vertex displacement during mesh refinement frequently leads to smoothed edges that do not coincide with the fine details of the scene. Meanwhile, 3D edges have been used for scene representation, particularly in man-made built environments, which are dominated by regular planar and linear structures. Indeed, 3D edge detection and matching are commonly exploited to constrain camera pose estimation, to generate an abstract representation of the most salient parts of the scene, and even to support mesh reconstruction. In this work, we jointly use 3D edge extraction and MVS mesh generation to promote edge detail preservation in the final result. Salient 3D edges of the scene are reconstructed with state-of-the-art algorithms and integrated into the dense point cloud to support the mesh triangulation step. Experiments on benchmark dataset sequences, using metric and appearance-based measures, are performed to evaluate our hypothesis.


INTRODUCTION
For a given set of images with known orientation parameters (poses), typically the output of Structure from Motion (SfM), multi-view stereo (MVS) methods generate a dense 3D point cloud, a triangulated mesh or a volume. Among the various MVS methods for dense reconstruction, depth map merging methods are commonly used in photogrammetry because of their accuracy and scalability. Semi-global matching (Hirschmuller, 2007) and PatchMatch (Bleyer et al., 2011) are among the most widespread methods for depth estimation and dense cloud generation due to their robustness, although learning-based methods have lately become popular (Huang et al., 2018). Mesh representations, on the other hand, can be generated either as part of the photogrammetric workflow or as a standalone task (Nocerino et al., 2020). The first approach keeps the photometric consistency criterion active while wrapping the surface, thus providing a more refined mesh, while the second generates the optimal surface mesh out of a (typically dense) point cloud using e.g. Delaunay Triangulation or Poisson Surface Reconstruction (Kazhdan et al., 2006). Although such algorithms are mature and deliver impressive results, several challenges remain towards the complete, accurate and detail-preserving 3D reconstruction of scenes. Inadequate image network setups, challenging objects such as textureless or reflective surfaces, occlusions, as well as harsh depth discontinuities may affect the quality of the final reconstruction and the level of preserved details. Crease edges on 3D meshes often do not coincide with natural edges on the object surface. Traditional photogrammetric techniques used vector constraints, the so-called breaklines, to tackle such depth discontinuities and impose geometric constraints during DSM generation (Briese, 2004).
Similar to corner points, edges have been used to support various photogrammetric and computer vision tasks such as image matching (Wang et al., 2009; Wang et al., 2021), camera localization (Hirose and Saito, 2012; Salaün et al., 2017; Miraldo et al., 2018), abstract 3D scene representation (Hofer et al., 2015), meshing sparse clouds (Bódis-Szomorú et al., 2015; Sugiura et al., 2015), as well as modelling and simplifying the scene (Langlois et al., 2019; Chen et al., 2020; Li and Nan, 2021).

Aim of the paper
3D edges are usually invariant to significant illumination changes and provide a robust representation of the most salient parts of the scene. In this study, we investigate whether edges can support detail-preserving MVS reconstruction and generate visually coherent results. Out of the plethora of MVS mesh reconstruction methods, we consider meshes triangulated from 3D point clouds coming from depth map fusion. The scope of this article is twofold: first, two state-of-the-art methods for 3D edge reconstruction are adopted and experimentally compared in terms of performance, usability and practical limitations. Then, we propose an approach for integrating the extracted 3D edge information into the MVS pipeline towards detailed and sharp-feature-preserving mesh reconstruction, and evaluate the potential and the limitations of the method.

RELATED WORK
The presented work leverages edge constraints in the MVS reconstruction procedure, hence the respective literature is reviewed.
Multiple view stereo: MVS algorithms (Furukawa and Ponce, 2009) generate a complete 3D representation of the scene starting from known camera poses. Generally, methods can be point cloud-based, volume-based or mesh-based. Point cloud-based methods use photo-consistency metrics like the Sum of Squared Differences (SSD) or the Normalized Cross-Correlation (NCC) to estimate the depth of the scene pixels and generate dense 3D point clouds (Shen et al., 2013). Such point clouds can be further converted into triangulated meshes using iso-surface extraction, e.g. with Poisson (Kazhdan et al., 2006), or Delaunay Triangulation and graph-cut methods (Labatut et al., 2007). Volume-based methods (Curless and Levoy, 1996) discretize the space into voxels or tetrahedra and calculate a 3D volume from which the optimal surface is extracted using e.g. graph-cuts (Vogiatzis et al., 2007) or the signed distance function (Newcombe et al., 2011; Werner et al., 2014). Surfaces are produced from volumetric representations using Poisson triangulation (Kazhdan et al., 2006) or the marching-cubes algorithm (Lorensen and Cline, 1987). Mesh-based methods (Vu et al., 2012) use photo-consistency metrics to refine (i.e. remesh) an existing initial mesh, generated with volume-based methods or with point cloud-based methods followed by mesh extraction.
In photogrammetry, meshes are typically generated by triangulating dense point clouds coming from depth map fusion methods. Point clouds are converted to meshed surfaces with Poisson triangulation (Kazhdan et al., 2006) or Delaunay Triangulation (Tola et al., 2012; Jancosek and Pajdla, 2014). Delaunay Triangulation is commonly preferred since it adapts to point density and is, thus, more scalable. The final mesh is defined as the boundary between empty and full tetrahedra, typically formulated as a graph-cut problem.
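The two photo-consistency metrics named above can be illustrated with a minimal sketch. It assumes patches are given as flat lists of grey values of equal length; the function names and the sample values are illustrative only and are not taken from any of the cited implementations.

```python
import math

def ssd(patch_a, patch_b):
    """Sum of Squared Differences between two equally sized patches (lower is better)."""
    return sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))

def ncc(patch_a, patch_b):
    """Normalized Cross-Correlation in [-1, 1] (higher is better)."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation is undefined, treated here as no evidence
    return sum(x * y for x, y in zip(da, db)) / denom

# Illustrative data: the second patch has the same structure but a different gain and offset.
reference = [10, 20, 30, 40]
brighter = [2 * v + 5 for v in reference]
print(ssd(reference, brighter), ncc(reference, brighter))
```

In this toy case SSD penalizes the brightness change while NCC does not, which is one reason NCC is often favoured when exposure differs between views.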
Edge extraction: Traditionally, line segments have been extensively extracted in 2D space with methods ranging from simple gradient-based detectors like Sobel (Sobel, 1972) and LoG (Marr, 1980) to more sophisticated solutions such as Canny (Canny, 1986), Rothwell (Rothwell et al., 1995) and Edison (Meer and Georgescu, 2001).
3D reconstruction and linear segments: Line segments have long been used in image registration tasks in photogrammetry (Baillard et al., 1999). In recent years, linear segment matching has been used in pairwise image matching (Wang et al., 2009; Zhang and Koch, 2014), as well as in SfM (Bertoli and Sturm, 2006; Micusik and Wildenauer, 2018) and SLAM algorithms (Hirose and Saito, 2012; Salaün et al., 2017; Zhou et al., 2019) for pose estimation and mapping or 3D reconstruction purposes (Remondino and Zhang, 2006). At the same time, matched linear segments are coupled with the SfM results as a computationally less expensive alternative to MVS reconstruction, as in (Hofer et al., 2015). Sugiura et al. (2015) performed 2D line matching, reconstructed a so-called 3D "line cloud", and used these edges to extract the mesh using a tetrahedra-carving method. Romanoni and Matteucci (2015) used 3D points belonging to edges and carved a 3D Delaunay Triangulation of sparse points. Bódis-Szomorú et al. (2015) proposed an approach for edge-preserving mesh reconstruction in large-scale urban scenes, enforcing the edges through a constrained Delaunay Triangulation (CDT) (Botsch et al., 2010) on a 2D base mesh. Bignoli et al. (2018) presented an approach to detect both straight and curved edges and thus support the reconstruction of 3D meshes from sparse point clouds. Other approaches combine primitives for regular parts of the scene and meshes for the irregular ones (Lafarge et al., 2010).
In this study, we consider the combination of edges and 3D reconstruction in a different fashion, as edge information is integrated in the dense point clouds to support and potentially generate more detail-preserving mesh models.
Edge extraction with Line3D++: Line3D++ (Hofer, 2016) detects and matches 2D line segments across the images and reprojects them into 3D space, generating an abstract 3D representation of the salient parts of the scene. In more detail, the camera poses and the respective sparse point cloud, obtained by conventional Structure from Motion pipelines like COLMAP (Schönberger et al., 2016a) or OpenMVG (Moulon et al., 2016), are given as input. The LSD line segment detector (von Gioi et al., 2008) is used to obtain the line segments, and epipolar guidance is used to establish correspondences between the segments detected on the images. The best hypothesis for the 3D position is selected, and overlapping segments from different views are clustered together using graph clustering to generate a 3D line cloud. The reconstructed lines can potentially also be used to optimize the SfM result with the Ceres solver (Agarwal et al., 2021). The quality of the resulting 3D line cloud depends on several parameters (Figure 2). First, we observed experimentally that the denser the image network is (higher overlap), the more numerous the 3D lines are. However, the accuracy of the reconstructed 3D lines naturally depends on the accuracy of the SfM pose estimation and the reprojection error. Also, detected 2D edges are prone to outliers due to noise in local gradients and occlusions. Thus, inevitably, some edges fail to reconstruct, and duplicate or wrongly reprojected lines are also present. These erroneously reconstructed lines may not affect the abstract representation of the scene alone (Hofer et al., 2016), yet may introduce a significant error in the meshing step.
Edge extraction with EdgeGraph3D: The EdgeGraph3D algorithm (Bignoli et al., 2018) uses edges detected a priori with standard edge detection algorithms, along with the SfM output, i.e. camera poses and sparse point cloud, to subsequently project the salient edges into 3D.
In contrast with Line3D++, EdgeGraph3D is able to match and reconstruct not only linear edges, but also curved ones in the form of 3D polylines (Figure 3). This is made possible by 2D edge graphs calculated for each image. Practically, a node is assigned to every pixel centre belonging to an edge, and adjacent edge-pixel nodes are connected into polylines, generating the 2D edge graphs. Based on these graphs, along with the SfM data and the epipolar constraints, potential edge correspondences are defined, and further validated and reconstructed in 3D on top of the SfM points. In our experiments, we use the Edison algorithm (Meer and Georgescu, 2001) for 2D edge detection. We extended the functionality of the algorithm to directly export edge points and their visibility information without the SfM data. Our first experiments showed that both algorithms are rather sensitive to the accuracy of the pose estimation and thus often reconstruct noisy 3D lines or edge points. One important parameter that needs to be taken into account in order to optimize the 3D edge reconstruction is the visibility threshold v, i.e. the number of images in which a line or point needs to be visible in order to be a candidate for 3D reconstruction. For both algorithms the default threshold is v = 3, yet in our experiments we tuned the visibility threshold according to the dataset overlap (see Section 4 for more details). For instance, as shown in Figure 1, for the same visibility threshold (v = 3) in Line3D++, many more edge lines are reconstructed in DTU-006 than in Fountain-P11, due to the large overlap of this specific dataset. Another essential factor is the maximum accepted pixel error p for the same line or point across multiple images.
By default, this error is 2.5 pixels in Line3D++ (see Section 4 for more details), while in EdgeGraph3D a slightly different approach is adopted, with outlier removal based on visibility filtering and a threshold error of 2.25 pixels. In EdgeGraph3D, the points describing an edge do not strictly lie on the exact edge, but rather form a noisy, irregularly distributed point cloud around the edge, which can still be useful for abstract scene representation. Line3D++, on the other hand, generates a clean line cloud and also has the advantage of computational efficiency. Thus, it was preferred over EdgeGraph3D, although the latter detects both linear and curved segments (Figure 1).
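The combined effect of the visibility threshold v and the maximum pixel error p can be sketched as a simple filter. This is only an illustration under stated assumptions: the candidate record layout (a list of per-image reprojection errors) and the rule of comparing the worst error against p are hypothetical, and neither Line3D++ nor EdgeGraph3D necessarily applies the thresholds in this way.

```python
def filter_edge_candidates(candidates, min_views=3, max_pixel_error=2.5):
    """Keep candidates observed in at least `min_views` images whose worst
    reprojection error does not exceed `max_pixel_error` pixels."""
    kept = []
    for cand in candidates:
        errors = cand["pixel_errors"]  # hypothetical field: one error per observing image
        if len(errors) >= min_views and max(errors) <= max_pixel_error:
            kept.append(cand)
    return kept

# Illustrative candidates, not real data.
candidates = [
    {"id": "a", "pixel_errors": [0.4, 0.9, 1.1]},            # 3 views, accurate
    {"id": "b", "pixel_errors": [0.5, 0.7]},                 # too few views
    {"id": "c", "pixel_errors": [0.3, 3.2, 0.8, 1.0]},       # one view above p
    {"id": "d", "pixel_errors": [0.2, 0.6, 0.9, 1.4, 2.0]},  # 5 views, accurate
]
default_ids = [c["id"] for c in filter_edge_candidates(candidates)]
strict_ids = [c["id"] for c in filter_edge_candidates(candidates, min_views=5)]
print(default_ids, strict_ids)
```

Raising v trades completeness for reliability, which is why a higher threshold is more affordable on sequences with large overlap.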

Edge-aware mesh reconstruction
Our method jointly uses dense point clouds and edge information to support edge-aware mesh reconstruction, inspired by Bignoli et al. (2018) and Bódis-Szomorú et al. (2015). Our intuition is that such edge information, aligned with natural depth discontinuities, may help the meshing algorithms to highlight well-defined natural geometric edges while preserving details and avoiding oversmoothing. In contrast to most approaches, we do not integrate the edge points with the SfM sparse cloud, but rather integrate them into the MVS pipeline. We consider edge information previously extracted with Line3D++, as our approach is based on regularly spaced points along the edges and the output of EdgeGraph3D did not fulfil this requirement.
Line Sampling: Line3D++ extracts linear segments as vectors and outputs information about their start and end points' coordinates in 3D and 2D, along with the visibility information.
In our method, we linearly interpolate 3D points along the edges at equal intervals, based on the average spacing of the input dense cloud (Figure 4). Keeping a regular point spacing favours the generation of triangles of similar edge length and face area around the edge region in the mesh reconstruction step.
Integration to MVS: 3D edge points and their respective visibility information are integrated into the dense point cloud generated by the PatchMatch MVS algorithm (Bleyer et al., 2011; Shen, 2013) as implemented in the OpenMVS library (Cernea, 2021). Hence, we generate a final dense cloud by merging the dense cloud points coming from the depth map fusion with the 3D points along the edges. In order to eliminate the noise and the effects of potential errors, we filter out dense cloud points within a buffer of m times the average spacing before meshing, with m depending on the dataset. Mesh reconstruction is performed using Delaunay tetrahedralization as implemented in OpenMVS and built upon the CGAL library (2021). Iterative consistency checks are made for every point of the input cloud, and finally the surface is extracted using a graph-cut method. Since such methods inevitably produce a significant number of non-manifold vertices, an essential post-processing step follows to repair the local topology of the mesh. Using the VCG library (2021) as integrated in OpenMVS, standard spike and spurious removal, hole filling and mesh smoothing are performed. Vertex repositioning follows, applied only to vertices belonging to edges, in order to keep the original position of the edge points. The mesh reconstruction implemented in OpenMVS was preferred over others, as it has proven robust enough in previous works (Nocerino et al., 2020). In our experiments, though, we do not perform the final mesh refinement that minimizes the photometric error based on point visibility, but rather focus on the standard mesh reconstruction.
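The line sampling and the buffer-based merging described above can be sketched as follows. This is a minimal illustration, not the OpenMVS-based implementation: the function names are hypothetical, the buffer is assumed to be measured around the 3D edge segments, the average spacing is passed in rather than estimated from the cloud, and visibility information is omitted.

```python
import math

def sample_segment(start, end, spacing):
    """Linearly interpolate 3D points along a segment at approximately equal intervals."""
    length = math.dist(start, end)
    n_intervals = max(1, round(length / spacing))
    return [
        tuple(s + (e - s) * i / n_intervals for s, e in zip(start, end))
        for i in range(n_intervals + 1)
    ]

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [aj + t * c for aj, c in zip(a, ab)]
    return math.dist(p, closest)

def merge_edges_into_cloud(dense_points, segments, spacing, m):
    """Drop dense points closer than m * spacing to any edge (assumed buffer
    definition), then append regularly sampled edge points."""
    buffer_radius = m * spacing
    kept = [
        p for p in dense_points
        if all(point_segment_distance(p, a, b) > buffer_radius for a, b in segments)
    ]
    edge_points = [q for a, b in segments for q in sample_segment(a, b, spacing)]
    return kept + edge_points

# Illustrative data: one unit-length edge and two dense points.
segments = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
dense_points = [(0.5, 0.05, 0.0), (0.5, 1.0, 0.0)]
merged = merge_edges_into_cloud(dense_points, segments, spacing=0.25, m=1.5)
print(len(merged))
```

The brute-force distance test is quadratic in the worst case; a spatial index would be needed for clouds of realistic size.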

Datasets
We perform our experiments on benchmark datasets for which ground truth (GT) 3D information is provided. In particular:
Fountain-P11 (Strecha et al., 2008): 11 high-resolution images (3072 x 2048 pixels) with known camera poses from the EPFL benchmark. A GT 3D mesh, resulting from a laser scanning acquisition, is provided for evaluation.
ETH3D-Façade (Schöps et al., 2017): a sequence of 76 high-resolution images (6048 x 4032 pixels) from the ETH benchmark. A GT point cloud acquired with laser scanning is available for evaluation.
DTU-006 (Aanaes et al., 2014): 49 medium-resolution images (1600 x 1200 pixels) with known poses from the DTU benchmark, under seven varying illumination conditions. For our experiments we used the middle-exposure images. GT point clouds for evaluation were acquired with a structured-light scanner.

Parameter settings
The Line3D++ pixel error and minimum visibility threshold were tuned for each dataset, taking into consideration criteria such as overlap between images, image resolution, pixel size and the level of fine detail that needs to be reconstructed. Specifically, p = 2.5 was chosen for all datasets and v = 3, 5 and 7 for Fountain-P11, ETH3D-Façade and DTU-006, respectively. For dense reconstruction, images were resized to a maximum of 3200 pixels, and no decimation was performed during mesh reconstruction. For the dense point filtering, we used m = 1.5, 2 and 3 times the average spacing for Fountain-P11, ETH3D-Façade and DTU-006, respectively, taking into consideration the image resolution and noise of every sequence.
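For reference, the settings above can be collected in a small configuration structure. The values are those reported in the text; the key names and the helper function are illustrative only and do not correspond to options of Line3D++ or OpenMVS.

```python
# Per-dataset settings as reported in the text; key names are hypothetical.
PARAMS = {
    "Fountain-P11": {"max_pixel_error_p": 2.5, "min_visibility_v": 3, "buffer_factor_m": 1.5},
    "ETH3D-Facade": {"max_pixel_error_p": 2.5, "min_visibility_v": 5, "buffer_factor_m": 2.0},
    "DTU-006": {"max_pixel_error_p": 2.5, "min_visibility_v": 7, "buffer_factor_m": 3.0},
}
MAX_IMAGE_SIZE_PX = 3200  # images are resized to at most this size for dense reconstruction

def buffer_radius(dataset, average_spacing):
    """Filtering buffer in the units of the cloud: m times the average point spacing."""
    return PARAMS[dataset]["buffer_factor_m"] * average_spacing

print(buffer_radius("DTU-006", 0.5))
```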

Evaluation Metrics
The literature on the geometric quality of meshes and preserved edges is rather sparse, except for industrial meshing (Stimpson et al., 2007). In our experiments, appearance-based metrics were chosen for the evaluation, since our approach aims to add fine edge details to the final mesh reconstruction. More particularly, we use geometric features based on combinations of the eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq 0$ of the covariance tensor computed within a local neighbourhood of a point, as used by Weinmann et al. (2015) and Hackel et al. (2016) and implemented in CloudCompare (2021). The metrics that were experimentally found to be most significant for highlighting the fine details were surface variation and normal change rate. Local surface variation is defined as $\lambda_3 / (\lambda_1 + \lambda_2 + \lambda_3)$. Similar to surface variation, normal change rate represents the curvature variation within a local neighbourhood of a given radius. The kernel size k was chosen taking into consideration the density of the vertices. Results are shown as colour maps to highlight the high-frequency details and noise (Figures 6, 7 and 8).
Using the proposed approach, more evident details are observed in the qualitative comparison of the mesh models, especially where accurate 3D edges were reconstructed (Figure 5). Inevitably, errors in edge reconstruction may add some noisy triangles, hence a more robust 3D edge reconstruction and refinement step may be needed. The RMS error with respect to the GT model is similar between the standard mesh and the mesh generated using our approach (Table 1). With the appearance-based metrics, however, we observe more evident fine details using the proposed approach.
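As an illustration of the eigenvalue-based features, the sketch below computes surface variation for a single neighbourhood, assuming the common definition λ3 / (λ1 + λ2 + λ3) with λ1 ≥ λ2 ≥ λ3. It is not the CloudCompare implementation: the function names are hypothetical, the neighbourhood is passed in directly instead of being selected by a radius search, and the normal change rate is not covered. The eigenvalues are obtained with the standard closed-form trigonometric solution for symmetric 3x3 matrices.

```python
import math

def covariance_3x3(points):
    """Covariance matrix of a list of 3D points (population normalisation)."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n for j in range(3)]
        for i in range(3)
    ]

def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, in descending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding errors
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    l2 = 3.0 * q - l1 - l3
    return [l1, l2, l3]

def surface_variation(points):
    """Smallest eigenvalue divided by the eigenvalue sum (assumed definition)."""
    l1, l2, l3 = symmetric_eigenvalues(covariance_3x3(points))
    total = l1 + l2 + l3
    return 0.0 if total <= 0.0 else max(l3, 0.0) / total

# Illustrative neighbourhoods: a flat patch and an L-shaped crease.
flat = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
crease = flat + [(float(x), 0.0, float(z)) for x in range(3) for z in range(1, 3)]
print(surface_variation(flat), surface_variation(crease))
```

A planar neighbourhood yields a value of zero, while a neighbourhood straddling a crease yields a clearly positive value, which is what makes the feature useful for visualising sharp edges as colour maps.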
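One simple way to obtain an RMS error against a ground truth point cloud is sketched below. It is an assumption-laden illustration rather than the evaluation procedure used for Table 1: distances are measured from each mesh vertex to its nearest GT point by brute force, whereas a cloud-to-mesh distance and a spatial index would normally be used in practice.

```python
import math

def rms_to_ground_truth(vertices, gt_points):
    """RMS of nearest-neighbour distances from each vertex to the GT points."""
    squared = [
        min(sum((a - b) ** 2 for a, b in zip(v, g)) for g in gt_points)
        for v in vertices
    ]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative data only.
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
vertices = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.4)]
print(rms_to_ground_truth(vertices, gt_points))
```

Because such a global measure averages over the whole surface, local improvements along edges can leave it almost unchanged, which is consistent with relying on appearance-based metrics for the fine details.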

CONCLUSIONS
An ideally reconstructed mesh in MVS scenarios should be smooth but also detail-preserving, especially around the natural crease edges. In this paper we presented an approach to leverage edge information in the standard MVS mesh reconstruction pipeline. Edges are detected and reconstructed in 3D using the approach of Hofer et al. (2016) as implemented in the Line3D++ algorithm, while the performance and usability of EdgeGraph3D were also evaluated for our dense MVS scenarios. The presented edge-enhancing MVS methodology provides more detail-preserving meshes according to the appearance-based metrics used for the evaluation. Among the potential limitations of the method, we note that the quality of the reconstructed mesh highly depends on the accuracy of the extracted 3D edges.