DETAILED PRIMITIVE-BASED 3D MODELING OF ARCHITECTURAL ELEMENTS

The article describes an image-based pipeline for the 3D reconstruction of building façades or architectural elements and their subsequent modeling using geometric primitives. The approach overcomes some existing problems in modeling architectural elements and delivers compact, reality-based textured 3D models useful for metric applications. For the 3D reconstruction, an open-source pipeline developed within the TAPEnADe project is employed. In the subsequent modeling steps, the user manually selects an area containing an architectural element (capital, column, bas-relief, window tympanum, etc.) and the procedure then fits geometric primitives and computes disparity and displacement maps in order to tie visual and geometric information together in a light but detailed 3D model. Examples are reported and discussed.


INTRODUCTION
Reality-based 3D surveying and modeling of scenes or objects should be understood as the entire procedure that starts with reality-based data acquisition and continues with geometric and radiometric data processing and the generation of structured 3D information. Surveying techniques employ hardware and software to metrically record reality as it is, documenting in 3D the actual visible situation of a site by means of images, range data, CAD drawings and maps, classical surveying or an integration of these techniques, in particular for large and complex sites. For the detailed 3D modeling of monuments or single objects (columns, arches, etc.), unstructured dense or sparse point clouds need to be segmented and converted into structured data, like polygonal meshes, for further applications like texturing, visualization, understanding, style analyses, etc. Often the final 3D polygonal model, after the texture mapping phase, needs to be subsampled or heavily reduced because of its high geometric detail and heavy texture information, in order to allow faster visualization (e.g. web or gaming) and interactive access. For this reason, bump and normal mapping techniques were developed to enhance the appearance of low-resolution models, adding visual geometric details without using more polygons (Cohen et al., 1998; Cignoni et al., 1998).
As already reported in El-Hakim (2002), a 3D reconstruction system must be able to produce digital models of real-world scenes with the following requirements: (i) high geometric accuracy, (ii) capturing of all the details and (iii) photo-realism. These should be linked to (iv) full automation, (v) low cost, (vi) portability, (vii) flexibility in applications and (viii) efficiency in model size. A system which satisfies all the aforementioned requirements and characteristics does not yet exist.
The article presents a methodology which tries to fulfill the above-mentioned points, in particular the creation of highly detailed 3D models with reduced model size. The methodology is based on 3D reconstruction by dense image matching and on the fitting of geometric primitives (cubes, cylinders, pyramids, etc.) for the generation of light but still detailed 3D models. The method relies only on reality-based 3D data, produces 3D models that are efficient in size and is reliable even on complex geometric elements.

RELATED WORKS
Automated architectural 3D reconstruction is still an active research topic, driven by the public demand for complete and photo-realistic 3D city models (e.g. for management, planning and BIM applications) and by the activities of Google and Microsoft. The generation of textured polygonal models from aerial images started almost 20 years ago (Gruen et al., 1995) and has produced numerous approaches, data sources, performances and LOD representations (Gruen and Wang, 1998; Brenner, 2005; Kolbe et al., 2005; Haala and Kada, 2010; Karantzalos and Paragios, 2010). Although it is often considered a solved task, digital building model generation of complex structures still remains a challenging issue (Habib et al., 2010), in particular when geometrically detailed textured façades are needed. Indeed, in many North American or modern cities simple planar polygons are sufficient, while many European cities comprise buildings of different architectural styles and epochs, thus demanding a complete modeling of all the façade features. This should be achieved, where possible, with an automated approach able to deliver structured surface representations similar to those achievable by an expert photogrammetric operator or CAD designer (Van den Heuvel, 2000).
Considering only ground-based data acquisitions, the literature on detailed façade reconstruction is quite vast. The employed data can be ground-based façade multi-view images (El-Hakim et al., 2005; Xiao et al., 2008), single images (Mueller et al., 2007; Barinova et al., 2008), ground-based laser scanning (Wan and Sharf, 2012) or a combination of sources (Becker and Haala, 2007; El-Hakim et al., 2007; Pu and Vosselman, 2009). The most reliable algorithms applicable to real scenarios can be classified as:
- procedural methods (Wonka et al., 2003; Mueller et al., 2006; Finkenzeller, 2008; Hohmann et al., 2009): they are based on grammar-based CGA shape modeling, using interactive or automated approaches and the Generative Modeling Language (GML).
- model-based methods (Debevec et al., 1996; Schindler and Bauer, 2003; Zhang et al., 2011): they introduce prior knowledge into the shape reconstruction of buildings or façades. Geometric features are thus used, computing their parameters in order to optimally fit the input data (images or ranges).
- data-driven methods (Penard et al., 2005; Becker, 2009; Larsen and Moeslund, 2011): starting from a coarse model or a point cloud, the 3D reconstruction is refined and finalized as a polygonal model.
All the proposed systems which produced satisfactory and complete results are semi-automated (El-Hakim, 2002; Kersten et al., 2004; Sinha et al., 2008), thus requiring an operator during the modeling of the façade elements. Fully automated methods still rely on geometric simplifications (e.g. limiting the possible façade elements to pre-specified types or considering windows as dark areas) or produce geometric models which do not contain all the salient architectural elements.

UNDERSTANDING ARCHITECTURAL SHAPES AND COMPOSITIONS
The developed methodology is based on the principle that one of the most effective ways to define architectural surveying is to regard it as the (digital) rebuilding of the original architectural project. A 3D surveying and modeling project is indeed a reverse process in which, starting from an existing object, the process of its realization is retraced and the idea behind the design is interpreted, i.e. it works upstream from the object's realization.
The study of drawing conventions in the history of architectural representation has a twofold purpose: the first leads to the representation of the object, the second to its surveying. These two analyses of the architectural elements are strictly interdependent. The knowledge extraction problem, in turn, consists in identifying the genesis of the element shape in order to define the appropriate way to measure and represent it. To this end, architectural knowledge rules have to be formalized. An architectural knowledge system can be described as a collection of structured objects, identified through a precise vocabulary.
Several studies led to the definition of classification methods for architectural elements based on levels of abstraction of the architectural space (Tzonis et al., 1986). These classifications are based on the study of the architectural treatises, which organize the knowledge of the art of building relative to different historical periods. Many treatises developed an identity coding of architectural elements. This identity is normally expressed through a hierarchical description of all the elements which make up a building unit. In Palladio (1570), by means of a representation convention, each architectural element is expressed by (i) a geometrical description level (lines, curves), (ii) a topological relations level (parallelism, concentricity, etc.) and (iii) a spatial relations level (proportions, harmonic ratios). The reverse engineering of architectural buildings thus comes down to the extraction of these three levels from the acquired 3D data. An appropriate 3D modeling method should therefore start from the various sources of knowledge and data (including the study of particular cases) in order to extract drawing rules, formalize them and translate them into a digital, semantic-based template library.
Starting from a geometrical analysis of the various parts of a building and having as a goal its geometrical and semantic description, De Luca et al. (2006) proposed a method for geometrical reconstruction starting from profiles. This method is based on the analysis of the invariants and morphological specificities that can be extracted from a semantic decomposition of an architectural structure. Throughout the history of architecture, the morphological complexity of shapes has always been influenced by the methods of geometrical control that made their conception possible. Examples of these methods are descriptive geometry (Monge, 1799) and stereotomy (Desargues, 1640). Based on a study of the principles underlying these methods of controlling the architectural shape, one can identify on the one hand the relevant information to extract from a survey (profiles in a point cloud, for example) and on the other hand the construction process best suited to ensure the geometrical restitution of the elements.
In the presented methodology, we identified five key concepts for understanding the geometrical nature of classical buildings and for a more detailed 3D restitution of their shape (De Luca et al., 2007):
- Dominant surface: each building presents a dominant surface which characterizes its spatial extension and its principal internal divisions.
- Transition: in the classical language, architectural shapes are based on geometrical transitions, thus profiles shared by two elements must be distinguished in the general composition. In the specific case of a digital 3D reconstruction, the problem consists in identifying the transitions between the elements along a dominant direction (see Fig. 2b).
- Planes of construction: the profiles extracted from relevant planes constitute the descriptors of surfaces that can be generated by traditional modeling functions (sweeping, revolution, interpolation, etc.).
- Replication: the composition of large architectural buildings is normally based on the distribution of repeated elements. These elements are often organized following geometrical layouts: symmetry, rhythm and other rules of composition. Moreover, these replications can involve various scales: the hierarchical relations expressed by the architectural composition organize the characterized elements around the concepts of order, module, stage or frontage.
- Mouldings: understanding the role that decorative mouldings play in defining the shapes of architectural elements is essential. Indeed, it helps to (i) rebuild profiles by comparing them with a description language that belongs to architectural representation, (ii) describe the building as a collection of objects identified by a precise vocabulary, (iii) better interpret the surveying data and (iv) avoid producing profiles that are incorrect from the point of view of architectural analysis.

METHODOLOGY
The methodology relies on terrestrial image acquisitions in order to derive dense unstructured point clouds for the subsequent 3D modeling phase based on geometric primitives.
The aim is to generate simple geometric primitives that can be enriched with details contained in the original data set.
For the image-based 3D reconstruction (sections 4.1 and 4.2), an open-source photogrammetric pipeline developed within the TAPEnADe project (www.tapenade.gamsau.archi.fr) is employed (Pierrot-Deseilligny et al., 2011). Indeed, the potential of the image-based approach with respect to range-based / LiDAR methods is becoming more and more evident, thanks to the latest developments in dense image matching (Hirschmueller, 2008; Remondino et al., 2008; Hiep et al., 2009; Furukawa and Ponce, 2010; Jachiet et al., 2010; Hirschmueller, 2011) and the availability of web-based and open-source processing tools (e.g. Photosynth, 123DCatch, Apero, MicMac, etc.). These developments for terrestrial applications, based on photogrammetry and computer vision methods, have shown very promising results and renewed attention for image-based 3D modeling as an inexpensive, robust and practical alternative to 3D scanning.
For the subsequent modeling steps, the user manually selects an area containing an architectural element (capital, column, bas-relief, window tympanum, etc.) and the procedure then (i) models the element using geometric primitives (section 4.3) and (ii) maps visual and geometric information onto the simplified elements using "enriched textures" (sections 4.4 and 4.5). This modeling part of the methodology has been implemented in a 3D reconstruction tool developed in MEL (Maya Embedded Language).

Image data acquisition
The employed digital cameras are assumed to be accurately calibrated in advance in order to compute precise and reliable interior parameters (Remondino and Fraser, 2006). Although the developed algorithms and methodology can perform self-calibration (i.e. camera calibration), it is always better to accurately calibrate the camera using a 3D object / scene (e.g. a lab test field or a building's corner) following the basic photogrammetric rules. To deliver metric results that meet specific project requirements, the image capturing must be planned following well-tested best practice guidelines (e.g. Waldhaeusl and Ogleby, 1994). The appropriate planning of sensor positions remains a highly active research area, and efficient methods are still under investigation in order to guarantee (i) optimum sensor positioning, (ii) complete object coverage, (iii) sufficient overlap for automated multi-view registration (a good compromise between a strong B/D ratio and the needs of automated matching methods is mandatory) and (iv) high geometric accuracy and detail of the final results.
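The compromise mentioned in point (iii) can be made explicit with the standard normal-case stereo approximation, given here only as a rough guide and not as part of the presented method. The symbols are assumptions introduced for illustration: $D$ is the camera-to-object distance, $B$ the baseline between two camera stations, $c$ the principal distance and $\sigma_p$ the image measurement (matching) precision expressed in the same units as $c$.

```latex
\sigma_D \approx \frac{D}{B}\,\frac{D}{c}\,\sigma_p
```

Under this approximation a larger B/D ratio improves the expected depth precision, whereas a smaller one keeps neighbouring images more similar and therefore easier to match automatically.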

Image triangulation and DSM generation
For the image orientation, the methodology relies on the open-source APERO software (Pierrot-Deseilligny and Clery, 2011).
As APERO is targeted at a wide range of images, lenses and applications, it requires some input parameters that give the user fine control over all the initialization and minimization steps of the orientation procedure. APERO consists of different modules for tie point extraction, initial solution computation and bundle adjustment for relative and absolute orientation. If available, external information like GNSS/INS observations of the camera perspective centres, GCP coordinates, known distances and planes can be imported and included in the least squares adjustment.
Once the camera poses are estimated, a dense point cloud is extracted (Fig. 1) using the open-source MicMac software (Pierrot-Deseilligny and Paparoditis, 2006). The matching algorithm follows a multi-scale, multi-resolution, pyramidal approach and derives dense point clouds using an energy minimization function to enforce surface regularity and avoid undesirable jumps. The pyramidal approach speeds up the processing and ensures that the matched points extracted at each level are consistent with one another.
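A generic form of such an energy, given as an illustrative sketch and not as the exact MicMac formulation, combines a data term with a regularity term. The symbols are assumptions introduced here: $Z$ is the depth (or disparity) assigned to each pixel $(x, y)$, $A$ is a dissimilarity cost derived from image correlation and $\alpha$ weights the smoothness of the surface.

```latex
E(Z) = \sum_{(x,y)} A\bigl(x, y, Z(x,y)\bigr)
     + \alpha \sum_{(x,y)} \bigl\lVert \nabla Z(x,y) \bigr\rVert
```

A small $\alpha$ lets the surface follow fine relief at the risk of noise, while a large $\alpha$ suppresses jumps. In a coarse-to-fine scheme the solution found at one pyramid level typically restricts the depth interval searched at the next finer level.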
Figure 1: Image-based 3D reconstruction of a column with its decorated capital and basement.

Primitive-based 3D modeling
Starting from the extracted dense point cloud or the generated polygonal mesh, the interactive and semi-automatic 3D reconstruction approach is based on three methods, chosen according to the morphological complexity of the analyzed architectural shapes:
- Basic primitive adjustment (cubes, cylinders, pyramids, etc.) to the selected architectural element using a minimization approach.
- Progressive extrusion of relevant profiles (extracted from the point cloud) along typical paths (arcs, columns, vaults, etc.) by using a set of low-level primitives (parametric mouldings).
- A library of parameterized architectural primitives (moulded elements of the classical language) which can be instantiated (i.e. dimensioned and positioned) onto the point cloud (De Luca et al., 2007).
The intersection of a point cloud with a relevant plane makes it possible to extract and identify points describing distinctive profiles of the architectural elements, which are then exploited in a subsequent surface reconstruction process. The relevant plane is defined by a generic surface connected to a camera with an orthogonal projection defined by a depth-limited viewing volume (in order to display only entities lying between the near and the far clip planes). Profiles extracted from this intersection plane constitute the surface descriptors, from which surfaces can be generated using five customized "surface generation" operators: linear extrusion, path extrusion, surface revolution, boundary and planar surface generation.
As described in section 3, the architectural composition of a great number of buildings is based on a logical spatial distribution of elements. These elements are often organized by geometrical layouts around the concepts of order, module, stage, etc.
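The plane-intersection step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MEL implementation of the tool: the function name `extract_profile`, its parameters and the use of a symmetric tolerance slab in place of near and far clip planes are all assumptions made for the example.

```python
import numpy as np

def extract_profile(points, plane_point, plane_normal, tolerance):
    """Return 2D in-plane coordinates of the cloud points lying within
    `tolerance` of the cutting plane (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    p0 = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Signed distance of every point to the plane.
    signed = (points - p0) @ n
    keep = np.abs(signed) <= tolerance
    # Orthogonal projection of the selected points onto the plane.
    projected = points[keep] - np.outer(signed[keep], n)
    # Orthonormal basis (u, v) spanning the plane.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)
    rel = projected - p0
    return np.column_stack((rel @ u, rel @ v))
```

The returned 2D points form the raw profile to which generic shapes or mouldings can then be fitted. The tolerance plays the role of the depth limit of the viewing volume and has to be chosen according to the point cloud density.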
Symmetry is the basic principle of a great number of architectural shapes, where we can find correspondences and replications of sizes, positions and orientations between the different parts and sub-parts of an element. In geometrical modeling, symmetry is also an intrinsic property related to extrusion. The combined use of extrusion and scaling functions along the same symmetry axis allows, for example, a progressive reconstruction of the transitions characterizing the shape along a dominant direction. This principle can thus be applied to the basic 3D reconstruction of the envelope of a column (Fig. 2b) in order to:
- Automatically extract several horizontal profiles in order to define a general vertical axis of the object.
- Select (interactively or automatically) the shape's transitions (i.e. discontinuities) to extract relevant profiles describing particular sections along the shape's dominant axis.
- Best fit generic profiles (circles, arcs, rectangles, etc.) or mouldings (fillet, astragal, scotia, echinus, torus, etc.) to the extracted profiles.
- Generate a parametric surface by interpolation of the extracted profiles along the dominant axis.
This approach (De Luca et al., 2007) allows the description of architectural shapes with a generic formalism based on a network of semantic atoms. A node atom contains the essential information for its representation in space (position, rotation, scale) and controls four sub-nodes which share attributes.
Figure 2: The reconstructed column, planes in correspondence of the identified shape transitions, the basic primitives fitted between the planes, the disparity map derived from the geometric details and mapped onto the basic primitives and the final light and efficient-in-size 3D model of the column textured with a displacement map.
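The first and third steps of the column procedure (horizontal profiles, best-fit circles, vertical axis) can be sketched as below. This is a hedged Python/NumPy illustration under stated assumptions: the column is roughly aligned with the z axis, the algebraic (Kasa) least-squares circle fit stands in for whatever minimization the tool actually uses, and the function names are hypothetical.

```python
import numpy as np

def fit_circle(xy):
    """Algebraic least-squares circle fit (Kasa method): returns (cx, cy, r)."""
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    # Solve x^2 + y^2 = 2*cx*x + 2*cy*y + c in the least-squares sense,
    # where c = r^2 - cx^2 - cy^2.
    A = np.column_stack((2 * x, 2 * y, np.ones_like(x)))
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx**2 + cy**2)

def slice_profiles(points, heights, tolerance):
    """Fit one circle per horizontal slice; returns rows (z, cx, cy, r)."""
    points = np.asarray(points, dtype=float)
    rows = []
    for z in heights:
        mask = np.abs(points[:, 2] - z) <= tolerance
        if mask.sum() >= 3:  # a circle needs at least three points
            cx, cy, r = fit_circle(points[mask, :2])
            rows.append((z, cx, cy, r))
    return np.array(rows)

def fit_axis(profiles):
    """Least-squares line through the slice centres: (point, unit direction)."""
    centres = profiles[:, [1, 2, 0]]  # reorder columns to (cx, cy, z)
    mean = centres.mean(axis=0)
    # The first right singular vector is the principal direction.
    _, _, vt = np.linalg.svd(centres - mean)
    direction = vt[0] if vt[0][2] >= 0 else -vt[0]
    return mean, direction
```

The sequence of fitted radii along the axis gives a compact description of the shaft: abrupt changes between consecutive slices are candidate transitions, and the radii themselves can drive the scaling applied during extrusion.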

Embedding visual and geometric details in primitives
This step of the process is based on the computation of a disparity map between the original surveyed element (dense point cloud or mesh) and the simplified one (primitive, light mesh, etc.). In principle:
- a generic texture image is created, defining a mapping function according to the object complexity (planar, cubic, cylindrical, spherical).
- for each pixel of the generic texture image, (i) the transformation between the image point (texture space) and the primitive point (object space) and (ii) the distance between the primitive point and the original point cloud (or mesh) along the primitive's surface normal are computed.
- considering the distribution of the extracted distances, a parametric color ramp (in gray scale) is computed to express the complete range of disparity between the primitive and the original element (see Fig. 2d).
The visual and geometric information (coming from the original element) is then mapped onto the simplified elements (Fig. 2e) by creating "enriched textures" composed of:
- a displacement map altering the surface of the simplified geometry in order to embed and simulate surface details. This map is created (in the rendering step) from the computed disparity map, which captures the distance between corresponding points on the two surfaces.
- a normal map capturing the surface normal information of the original point cloud or mesh.
- a traditional color map containing high definition textures produced using the oriented images.
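For a cylindrical primitive, the disparity computation and the gray-scale ramp described above can be sketched as follows. This is a simplified Python/NumPy illustration, not the tool's implementation: it assumes a cylinder aligned with the z axis and centred on the origin, and instead of casting a ray from every texel it assigns each cloud point to the texel given by its cylindrical coordinates and averages the radial offsets. All function and parameter names are hypothetical.

```python
import numpy as np

def cylinder_disparity_map(points, radius, z_min, z_max, width, height):
    """Signed offset, along the cylinder normal, between a z-aligned cylinder
    and the cloud points, rasterised into a (height, width) texture.
    Texels that receive no point are NaN."""
    points = np.asarray(points, dtype=float)
    theta = np.arctan2(points[:, 1], points[:, 0])           # in [-pi, pi]
    offset = np.hypot(points[:, 0], points[:, 1]) - radius   # radial = normal
    # Cylindrical mapping: u follows the angle, v follows the height.
    u = np.floor((theta + np.pi) / (2 * np.pi) * width).astype(int) % width
    v = np.floor((points[:, 2] - z_min) / (z_max - z_min) * height).astype(int)
    inside = (v >= 0) & (v < height)
    total = np.zeros((height, width))
    count = np.zeros((height, width))
    np.add.at(total, (v[inside], u[inside]), offset[inside])
    np.add.at(count, (v[inside], u[inside]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count  # empty texels become NaN

def to_gray_ramp(disparity):
    """Linear 8-bit ramp over the observed range: (gray, d_min, d_max)."""
    d_min, d_max = np.nanmin(disparity), np.nanmax(disparity)
    span = d_max - d_min if d_max > d_min else 1.0
    gray = np.round((disparity - d_min) / span * 255)
    gray = np.where(np.isnan(disparity), 0, gray).astype(np.uint8)
    return gray, d_min, d_max
```

The pair `(d_min, d_max)` has to be stored together with the gray image, since it is what allows the gray levels to be converted back into metric offsets at rendering time.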

A COMPLEX ARCHITECTURE
Following all the steps of the methodology, element by element, an entire monument can be modeled in 3D (Fig. 3), preserving the metric, geometric and visual details recovered during the dense matching reconstruction phase while still delivering a light and efficient-in-size 3D result. Figure 3 shows the example of the façade/portal of the Saint-Trophime church in Arles (France), composed of arches, columns, mouldings, capitals, etc. For the 3D recording and geometric modeling, approximately 90 images were acquired with a Nikon D3X (24 Mpixel) using a 24-85 mm lens. The derived dense point cloud (Fig. 3b-c) consists of ca 45 million points, which were then converted into a mesh of ca 10 million polygons. The subsequent primitive-based modeling could satisfactorily reconstruct the façade, obtaining a light 3D polygonal model (Fig. 3d). For example, the upper arch primitive is composed of 41 polygonal faces in total (Fig. 3e-f).
The disparity map computed with the proposed method generates a displacement map (Fig. 3g) which conveys a strong sense of depth and detail, making self-occlusions, self-shadowing and silhouettes visible (although its calculation is costly, as it has to handle the large amount of geometry in the background). It is worth noting that the displacement map, contrary to bump or normal mapping approaches, is effectively a geometrical representation of the surveyed surface and allows the geometric details to be recovered if necessary.
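How a displacement map gives access to the geometric details again can be illustrated with a short Python/NumPy sketch. It assumes, purely for the example, that the map is an 8-bit gray image produced by a linear ramp between two known metric bounds `d_min` and `d_max`; the function names are hypothetical and the sketch is not the rendering pipeline used for Fig. 3.

```python
import numpy as np

def decode_displacement(gray, d_min, d_max):
    """Invert a linear 8-bit gray ramp back to metric offsets."""
    gray = np.asarray(gray, dtype=float)
    return d_min + gray / 255.0 * (d_max - d_min)

def displace(vertices, normals, offsets):
    """Move each vertex along its (normalised) normal by its offset."""
    vertices = np.asarray(vertices, dtype=float)
    normals = np.asarray(normals, dtype=float)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return vertices + normals * np.asarray(offsets, dtype=float)[:, None]
```

With an 8-bit ramp the recoverable detail is quantised in steps of `(d_max - d_min) / 255`, so a deeper relief range implies a coarser depth resolution unless a higher bit depth is used.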

CONCLUSIONS
The article presented a reality-based methodology to create efficient-in-size but still detailed 3D models of façades and architectural elements. The idea mainly consists of two steps: (i) the 3D surveying and reconstruction of the architectural scene using images and image matching algorithms and (ii) an interactive primitive-based fitting of the dense point cloud or mesh and the derivation of disparity and displacement maps in order to tie visual and geometric information together. Displacement maps are a good alternative to bump, normal and parallax mapping as they contain the real geometric information wherever the surface is displaced. Bump maps, on the other hand, are ideal for fine detail in relatively flat areas, but they do not modify the shape of the object and are mainly a graphical / visual effect, i.e. a simulation of the geometrical details. For many years displacement mapping was a distinctive feature of high-end rendering systems only, while nowadays even real-time APIs (e.g. OpenGL, DirectX, etc.) support this technique.
Compared to other interactive approaches developed in previous research, where tie points are used to derive the basic geometric elements of the scene or predefined geometric elements are fitted onto the images using the known image orientation parameters, our method derives dense point clouds in order to interactively create polygonal models enriched with displacement maps. The method can thus easily model the fine details, thanks to the dense point clouds. Compared to automated methods based on meshed models, where point clouds are wrapped into polygonal models, the presented approach delivers much "lighter" results than the original mesh and is thus appropriate e.g. for web applications, 3D real-time renderings, etc. Moreover, the result contains all the architectural elements in detailed form and so it can be used for analyses, comparisons, replicas, etc. The presented approach can drastically reduce the geometrical weight of complex architectural 3D reconstructions while producing models which embed the rich detail coming from the surveying phase.
Figure 3: The façade of Saint-Trophime church (a) reconstructed using ca 90 images. The image matching procedure produced a dense cloud with ca 45 million points (b, c). The primitive-based modeling result on the entire façade (d) and a closer view on the upper arch (e, f). The displacement map (g) and the final primitive-based model (h).