Synthetic 3D Data Generation Pipeline for Geometric Deep Learning in Architecture

With the growing interest in deep learning algorithms and computational design in the architectural field, the need for large, accessible and diverse architectural datasets increases. We tackle this problem by constructing a field-specific synthetic data generation pipeline that generates an arbitrary amount of 3D data along with the associated 2D and 3D annotations. The variety of annotations and the flexibility to customize the generated building and dataset parameters make this framework suitable for multiple deep learning tasks, including geometric deep learning, which requires direct 3D supervision. While creating our building data generation pipeline, we leveraged architectural knowledge from experts in order to construct a framework that is modular, extendable and provides a sufficient amount of class-balanced data samples. Moreover, we purposefully involve the researcher in the dataset customization, allowing them to choose additional building components, material textures, building classes, the number and type of annotations, and the number of views per 3D model sample. In this way, the framework can satisfy different research requirements and adapt to a large variety of tasks. All code and data are made publicly available.


INTRODUCTION
Recent advances in CAD software have demonstrated the advantages of 3D models to specialists working in various fields. These advantages include the 3D visualization of a project, the inclusion of all relevant metadata, a relatively quick transformation to a desired representation, and modifications that propagate to all stages and views. This is especially relevant to architecture and urban planning, which involve a large amount of data that must be taken into consideration, as well as the spatial representation of a project. At the same time, the time and resources required to create a full 3D model of a building project are among the obstacles that prevent some specialists in the field from using 3D modeling in their pipelines.
A potential solution to this issue could be leveraging Geometric Deep Learning techniques for the generation of 3D models from a given input. This input could be represented in various ways, for instance, as a set of constraints determining the project or as an image. In the case of an architectural project, it would be quite time-consuming to determine a set of rules due to the complexity of the subject. Using images, by contrast, would take advantage of the ability of Deep Neural Networks to learn visual features in a self-supervised manner.
One of the challenges researchers face when dealing with this question is the availability of annotated 3D datasets. The complexity of the 3D model creation process, the variety of software and approaches to building modeling, differences in modeling quality, and the lack of access to completed 3D models make it difficult for neural networks to learn successfully. To our knowledge, there are no existing architectural datasets at the building scale with a sufficient number of samples to perform Geometric Deep Learning. For these reasons, we provide the architectural community with a data generation pipeline that automatically creates a synthetic building dataset suitable for various Deep Learning tasks. Due to the specificity of our contribution and the lack of datasets at this architectural scale, no benchmarks or downstream metrics exist so far.
The establishment of a 3D dataset of building envelopes with relevant information, together with the development of a 3D reconstruction framework, could benefit many industries, such as robot navigation, drone delivery, retail, urban planning, and AR/VR gaming experiences.

RELATED WORKS
3D Datasets The complexity of the urban planning and architecture fields creates a variety of applicable Deep Learning tasks. The drawback of this diversity is that different tasks require different input and ground truth data, not only in terms of format but also in terms of internal requirements within one format. For instance, generative design of a building facade requires semantic segmentation annotations, while 3D reconstruction requires a 2D representation of a building as well as a 3D representation, whose format may vary based on the researcher's approach (mesh, voxels, point cloud). The intricacy of the subject makes dataset creation time-consuming, so while there are quite a few 3D datasets (Sun et al., 2018; Chang et al., 2015; Wu et al., 2015; Kim et al., 2020; Xiang et al., 2014; Xiang et al., 2016; Mo et al., 2019; Koch et al., 2019; Lim et al., 2013) (Table 1), not many of them are related to architecture. The format issue makes this scarce number even smaller when applied to a specific problem, such as facade reconstruction, 3D classification or semantic segmentation of building parts. Urban 3D datasets are the most common in the field. This is due to the availability of data at the city level (shapefiles), open-source geospatial software and the relative ease of production, as most of these datasets contain Level of Detail 1 (LoD1) models (Gao et al., 2021). Level of Detail (LoD) is an important characteristic of architectural datasets that specifies the amount of detail and generalization present in a 3D model (Biljecki et al., 2016b). LoD1 refers to a building envelope without any additional details, while LoD4 indicates a detailed building model with an internal structure.
The work that aims to solve the same problem is 3DCityDB (Yao et al., 2018), which contains 3D representations of real cities. The advantage of their system is the use of real-world data and texturing, which gives more precise and realistic representations of buildings. The drawbacks of their framework include the limited number of samples (at the time of writing, the database covers the cities of Berlin and New York), the lack of meaningful information about the structures, the absence of semantic segmentation of building parts, and the simplified building models (extruded polygons) that do not include finer details. The models in 3DCityDB are given in an urban context, which could be seen as an advantage or a disadvantage depending on the task; moreover, there are no image annotations or separate 3D files that researchers could use directly for Deep Learning purposes.
Another approach similar to ours is Random3DCity (Biljecki et al., 2016a), which exploits procedural generation in order to create a variety of building forms. However, the authors aim at a simplified building representation without real-world texturing, which makes it inapplicable to the single-image-to-3D reconstruction problem as we approach it. Moreover, the tool is intended to be used with the CityGML format (Gröger and Plümer, 2012), which imposes certain limitations on researchers. The main advantages of this dataset are the possibility to choose the LoD up to LoD4, which includes the internal building structure, and a modularity that allows relatively easy scaling of the dataset and its variations.
The dataset closest to ours is Structured3D (Zheng et al., 2020), which involves synthetic 3D models and image annotations for architectural interior spaces. Its annotations include segmentation, depth and rendered images, as well as the objects' structure and several interior configurations. There are a few more synthetic interior 3D scene datasets, such as SceneNet (Handa et al., 2015) or InteriorNet (Li et al., 2018) (the latter does not include 3D models). Unfortunately, all of them tackle the problem at a different architectural scale, concentrating on a single interior space, while our task requires building exterior models with information related to the structure.
Another disadvantage of the mentioned architectural datasets is related to their construction process, as the building objects they contain are not parametric, even in the datasets containing synthetic data. This aspect puts some limitations on the researchers using them. We address this issue in our solution by providing parametric building generation using a programmatic procedural approach. The datasets used by the majority of researchers studying geometric deep learning usually consist of a heterogeneous set of items divided into several categories. While being a good fit for research frameworks, these datasets could hardly be used in industry due to their lack of specificity. One major issue that arises when dealing with 3D datasets is class imbalance, which introduces a bias into the learning process if not managed properly, as in Table 1. An exception is the Gibson dataset (Xia et al., 2018), which provides 3D models of building structures. This dataset is rather rich in annotations, as it provides depth, normal and segmentation annotations, rendered images, as well as full 3D mesh models. Unfortunately, this dataset, too, is focused primarily on the interior space, leaving the building exterior without semantic segmentation and texturing. Our dataset addresses this limitation by providing a dataset similar in structure but applied to the external appearance of buildings rather than the internal one. Another drawback of the Gibson dataset is its size, as it does not provide a sufficient number of building samples for geometric deep learning frameworks. Our solution to this problem is the generation of an arbitrary number of building samples depending on the needs of the research project.

Existing Datasets
At the same time, the unavailability of datasets or data generation pipelines in the field of architecture results in the unavailability of benchmarks for evaluating such pipelines.

Requirements
From the analysis of the existing datasets, we have inferred the requirements for the new framework:
• Sufficient dataset size
• Class balance to avoid bias
• Modularity of data
• Variation and diversity in data
• Generation of the required ground truth signals
• Associated metadata generation
• Extensibility, to accommodate more classes/models
Since this dataset is mostly intended for Deep Learning tasks, it is necessary for it to have a sufficient number of samples for artificial neural networks to learn successfully. The dataset should also be rich in details and in variations of materials, modules, shapes and dimensions. At the same time, all classes and variations should not only be ample in the number of samples but also balanced, in order not to introduce additional bias into the learning process.
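The class-balance requirement can be enforced directly at generation time. A minimal sketch, not taken from the released tool: before generating, build a schedule in which every building class appears an (almost) equal number of times; the class names here are placeholders.

```python
import random

def balanced_class_schedule(classes, n_samples, seed=None):
    """Return a shuffled generation schedule in which every building class
    appears an equal number of times (+/- 1), avoiding class imbalance."""
    rng = random.Random(seed)
    base, extra = divmod(n_samples, len(classes))
    counts = {c: base for c in classes}
    # distribute the remainder over randomly chosen classes
    for c in rng.sample(classes, extra):
        counts[c] += 1
    schedule = [c for c, k in counts.items() for _ in range(k)]
    rng.shuffle(schedule)  # avoid long runs of a single class
    return schedule
```

Generating samples by walking this schedule guarantees balance by construction, instead of relying on post-hoc filtering of an unbalanced dataset.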
The modularity requirement, which refers to the parametric nature of a generated building, allows for the substitution of different parameters and architectural components, randomization, and modification of the object's structure.
Architecture is a wide field with multiple problems that can be solved via deep learning. To accommodate as many problems from this spectrum as possible, it is fundamental to provide different ground truth signals and metadata that help in the quantitative evaluation of the models' performance. Moreover, the wide range of possible applications requires the dataset to be modular and extendable, so that it can be adjusted to different problems based on the researchers' needs.

Proposed framework
The synthetic data generation pipeline we developed addresses the prerequisites mentioned in Subsection 3.2. It is intended primarily for the 3D reconstruction based on parts assembly task, as we have identified that image-to-3D translation would have a major impact on many applications in the architectural field. However, it should be noted that the annotations and modularity of the framework make it possible to use the generated data for other deep learning tasks as well. In order to adjust for all possible applications of this dataset, we decided to partially engage the user in the dataset creation and give them the freedom to set a number of parameters that define the content of the dataset and its variations. Five classes were defined at the initial stage of the framework development, as can be seen in Figure 1.

Implementation details
In contrast to the existing generic 3D datasets, and like many architectural datasets, ours contains synthetic data only. This difference allows generating the data for a particular research problem in a customized manner. It also automatically removes the image-shape alignment problem present in other datasets (Sun et al., 2018), as the images, their segmentation masks, and the depth annotations are generated from the mesh directly. For each sample we provide the 3D model in .obj format, a point cloud of 2048 points, the rendered image, the ground truth annotations (segmentation mask, depth image in .exr format, surface normals), and the metadata, which will be expanded in future editions.
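Because the point cloud is derived directly from the mesh, it can be obtained by uniform surface sampling. The following is an illustrative sketch (not the released implementation) of the standard approach: pick triangles with probability proportional to their area, then draw uniform barycentric coordinates inside each chosen triangle.

```python
import math
import random

def sample_point_cloud(vertices, faces, n_points=2048, seed=None):
    """Sample a point cloud from a triangle mesh, choosing triangles with
    probability proportional to their area (uniform surface sampling)."""
    rng = random.Random(seed)

    def tri_area(a, b, c):
        # half the norm of the cross product of two edge vectors
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        cx = u[1] * v[2] - u[2] * v[1]
        cy = u[2] * v[0] - u[0] * v[2]
        cz = u[0] * v[1] - u[1] * v[0]
        return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

    tris = [tuple(vertices[i] for i in f) for f in faces]
    areas = [tri_area(*t) for t in tris]
    points = []
    for a, b, c in rng.choices(tris, weights=areas, k=n_points):
        # uniform barycentric coordinates via the square-root trick
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        u, v, w = 1 - s, s * (1 - r2), s * r2
        points.append(tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3)))
    return points
```

With `n_points=2048` this yields a point cloud of the size provided with each dataset sample.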

Generation
The dataset size has no upper bound, as the data is created generatively: the user can define the number of samples to create and the classes the dataset should contain, as well as the set of component modules (e.g. windows and balconies) and textures to include. Users can add their own modules as .obj files to be used in the building dataset generation.
The input parameters include the level of detail (1 or 2), the use of materials, the textures used (if any), the modules used (if any), the building types, the generated image size, the number of points per point cloud, some of the output formats, and the general characteristics of the buildings defined by common knowledge and city regulations, for instance, minimum and maximum building height, width and length. It is also possible to indicate whether several views should be generated for one 3D building model, and how many. All the parameters need to be specified in the configuration file.
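A hypothetical configuration mirroring the parameters listed above might look as follows; the key names, value ranges and file format here are illustrative assumptions, not the released tool's actual schema.

```python
# Hypothetical configuration sketch; the released tool's actual keys,
# defaults and file format may differ.
CONFIG = {
    "n_samples": 1000,
    "level_of_detail": 2,        # 1 or 2
    "use_materials": True,
    "textures": ["brick", "concrete", "glass"],
    "modules": ["window.obj", "balcony.obj"],
    "building_types": ["patio", "tower"],
    "image_size": 500,           # renders are image_size x image_size pixels
    "points_per_cloud": 2048,
    "views_per_model": 4,
    # general building characteristics (illustrative units: metres)
    "min_height": 3.0, "max_height": 30.0,
    "min_width": 5.0, "max_width": 40.0,
    "min_length": 5.0, "max_length": 40.0,
}

def validate(cfg):
    """Basic sanity checks before starting a (potentially long) generation run."""
    assert cfg["level_of_detail"] in (1, 2), "only LoD1 and LoD2 are supported"
    assert cfg["min_height"] < cfg["max_height"]
    assert cfg["min_width"] < cfg["max_width"]
    assert cfg["min_length"] < cfg["max_length"]
    assert cfg["n_samples"] > 0 and cfg["views_per_model"] > 0
    return cfg
```

Validating the configuration up front fails fast on inconsistent bounds instead of aborting midway through a long generation run.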
The output for each sample consists of the set of data illustrated in Figure 2. The dataset generation code and sample dataset renders have been made publicly available.¹

Domain Randomization
In order to prepare for domain adaptation to real-world building data, it was necessary to introduce domain randomization, which is considered one of the most successful techniques for handling the transfer from synthetic to real data (Tremblay et al., 2018). The pipeline mostly uses parameters and textures that are relatively close to real-world ones in order to support the domain adaptation task. Our dataset generation framework has a parameter that allows creating randomized sample images from the same 3D model sample.

¹ github.com/CDInstitute/Building-Dataset-Generator
The randomization parameters are:
• Texture and reflectance of various building components
• Light color, strength and position
• Camera position and angle
Randomization of the camera angle is illustrated in Figure 3. The camera assumes a random position on the XY plane, while its rotation along the local Z axis is limited to the range (40, 100) in order to capture the most significant views of the building model, close to those that could be taken in the real world. Texture is randomized per building volume, as in Figure 5, with a user-defined probability that all volumes are textured with the same material. Textures are selected randomly from the user-added textures. Light randomization is illustrated in Figure 4.
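The camera randomization above can be sketched as follows. This is an illustrative stand-in, not the released Blender code, and it assumes for simplicity a fixed camera distance and height (`radius`, `z_height`); only the (40, 100) rotation range comes from the text.

```python
import math
import random

def random_camera_pose(radius, z_height, seed=None):
    """Place the camera at a random position on a circle in the XY plane and
    draw its local-Z rotation from the (40, 100) degree range, keeping views
    close to those that could be taken in the real world."""
    rng = random.Random(seed)
    azimuth = rng.uniform(0.0, 2.0 * math.pi)  # random position on the XY plane
    x = radius * math.cos(azimuth)
    y = radius * math.sin(azimuth)
    z_rotation = rng.uniform(40.0, 100.0)      # local-Z rotation, degrees
    return (x, y, z_height), z_rotation
```

Light color, strength and position, as well as per-volume texture choice, could be randomized with the same pattern of drawing each parameter from a bounded range.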

Performance
We ran the dataset generation algorithm with different input parameters on Windows 10, on CPU and GPU, using an AMD Ryzen 7 3800X 8-core processor and a GeForce GTX 1080, and report the performance in Table 4. The images were rendered at 500×500 pixels. The dataset generation time depends heavily on the number of components in the models, as generating more components requires more time; thus, larger or taller buildings take longer to generate than models with smaller dimensions.
As previously mentioned, due to the lack of accessible datasets at this architectural scale and the specificity of the contribution, there are no benchmarks or downstream metrics available in the field.

LIMITATIONS
Building design can be considered a product design task of very high complexity, as it involves multiple densely interconnected parameters. This aspect makes the generation of lifelike synthetic data rather complicated. One limitation of the proposed dataset generation framework is the lack of authenticity: it does not provide real-world data, and many visual features present in photos taken in an urban environment are missing from the rendered images.
Another limitation is the lack of urban surroundings around the building. The object is rendered in an empty scene, which does not occur in the real world and thus makes the domain transfer task more complicated. Moreover, the buildings generated with the framework account for the most typical design patterns present in major cities, while more creative and outstanding works of architecture are not included in the generation pipeline due to the high complexity of the problem. This limitation is the most severe one, as it narrows the range of possible building solutions, and the entire variety of architectural forms is not represented in this work. Not only does it affect the generalization capability of the neural networks, but it also introduces a bias into the training process.
We are planning to address the mentioned limitations in our future research work.

PERSPECTIVES
The proposed dataset is not free from limitations, which we intend to tackle in future work. This work is a first step towards the use of Geometric Deep Learning in the field of Architecture, and consequently it opens multiple avenues for future development. One of them is to improve the quality and variation of the building models by adding more building types, components and textures and by making the renders more realistic. Another direction could be extending the level of detail and adding building parameters and the internal building structure to the dataset. Moreover, it is important to develop the framework to overcome the lack of urban surroundings. Expanding the framework to include nearby buildings, urban furniture, trees and other elements of the urban environment is fundamental for 3D reconstruction in architecture.
The exploration of Geometric Deep Learning frameworks intended for single-image-to-3D reconstruction, and of their performance on the present dataset, is equally important, as this is the task the dataset was intended for. This direction of research could bring multiple practical implications for architecture, gaming, AR/VR, simulation and other fields. Moreover, with the variety of annotations this dataset provides, it would be possible to use it in other Deep Learning tasks as well.
Finally, this dataset could serve as a base for advances in generative architectural design via the exploitation of Generative Adversarial Networks and their ability to learn from visual features. Moreover, the possibility to generate synthetic data in various ways could facilitate domain adaptation from synthetic to real data.

CONTRIBUTION
The contribution of this work consists of two principal parts:
• A detailed overview of the existing datasets related to the architectural and urban fields, as well as of the main geometric deep learning frameworks aimed at solving the single-image-to-3D reconstruction task for buildings;
• A field-specific dataset generation pipeline for the architectural and deep learning community, targeted at various tasks.

CONCLUSION
The overview of the frameworks related to the single-image-to-3D reconstruction task demonstrates the necessity of a field-specific dataset, due to their inability to generalize to unseen data. The exploration of the existing architectures suggested the necessary requirements and highlighted the weak points of the existing architectural datasets, which later became the foundation for our tool.
The presented dataset generation framework provides a field-specific tool for the generation of 2D and 3D data in various formats. This instrument gives researchers the possibility to apply Deep Learning and Geometric Deep Learning techniques to architecture-related tasks. The framework proved able to generate large amounts of varied data in a short time compared to the traditional methods previously used in building 3D modelling and the subsequent image rendering. Moreover, the way the tool was built allows for extensibility and customization of the dataset to specific tasks within the architectural field, using task-specific building components, textures and building typologies. It also addresses the limitations of the existing urban datasets, of which we have given an extensive overview.
However, this dataset is not without drawbacks, the main one being the use of synthetic data instead of real-world data. This is a trade-off that was made due to the high complexity, time and effort of the 3D modelling and 3D scanning processes. Our decision in favor of synthetic data opens multiple directions for future work, both on the dataset framework and on the algorithms that learn from this data. Moreover, the provided synthetic data facilitates research in 3D reconstruction related to the architecture field that was not previously available in open access.