SFM-BASED 3D RECONSTRUCTION OF HERITAGE ASSETS USING UAV THERMAL IMAGES

ABSTRACT: In the last few years, notable progress has been made in the field of non-invasive diagnostics for the monitoring of heritage assets. In particular, multispectral imagery (this manuscript specifically addresses thermal images) allows investigations in the non-visible range of the electromagnetic spectrum to be carried out effectively. Many researchers are currently exploring the use of this kind of imagery in photogrammetric SfM-based processes to produce 2D and 3D value-added metric products, characterised by a high level of detail and spatial resolution and including the information carried by the non-visible data. A data fusion-based strategy enables the co-registration of visible and thermal images in order to exploit the higher spatial resolution of traditional true colour images. However, many shortcomings still need to be addressed to orient TIR (Thermal Infrared) images properly and efficiently; these are connected, among other factors, to their low spatial resolution and to the low contrast between adjacent materials characterised by similar emissivity. This paper proposes two different workflows to process thermal images using SfM algorithms, applied to three case studies with different characteristics (size, morphology, emissivity of the materials, etc.). The pipelines are described and the obtained results are critically evaluated considering the metric accuracy, the 3D geometric reconstruction and noise, the completeness of the data and the overall quality of the generated dense point cloud. The effectiveness of the adopted strategies in relation to the peculiar features of the analysed case studies is also considered.


INTRODUCTION
In the framework of built heritage monitoring, particular attention is paid to the development of sustainable strategies which consider the intrinsic fragility of the monitored assets. Contactless and non-invasive methods are certainly preferred, and in the last decades many techniques and methodologies have been developed with the aim of performing non-invasive diagnostic investigations of historical buildings. Multispectral sensors, which acquire radiometric information beyond the visible part of the electromagnetic spectrum (such as thermal radiation in the case of TIR images), offer very interesting potential for the monitoring of architectural assets (Lerma et al. 2012; Adamopoulos & Rinaudo 2021). Additionally, the mass-market availability of high-performance commercial sensors allows a broader user community to access this kind of technology for operational applications. In this regard, it is worth underlining the remarkable opportunity to spatially connect the radiometric information (e.g., thermal radiation) to detailed 3D models with a very high level of geometric detail, which can nowadays be generated using different strategies and techniques, generally by means of digital photogrammetry (Patrucco et al. 2020a). For this reason, in the last few years many Geomatics researchers have made a significant contribution in this direction, developing efficient and user-oriented co-registration strategies in order to connect radiometric data with high-resolution and detailed spatial models (Scaioni et al. 2017; Adamopoulos et al. 2020). The possibility of directly exploiting thermal images in consolidated photogrammetric workflows has also been explored. In particular, different studies have tested an SfM (Structure-from-Motion) approach using TIR images, obtaining remarkable results (Akçay 2021; Dlesk et al. 2018; Adamopoulos et al. 2020).
However, many issues connected to the co-registration of TIR and visible images still need to be solved, and the development of a user-oriented data fusion strategy remains an open challenge (Javadnejad et al. 2019). The limited spatial resolution of the thermograms is one of the principal issues during the autocorrelation or image-matching phase. If the number of extracted homologous points is too low, due to the low resolution of the images and to the low contrast between materials characterised by similar thermal emissivity (leading to similar apparent temperatures), the 3D reconstruction may be heavily affected by outliers or may fail altogether. Considering the extraordinary potential of this kind of imagery in the framework of non-destructive investigations of heritage assets, and especially the possibility of enriching high-resolution 3D models with non-visible multispectral information following data fusion strategies (Dlesk et al. 2022; Patrucco et al. 2020b), the development of new co-registration techniques can provide effective and powerful tools for the documentation, and therefore for the safeguarding, of cultural heritage. The main aim of the presented research is to test and evaluate the effectiveness and the efficiency of two different SfM-based approaches for the photogrammetric processing of thermal images. Specifically, three different thermal datasets of heritage assets have been considered.
The first approach applies a standard SfM-based pipeline (Scaioni et al. 2017; Patrucco et al. 2020) to estimate the positions and attitudes of the thermal cameras, and therefore to perform a three-dimensional reconstruction of the surveyed asset. The second approach consists in approximating the positions of the cameras using the values estimated for a visible dataset acquired with a traditional optical camera (embedded in the thermal camera used during the acquisition of the thermal datasets) and then in optimising the obtained results using a set of control points. The results obtained with the two approaches have been compared on three different case studies. Each case study, as underlined in the next section, has been selected for its own specificities, in order to stress how the characteristic features of each considered building can heavily affect the acquisition of the data and, therefore, the processing strategies. One of the crucial aspects to be considered in the documentation of historical buildings is the uniqueness of these kinds of assets. For this reason, the planning of survey activities in this framework needs to be carefully addressed by considering the characteristics and the peculiarities of the studied objects, in order to respond efficiently to the monitoring requirements of cultural heritage. The flexibility of the new instruments and techniques developed in the last few years is a key factor (Spanò et al. 2018), since in some cases, for many reasons (size of the object, morphology, materials and many others), the same documentation strategy cannot be efficiently applied to case studies characterised by non-homogeneous features, as underlined by the results of several research experiences.
For this reason, in the framework of heritage documentation tailored solutions are usually required, since in many cases a similar approach represents the optimal strategy to effectively face the various issues intrinsically connected to the specific features characterising each asset belonging to our built heritage.
In the current research, three case studies with different characteristics have been addressed, differing in: the morphological complexity of the surveyed buildings; the homogeneity of the materials composing the analysed surfaces; the complexity of the acquisition geometries of the thermal images (this last aspect is strictly related to the morphology of the surveyed buildings). More specifically, the three case studies (all located in north-west Italy) are:
- a rural chapel located in Molini Allioni, a small alpine hamlet (Elva, Cuneo). The church is made of traditional stone masonry partially covered with a rough plaster coating, and the nature of the stone is extremely heterogeneous (Figure 1a);
- a module of the reinforced concrete façade of the C. Alvaro-P. Gobetti Comprehensive School in Torino. The considered portion of the building is characterised by a regular geometry and a repetitive pattern (Figure 1b);
- the parabolic arch of Morano sul Po, near Casale Monferrato (AL) (Figure 1c). This historical arch is evidence of the industrial past of the area, mainly devoted to concrete production. Nowadays the arch of Morano sul Po is the focus of a valorisation project during which an extensive 3D metric survey campaign has been carried out (Patrucco et al. 2021).

Data acquisition
In the last decades the constant development of new high-performance and relatively cheap sensors for UAV (Unmanned Aerial Vehicle) platforms has greatly contributed to the enhancement of sustainable and non-destructive monitoring from an aerial close-range perspective. In addition, the possibilities related to beyond-visible imagery acquisition (e.g. multispectral cameras, hyperspectral cameras, thermal cameras) using UAV platforms have been explored in the last few years thanks to the rapid growth in performance of COTS (Commercial Off-The-Shelf) solutions. Nowadays these kinds of technology are widely used in different application areas, as demonstrated by the results obtained during numerous research experiences (Belcore et al. 2021; Olivetti et al. 2020; Melis et al. 2020).
Concerning the current research, the acquisition of the thermal images has been performed for all the case studies using a UAV system, the DJI Matrice 210 V2, which mounts a DJI Zenmuse XT2 thermal camera. This camera model is equipped with two separate sensors allowing the acquisition of both thermal and visible images at the same time, with a constant relative position and an approximately equal viewing angle. The main specifications of both passive sensors can be observed in Table 1. For each case study the coordinates of a set of reference points have been measured to provide an adequate number of Ground Control Points (GCPs) to orient the photogrammetric model and Control Points (CPs) to assess its 3D accuracy.

Table 1. Main specifications of the DJI Zenmuse XT2 sensors: (a) Zenmuse XT2 (thermal sensor).
- In the case of the rural chapel in Molini Allioni, a set of points (16) has been extracted from a LiDAR point cloud acquired using a Faro Focus 3D X 330 (accuracy ±2 mm @ 10 m). The points have been extracted in those areas of the chapel façade characterised by a high radiometric contrast in the visible images, in the TIR images (due to the different emissivity of adjacent materials) and in the coloured TLS point cloud, so that the selected points could be identified unambiguously (Patrucco et al. 2020).
- As far as the module of the façade of the C. Alvaro-P. Gobetti Comprehensive School is concerned, 17 points have been acquired using a total station. Both natural points placed in high-contrast areas and low-emissivity aluminum markers (Hill et al. 2020) have been measured (Fig. 2).
- In the case of the parabolic arch of Morano, both natural points and low-emissivity aluminum markers have been measured using a total station; furthermore, some additional points have been extracted from a LiDAR point cloud (in this case the scans have been acquired using a Faro Focus 3D X 330 and a Faro Focus 3D S120, as described in Patrucco et al. 2021).
The data have been collected following the well-known CIPA 3x3 rules (Waldhäusl et al. 2013), ensuring a high percentage of overlap between the acquired images (>80-90 %) and a high convergence of the cameras, according to consolidated SfM-related guidelines. The acquisition scheme has been fine-tuned for each of the three case studies according to the specific geometry and morphology of the surveyed object.
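The relationship between the required overlap and the spacing of consecutive shots can be illustrated with a minimal sketch. It assumes a flat surface imaged fronto-parallel, and the field of view and distance used in the example are hypothetical values, not the specifications of the sensors used in this research.

```python
import math


def footprint_width(distance_m: float, fov_deg: float) -> float:
    """Width of the surface covered by one image (flat, fronto-parallel surface assumed)."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def max_baseline(distance_m: float, fov_deg: float, overlap: float) -> float:
    """Largest spacing between consecutive shots that still guarantees the given overlap.

    `overlap` is a fraction in [0, 1), e.g. 0.8 for 80 %.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_width(distance_m, fov_deg) * (1.0 - overlap)


if __name__ == "__main__":
    # Hypothetical example: 45 degree horizontal FoV, 5 m from the facade, 80 % overlap.
    print(f"max spacing: {max_baseline(5.0, 45.0, 0.8):.2f} m")
```

Under these assumptions, a narrower field of view or a shorter shooting distance reduces the admissible spacing, which is one reason why the number of images grows quickly for complex objects.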
As regards the rural chapel in Molini Allioni, two photogrammetric strips have been flown, acquiring 47 convergent images from an estimated distance of about 5 m (Table 2). As can be observed in Figure 3, the geometry of the acquisition scheme is relatively simple. As far as the Alvaro-Gobetti school dataset is concerned, the considered module of the façade has been acquired with several photogrammetric strips, both longitudinal and transversal (for a total of two cross grid flights) (Fig. 4), with the camera in a forward configuration. Also in this case, considering the geometrical simplicity of the façade, the acquisition scheme followed to cover all the surveyed surfaces is relatively uncomplicated. During this acquisition, 183 images (VIS and TIR) have been acquired from a mean estimated shooting distance of ca. 12 meters (Table 2). The parabolic arch of Morano sul Po, due to its complex spatial configuration and to the homogeneity of the materials, which leads to limited temperature differences, can be considered the most challenging of the three analysed case studies. Consequently, its acquisition geometry was also the most complex: it was necessary to collect a large number of images from different perspectives in order to cover all the surfaces of the object while ensuring an adequate overlap between consecutive cameras, as shown in Figure 5. A total of 568 images has been acquired. The average shooting distances of the three acquisitions can be observed in Table 2, while the estimated GSD can be observed in Table 3. As expected, the estimated GSD of the thermal datasets is larger (i.e. coarser) than that of the corresponding visible datasets, due to the lower spatial resolution of the TIR images.
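The dependence of the GSD on the shooting distance and on the sensor can be made explicit with the usual pinhole relation, GSD = pixel pitch × distance / focal length. The sketch below is illustrative only: the pixel pitch and focal length values are hypothetical and are not taken from Table 1.

```python
def gsd_mm(pixel_pitch_um: float, focal_length_mm: float, distance_m: float) -> float:
    """Ground sample distance in millimetres per pixel (pinhole camera model).

    pixel_pitch_um  : size of one sensor pixel, in micrometres
    focal_length_mm : focal length of the lens, in millimetres
    distance_m      : camera-to-object distance, in metres
    """
    if focal_length_mm <= 0:
        raise ValueError("focal length must be positive")
    pixel_pitch_mm = pixel_pitch_um * 1e-3
    distance_mm = distance_m * 1e3
    return pixel_pitch_mm * distance_mm / focal_length_mm


if __name__ == "__main__":
    # Hypothetical sensors at the same 5 m distance: a coarse-pitch thermal
    # sensor versus a fine-pitch visible sensor.
    print(f"thermal (assumed 17 um, 13 mm): {gsd_mm(17.0, 13.0, 5.0):.2f} mm/px")
    print(f"visible (assumed 2 um, 8 mm):   {gsd_mm(2.0, 8.0, 5.0):.2f} mm/px")
```

With any such pair of values, the larger pixel pitch of a thermal sensor yields a coarser GSD at the same distance, consistent with the trend described above.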

Data processing strategies
All the datasets described in the previous section have been processed as follows, using the COTS photogrammetric software Agisoft Metashape (build 1.8.0):
1) Photogrammetric processing of the visible images following a standard photogrammetric SfM-based approach: internal camera orientation (i.o.) and tie point sparse cloud generation by means of relative external orientation (e.o.); absolute external orientation using GCPs; evaluation of the metric accuracy using CPs; depth map generation and dense point cloud generation.
2) Photogrammetric processing of the thermal images following the same pipeline described in point 1. This strategy, applied to the TIR images, will be referred to in the following sections as "Workflow 1".
3) Orientation of the TIR dataset using the previously estimated absolute e.o. parameters of the visible cameras as an initial approximate e.o. solution for the TIR images, i.e. importing for each TIR image the camera position and attitude of the corresponding image of the visible dataset; optimisation of the cameras using GCPs (required to estimate the relative position of the thermal sensor with respect to the visible one); evaluation of the metric accuracy using CPs; depth map generation and dense point cloud generation. This strategy will be referred to in the following sections as "Workflow 2".
The flowchart of the two workflows can be observed in Figure 6. In the following sections, the processing of the datasets of the three case studies is described in detail; furthermore, a critical evaluation of the results (in terms of workflow, metric accuracy, 3D geometric reconstruction, noise, completeness of the data, etc.) is presented, considering the possibility and the advantages of merging the visible model (characterised by a higher spatial resolution and level of detail) with the information embedded in the thermal model, in a data fusion perspective.
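The first step of Workflow 2, i.e. assigning to each TIR image the e.o. parameters of its visible counterpart, can be sketched as a simple pairing by file name. This is an illustrative sketch rather than the procedure used inside the photogrammetric software: the `_V` / `_T` naming convention and the function name are hypothetical, and the subsequent GCP-based optimisation is not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple

# Pose = (X, Y, Z, omega, phi, kappa): position and attitude of one camera.
Pose = Tuple[float, float, float, float, float, float]


def seed_tir_poses(
    visible_poses: Dict[str, Pose],
    tir_names: Iterable[str],
    vis_suffix: str = "_V",  # hypothetical naming convention
    tir_suffix: str = "_T",  # hypothetical naming convention
) -> Tuple[Dict[str, Pose], List[str]]:
    """Copy the estimated e.o. of each visible image to the paired TIR image.

    Returns the seeded TIR poses and the list of TIR images for which no
    visible counterpart was found (these would have no initial e.o.).
    """
    seeded: Dict[str, Pose] = {}
    unmatched: List[str] = []
    for tir_name in tir_names:
        if not tir_name.endswith(tir_suffix):
            unmatched.append(tir_name)
            continue
        partner = tir_name[: -len(tir_suffix)] + vis_suffix
        pose = visible_poses.get(partner)
        if pose is None:
            unmatched.append(tir_name)
        else:
            seeded[tir_name] = pose
    return seeded, unmatched


if __name__ == "__main__":
    visible = {"IMG_0001_V": (10.0, 20.0, 5.0, 0.0, 0.0, 90.0)}
    seeded, unmatched = seed_tir_poses(visible, ["IMG_0001_T", "IMG_0002_T"])
    print(seeded)
    print(unmatched)
```

The copied values are only approximate for the thermal sensor, since the two sensors are offset from each other; this is why the optimisation with GCPs remains necessary.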

Rural chapel dataset
Among the three analysed case studies, the rural chapel is the one that most closely resembles the optimal scenario for the SfM-based photogrammetric process. The acquisition geometry is relatively simple; the acquisition distance is relatively short (5 m, Table 2); the TIR images are characterised by high radiometric contrast due to the high heterogeneity of the materials of the chapel façades, allowing for easier detection of homologous points in the stereoscopic pairs; the surveyed building is relatively small and characterised by non-modular geometries. For these reasons, this is the case study with the highest number of extracted key points and, consequently, the one where the estimation of the camera orientation parameters is easiest. The visible images have been properly oriented following a standard photogrammetric pipeline; as regards the TIR dataset, both Workflow 1 and Workflow 2 (previously described) have been followed. In both cases the 47 thermal images have been successfully aligned and a sparse cloud of tie points has been generated. As previously reported, a set of 16 points has been used as GCPs (11) and as CPs (5). A preliminary comment concerns the quality of the dense point cloud generated from the thermal dataset (Fig. 7, b and d). A visual inspection makes it clear that the previously described characteristics of the TIR images have made it possible to obtain a thermal point cloud comparable to the visible one in terms of both completeness and geometric reconstruction of the final 3D data; in addition, the achieved accuracies (Table 4) are of the same order of magnitude. As expected, the level of detail of the TIR-based point cloud is lower than that of the visible-based one (due to the lower spatial resolution), but the obtained point cloud would already be adequate for the generation of metric value-added products (e.g. 3D mesh or orthomosaic).

C. Alvaro-P. Gobetti Comprehensive School dataset
The second collection of processed datasets is that of the module of the façade of the C. Alvaro-P. Gobetti Comprehensive School. This second case can be considered slightly more challenging from the processing point of view, considering the high modularity of the elements of the façade and the homogeneity of the materials; however, the geometries to be reconstructed are relatively planar and simple. In all three cases (visible dataset; thermal dataset processed according to Workflow 1; thermal dataset processed according to Workflow 2) it was possible to orient all the images following the previously described strategies, and subsequently to generate a dense point cloud. The RMSE (Root Mean Square Error) achieved after the bundle adjustment is reported in Table 5. While in the previous case (the Molini Allioni chapel dataset) the difference between the accuracies observed on the CPs is millimeter-level, in this case a higher discrepancy can be observed (approx. 1.5 centimeters) due to the higher acquisition distances. Unlike the previously analysed case study, where the resolution and the completeness of the thermal point cloud were sufficient for the generation of metric products, in this case several critical areas are visible in both TIR-based point clouds (Fig. 12), especially in the areas corresponding to the windows or to the plastered walls, where the identification of homologous points is more difficult. In this case a data fusion-based strategy is advisable, in order to exploit the higher completeness and geometric resolution of the visible point cloud (Fig. 8).
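For reference, the RMSE on a set of control points can be computed from the per-point residuals as sketched below. The sketch assumes one common convention (square root of the mean squared 3D residual); the exact formula applied by the software to produce the values in the tables is not stated here, and the residuals in the example are invented for illustration.

```python
import math
from typing import Iterable, Tuple

Residual = Tuple[float, float, float]  # (dX, dY, dZ) = estimated minus measured, in metres


def rmse_3d(residuals: Iterable[Residual]) -> float:
    """Root mean square of the 3D residual lengths over a set of points."""
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in residuals]
    if not squared:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Invented residuals on three control points (metres).
    cps = [(0.004, -0.002, 0.003), (-0.001, 0.005, -0.002), (0.002, 0.001, 0.006)]
    print(f"RMSE on CPs: {rmse_3d(cps) * 1000:.1f} mm")
```

Computing the same quantity separately on GCPs and CPs keeps the fit of the adjustment distinct from the independent accuracy check.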

Parabolic arch of Morano sul Po dataset
The parabolic arch case study is the most complex of the three. In the case of the visible dataset, a proper estimation of all 568 camera positions was possible only after applying exclusion masks, in order to remove the pixels of the images containing background data (e.g. sky, vegetation and all the elements potentially jeopardising the autocorrelation phase). The same time-consuming manual masking process, carried out by an image analyst, has been applied to the TIR dataset before proceeding with the data processing. However, with neither workflow was it possible to achieve a complete orientation of the entire block of thermal images: following Workflow 1, only 42 thermal images have been oriented, while following Workflow 2 the proper orientation of 179 images was possible (Fig. 9). The dense point cloud obtained following the second approach is therefore significantly more complete (Fig. 10). The metric accuracies obtained after the photogrammetric processing can be observed in Table 6; it should be noted that the number of GCPs and CPs used for Workflow 1 is lower, since only images facing one side of the parabolic arch have been correctly oriented and therefore the coordinates of the CPs placed on the other sides of the structure cannot be estimated. Also in this case, the accuracies observed on the CPs are millimeter-level, while the mean error of the thermal dataset is noticeably higher (ca. 4 centimeters).
Table 6. Mean residuals on GCPs and CPs (parabolic arch datasets). (A) Visible dataset; (B) thermal dataset (Workflow 1); (C) thermal dataset (Workflow 2).

EVALUATION OF THE RESULTS AND DISCUSSION
After the processing of the data, it was necessary to carefully analyse the generated dense point clouds to understand the effectiveness of the proposed workflows and the robustness of the metric products.
To evaluate the overall quality of the point clouds, the confidence value automatically generated by the Metashape platform has been considered. The confidence value is defined as "the number of contributing combined depth maps" used for the generation of each point of the final point cloud. This value is then "recorded and stored as a confidence value", as a scalar field (Agisoft Metashape manual, 2022). This value depends on the image overlap (the higher the overlap percentage, the higher the chance of achieving better confidence levels), on the characteristics of the images (such as the resolution, the sharpness, the absence of blurred or smoothed-out areas, etc.), on the possibility of correctly identifying homologous points and, consequently, on the generation of high-quality depth maps. Even where the images have been collected with high overlap, the confidence level can be low for areas where an image-based 3D reconstruction is traditionally challenging, such as those characterised by dense vegetation, reflective surfaces (e.g. window panes), poor lighting conditions, etc. Generally, when a lower confidence value is observed, the generated point cloud is noisy and/or poorly reconstructed. Therefore, the confidence level can be considered an indicator of the quality of the results of the photogrammetric process after the densification procedures. For this reason, in the current research this value has been adopted to compare the results obtained from the visible and the thermal datasets (processed following the two different approaches). As can be observed in Tables 7, 8 and 9, three different confidence intervals have been considered: >2 depth maps/point (low confidence level), >5 depth maps/point (medium to high confidence level) and >10 depth maps/point (high confidence level).
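The kind of summary reported for the three confidence intervals can be reproduced from an exported per-point confidence scalar field with a few lines of code. The sketch below is illustrative: the function name and the sample values are hypothetical, and only the thresholds (>2, >5, >10) come from the text.

```python
from typing import Dict, Sequence


def confidence_shares(
    confidences: Sequence[int], thresholds: Sequence[int] = (2, 5, 10)
) -> Dict[int, float]:
    """Percentage of points whose confidence is strictly greater than each threshold."""
    total = len(confidences)
    if total == 0:
        raise ValueError("the point cloud contains no points")
    return {
        t: 100.0 * sum(1 for c in confidences if c > t) / total
        for t in thresholds
    }


if __name__ == "__main__":
    # Hypothetical per-point confidence values (number of contributing depth maps).
    sample = [1, 3, 3, 6, 8, 11, 14, 2]
    for threshold, share in confidence_shares(sample).items():
        print(f"confidence > {threshold}: {share:.1f} % of points")
```

Expressing the counts as percentages of each cloud makes point clouds of different density (visible versus thermal) directly comparable.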
Considering that the visible datasets have been acquired using the embedded visible camera of the DJI Zenmuse XT2, the overlap between TIR images and true colour images is comparable. The only slight difference in image coverage is due to the different focal lengths of the visible and TIR sensors (Table 1), which affect the FoV (Field of View). However, as expected, in all three cases the optical data yield results characterised not only by a higher level of detail, but also by a significantly higher confidence level. This is presumably due to the higher resolution and radiometric contrast of the visible images. Nevertheless, it should be underlined that in the first case (the rural chapel) the percentage gap between TIR and visible confidence levels is low (Table 7) as a result of: I) the shorter acquisition distance (Table 2) and II) the higher radiometric contrast of the TIR imagery due to the very diversified materials composing the chapel façades, characterised by heterogeneous emissivity. This is also evident from a visual inspection of the considered point clouds (Fig. 11). TIR images with similar characteristics (short acquisition distance; high radiometric contrast; low modularity) therefore allow a more accurate 3D geometric reconstruction to be achieved. In fact, in this case the use of an approximate solution for the estimation of the camera positions and attitudes did not produce appreciable benefits and, as can be observed in Figure 11, the results obtained with the two workflows are closely comparable.
Table 7. Molini Allioni rural chapel dataset point cloud analysis. N° of points with confidence greater than 10 (1), 5 (2) and 2 (3) for: (A) visible dataset, (B) thermal dataset (Workflow 1), (C) thermal dataset (Workflow 2).
For the second case (Alvaro-Gobetti school dataset), although both workflows made it possible to orient the entire thermal photogrammetric block, even a preliminary visual inspection (Fig. 12) shows that the confidence of the second thermal point cloud is noticeably higher than that of the first one. This is also underlined by the results reported in Table 8, where a difference in terms of confidence between 8.3% and 5.6% can be observed (depending on the considered ranges). In addition, the deviation with respect to the true colour dataset is also significant, for the reasons previously described (higher shooting distance; higher homogeneity of the material emissivity; presence of reflective surfaces; modularity of the façade).
As regards the third case (parabolic arch dataset), besides the deviation between visible and thermal 3D reconstruction, it is also the one where a higher confidence deviation can be observed between the two thermal point clouds (Table 9). In addition, only the second workflow made it possible to produce a relatively complete point cloud from the thermal dataset (Fig. 13), since the standard photogrammetric pipeline used in the first case allowed only a relatively small number of images of a portion of the arch to be oriented (Fig. 9). As far as the geometric resolution of the obtained 3D metric data is concerned, it should be underlined that in challenging cases (such as the third case study) a data fusion strategy between visible and TIR data (in order to achieve a thermally textured mesh or an orthomosaic, exploiting the higher spatial resolution of the true colour dataset) is highly recommended. This is due to the numerous gaps and topological errors in both TIR-based point clouds.

CONCLUSIONS AND FUTURE PERSPECTIVES
The goal of this contribution was the evaluation of different SfM-based strategies aimed at achieving a 3D reconstruction derived from TIR images, considering different levels of complexity of the surveyed object and analysing the context-dependent applicability. As observed in the previous section, the same approach applied to case studies with different characteristics led to different results in terms of point cloud quality (noise, completeness, etc.). This stresses the need to plan monitoring operations carefully in the heritage domain, where dealing with very different and peculiar assets is commonplace, adopting tailored workflows and strategies. Additionally, the authors deem that it is not possible to define a one-size-fits-all strategy for photogrammetric 3D reconstruction based on thermal imagery. Although the processing of TIR images using a 3D approach is increasingly adopted, not only in heritage conservation but also in many other disciplines (as demonstrated by the development of tailored workflows and templates in different software for the management of photogrammetric or 3D data), several bottlenecks have yet to be addressed. Nowadays one of the most efficient strategies to solve the problems connected to thermal 3D reconstruction is the co-registration of oriented TIR images with high-resolution 3D models based on visible data, exploiting a common reference system. However, this approach is viable only if the thermograms are properly registered and, as stressed in the previous sections, there are still many critical issues related to the quality of TIR images (low resolution, low contrast, difficulties in homologous points extraction, etc.) and to the characteristics of the surveyed buildings (morphology of the object; emissivity of the materials; modularity; etc.).
For this reason, from a user-oriented perspective, in order to support scholars and professionals using thermal data in the field of heritage conservation, it is important to develop standard workflows to achieve an effective visible-TIR data fusion, to be chosen mainly depending on the peculiarities of each architectural asset belonging to our legacy.