DATA FUSION IN CULTURAL HERITAGE – A REVIEW

ABSTRACT: Geometric documentation is one of the most important tasks of Cultural Heritage (CH) conservation and management policies. 3D documentation, prior to any conservation and restoration works, is considered a basic pre-requisite for preserving, understanding, communicating and valorizing CH sites and objects (London Charter, 2009; Sevilla Principles, 2011). 3D models have become the usual way of digitally preserving, communicating, explaining and disseminating cultural knowledge, as they have the capability of reproducing ancient states and behaviors. Using photo-realistic and accurate 3D models, the current conservation state can be shown and preserved for future generations. However, despite the large demand for 3D models in the CH field, no single 3D documentation method can properly satisfy all application areas and their requirements; therefore a fusion methodology (of data and sensors) is normally required and performed. The paper analyzes the fusion concept and levels as well as some merging approaches presented so far by the research community. While the paper will be necessarily incomplete due to space limitations, it will hopefully give an understanding of the current methods of data fusion and clarify some open research issues.


INTRODUCTION
Data fusion, or data integration, refers to the process of merging data (and knowledge) coming from different sources (or sensors), generally at different geometric resolutions, but representing the same real-world object, in order to produce a consistent, accurate and useful representation. Data fusion processes are often categorized as low, intermediate or high, depending on the processing stage at which fusion takes place (Lawrence, 2004). Low-level data fusion combines several sources of raw data to produce new raw data. The expectation is that fused data is more informative and synthetic than the original inputs. Similar terms referring to the same concept are data integration, sensor fusion or information fusion. Data fusion is commonly applied in many Cultural Heritage (CH) documentation projects, even without being aware of it, especially when large or complex scenarios are considered and surveyed (Gruen et al., 2005; Guarnieri et al., 2006; Guidi et al., 2009; Remondino et al., 2009; Fassi et al., 2001; Remondino et al., 2011; Fiorillo et al., 2013; Cosentino, 2015; Serna et al., 2015). The fusion is done to exploit the intrinsic advantages and overcome the weaknesses of each dataset (or sensor), merging or integrating image-based with range-based point clouds, visible with multispectral images, terrestrial with aerial acquisitions, etc. Data fusion is an essential issue and a powerful solution in Cultural Heritage, as frequently there is no way to be completely successful without combining methodologies and data. Other fields where data fusion is a common strategy are remote sensing and medical imaging.
The paper will primarily review the needs, problems and proposed solutions in data fusion, reporting some of the past scientific publications. The paper will not describe all the acquisition techniques or fix a working pipeline as every scenario and project need different approaches and solutions. This review, while necessarily incomplete due to space limitations, will hopefully give an understanding on the actual methods of data fusion and clarify some open research issues.

DATA FUSION LEVELS
According to this categorization, three different fusion levels can be considered: low, medium and high. Low level (data fusion) combines raw data from different sources to obtain new data which should be more representative than the original ones; medium level (feature fusion) merges features coming from different raw data inputs; high level fusion is related to statistical and fuzzy-logic methods. In Forkuo et al. (2004) three approaches for data fusion are also mentioned: the first one integrates data from two sources; the second one represents the fusion derived from feature matching; the last one, called model-based fusion, consists of establishing the relation between 2D digital images and 3D range point cloud data, either to derive the orientation of the digital image or to produce a photo-realistic 3D model by projecting image intensity over the 3D point cloud. We propose to expand the fusion classification with respect to different aspects:
- Purpose-based levels:
  o Raw-level: generate new data from the raw sources
  o Medium-level: relate the existing data
  o High-level: obtain a complete textured 3D model
- Data-based levels:
  o Point-based
  o Feature-based
  o Surface-based
- Dimension-based levels:
  o 3D-to-3D
  o 2D-to-3D
  o 2D-to-2D
The high-level, surface-based and 3D-to-3D fusion levels are the most common ones: both raw datasets are processed independently and the resulting meshes are merged in order to derive a complete 3D model (Guidi et al., 2009). The medium-level approach is a feature-based detection approach, either in 3D or 2D, used to compute the relative orientation parameters between the two sensors (Kochi et al., 2012). Lastly, the raw-level data fusion approach analyzes the capabilities of each raw dataset, detects its weaknesses and overcomes them with complementary raw data.
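To make the model-based (2D-to-3D) level concrete, the following minimal Python sketch assigns image intensities to a range-based point cloud, assuming the image has already been oriented with respect to the cloud (pinhole intrinsics K, world-to-camera pose R, t). All names and the camera model are illustrative assumptions; this is a generic sketch, not the implementation of any cited work.

```python
import numpy as np

def colorize_points(points, image, K, R, t):
    """points: (N, 3) XYZ in world coordinates; image: (H, W) or (H, W, 3);
    K: (3, 3) camera intrinsics; R, t: world-to-camera rotation / translation."""
    cam = R @ points.T + t.reshape(3, 1)          # world -> camera frame
    in_front = cam[2] > 0                         # keep points in front of camera
    uvw = K @ cam                                 # pinhole projection
    u = np.round(uvw[0] / uvw[2]).astype(int)     # pixel coordinates
    v = np.round(uvw[1] / uvw[2]).astype(int)
    h, w = image.shape[:2]
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors = np.zeros((points.shape[0],) + image.shape[2:], dtype=image.dtype)
    colors[valid] = image[v[valid], u[valid]]     # sample the image intensity
    return colors, valid                          # per-point color + visibility
```

Note that such a sketch ignores occlusions: a point hidden behind a surface would still receive a color. Production pipelines typically add a visibility test (e.g. a depth buffer), as in the rendering sketch shown later.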

DATA FUSION IN CULTURAL HERITAGE
The main goal in CH geometric documentation is to generate a complete, photo-realistic and accurate 2D (e.g. maps, orthoimages, plans, etc.) or 3D (dense point cloud, polygonal model, etc.) product. To obtain it, a wide range of data and sensors could be considered and integrated (Fig. 2): passive sensors (satellite, aerial, terrestrial or underwater images), active sensors (airborne and ToF laser scanners, triangulation and structured-light scanners, SAR/Radar), GNSS and/or topographic ground data (used for scale and georeferencing purposes), manual measurements (tape or disto) and even drawings or old material like analogue pictures. According to the working scale and required products, we can summarize the available data and sources (i.e. sensors, platforms, instruments, etc.) as shown in Fig. 3 (Lambers and Remondino, 2007). As there is no panacea, the integration of all these data and techniques is definitely the best solution for 3D surveying and modeling projects in the CH field.

RANGING VS IMAGING
In most CH applications, ranging instruments (aerial or terrestrial laser scanning, structured-light scanners, RGB-D sensors, etc.) and imaging techniques (multispectral imaging, photogrammetry, computer vision, remote sensing, etc.) based on passive sensors (i.e. digital cameras) are the two most employed solutions. The first comprehensive comparison between range and image data (traditional photogrammetry) was performed in Baltsavias (1999). Afterwards many researchers have dealt either with the complementarity of the two techniques or with the choice between them (Boehler and Marbs, 2004; El-Hakim et al., 2008; Kiparissi and Skarlatos, 2012; Andrews et al., 2013). When laser scanning gained its popularity, in the early 21st century, many people thought photogrammetry was over. Laser scanning grew in popularity as a means to produce dense point clouds for 3D documentation, mapping and visualization purposes at various scales, while photogrammetry could not efficiently deliver comparable results. Consequently these active sensors became the dominant technology for 3D recording at different scales (aerial and terrestrial) and replaced photogrammetry in many application areas. Furthermore, many photogrammetric researchers shifted their research interests to range sensors, which further slowed the advancement of photogrammetric techniques. But over the past five years, many improvements in hardware and software, primarily pushed by the Computer Vision community (e.g. Structure from Motion, SfM, tools), have improved photogrammetry-based solutions and algorithms to the point that laser scanners and photogrammetry can now deliver comparable geometric 3D results for many applications.
As summarized in Table 1, both approaches have advantages and disadvantages. Worth mentioning are the issues of edges (or geometric discontinuities) and object properties. Edge elements are usually not well defined in range-based point clouds due to the scanning principle and reflectivity effects. Regarding object materials, on the one hand photogrammetric dense matching algorithms can suffer from mismatches on difficult surfaces, but photogrammetry has proven more effective with materials that cause strong reflection problems for laser beams (e.g. marble; El-Hakim et al., 2008), and image-based methods are even able to reconstruct transparent materials based on changing phenomena (Morris and Kutulakos, 2007). On the other hand, textureless surfaces are almost impossible to reconstruct geometrically with image matching algorithms, but are successfully recorded with range methods. And, although reflective materials are still problematic for both methods, imaging coupled with laser-induced fluorescence targeting seems to give good results over highly reflective or transparent materials (Jones et al., 2003).

Property | Range-based | Image-based
Reflective materials | Give problems | Give problems
Textureless surfaces | Not influenced | Give problems (unless using pattern / spray)
Transparent materials | Give problems (unless using spray) | Give problems
Humidity, wet surfaces | Possible beam / laser absorption | Not much influenced
Underwater applications | Only triangulation scanners | Yes
Fast moving objects | Yes (but mainly industrial solutions) | Yes (high speed cameras)

Table 1: Image-based and range-based approaches with the respective characteristics.

DATA FUSION APPROACHES IN CH
Several works have been published showing the integration and fusion of different data and sensors in CH applications. The fusion is normally done to overcome some weaknesses of the techniques, e.g. lack of texture, gaps due to occlusions, non-collaborative materials/surfaces, etc.
Forkuo et al. (2004) generate a synthetic image by re-projecting a 3D range point cloud onto a virtual image, taken from the same point of view (projection center) of the scanner but calculated with the resolution that an image taken by a real camera would have had from the same point of view (the sketch below illustrates the idea). This synthetic image is used to perform feature matching against the original real images. The quality of this approach fully depends on the generated synthetic images, which totally influence the matching process. Furthermore, depending on the image resolution, it can be difficult, nearly impossible, to derive the desired resolution from the original scanned space points. A similar approach was followed by Gonzalez-Aguilera et al. (2009), who performed an automatic orientation of digital images and synthetic images derived from range data.
Kochi et al. (2012) combined terrestrial laser scanning and stereo-photogrammetry, using stereo-images to survey areas where the laser scanner has occlusions. The method is based on raw-data 3D edge detection: 3D edges are extracted independently from both the stereo-images and the laser point clouds and then matched to register the two datasets. Despite the interesting method, the case study where it was applied consists of a new building whose architectural elements are straight, nearly all perpendicular, with very well defined edges, whereas CH artifacts usually do not exhibit such obvious edge conditions and it is generally quite difficult to find straight edges.
Hastedt et al. (2012) presented a "hardware" approach with a multi-sensor system including range and image sensors. The system is constructed and pre-calibrated (interior and exterior orientation) so the relative positions are known and the corresponding image radiometric levels can be assigned to the range-based point clouds by ray back-projection. The main drawback is the necessity of recalibration every time any relative movement between scanner and camera occurs.
Gasparović et al. (2012) analyzed texture readability, comparing the texture obtained from the scanner instrument with that obtained from photogrammetry. Readability is here considered as radiometric quality, i.e. it concerns the projection of textures coming from digital images. In Lambers et al. (2007), image texture is projected over the range-based mesh: images are oriented in the same reference system of the range data by using ground control points, and a projection is then applied in order to texture the mesh.
Meschini et al. (2014) presented a kind of geometric mesh fusion using complementary terrestrial range-based data and aerial (UAV) image-based data. The procedure is performed to avoid occlusions in the range-based datasets and, optimistically, assumes the range data accuracy to be higher than that of the image-based data.
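As a toy illustration of the synthetic-image idea of Forkuo et al. (2004) referenced above, the Python sketch below projects a range point cloud through a virtual pinhole camera placed at the scanner origin and keeps, per pixel, the intensity of the closest point. The camera model, variable names and the simple z-buffering strategy are our own simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def synthetic_image(points, intensity, K, width, height):
    """points: (N, 3) XYZ in the scanner frame; intensity: (N,) per-point
    values (e.g. laser return intensity); K: (3, 3) virtual-camera intrinsics."""
    z = points[:, 2]
    keep = z > 0                                   # points in front of the camera
    uvw = K @ points[keep].T                       # pinhole projection
    u = np.round(uvw[0] / uvw[2]).astype(int)      # pixel coordinates
    v = np.round(uvw[1] / uvw[2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v = u[inside], v[inside]
    zk = z[keep][inside]
    ik = intensity[keep][inside]
    img = np.zeros((height, width))
    order = np.argsort(-zk)                        # draw far-to-near so that
    img[v[order], u[order]] = ik[order]            # nearer points win (z-buffer)
    return img
```

Pixels hit by no point stay empty, which mirrors the resolution problem noted above: if the virtual image resolution exceeds the scan density, the synthetic image becomes sparse and the subsequent feature matching degrades.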
Tsai et al. (2014) presented a single-view reconstruction method based on image vanishing points that produces rectified facades and uses them as textures for range-based 3D models. This approach has the advantage of allowing even old photographs or drawings to be used to texture the geometric model.
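The rectification step at the core of such single-view approaches can be illustrated with a short OpenCV sketch. For simplicity it assumes the four image corners of a planar, rectangular facade are already known (Tsai et al. derive the geometry from vanishing points instead); the file names and coordinates are made-up examples.

```python
import cv2
import numpy as np

img = cv2.imread("facade.jpg")                      # assumed input photograph
corners = np.float32([[102, 80], [940, 120],        # top-left, top-right,
                      [955, 700], [90, 660]])       # bottom-right, bottom-left
W, H = 800, 600                                     # output size ~ facade ratio
target = np.float32([[0, 0], [W, 0], [W, H], [0, H]])
Hmat = cv2.getPerspectiveTransform(corners, target) # plane-to-plane homography
rectified = cv2.warpPerspective(img, Hmat, (W, H))  # fronto-parallel texture
cv2.imwrite("facade_rectified.jpg", rectified)
```

Choosing W and H according to the facade's real width/height ratio keeps the rectified texture metrically consistent with the range-based model it is draped on.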

DATA FUSION NECESSITIES AND PROBLEMS
When planning the 3D documentation of a heritage object or site we have to observe and consider:
- Requirements: accuracy level, importance of radiometry, documentation purpose, final use of the 3D products, etc.
- Scene / object characteristics: color, presence of complex ornamental elements, non-cooperative surfaces, dimensions, location, etc.
- Equipment / sensor availability.
To fuse different data, a set of common references, clearly identifiable in each dataset, is indispensable in order to bring all data into a common reference system. These common references can be one-, two- or three-dimensional, and can be identified either manually or automatically. Since every dataset has its own characteristics, the identification of natural common references is sometimes a hard task. This is why targets are widely used: they can be clearly recognized in each dataset and are normally employed to merge data during the processing procedure. It is also common to perform a 3D-to-3D data registration, calculating the necessary transformation with a best-fit approach (a minimal sketch is given below). This solution does not, of course, exploit the intrinsic advantages of the employed surveying techniques, as the data fusion is done at the end of the processing pipeline. Regarding the accuracy level, when mixing several datasets we have to ensure that accuracy and details are preserved. This can be problematic when merging data at different geometric resolutions. Another approach to handle the resolution differences between datasets is to define several accuracy levels, adapting the level of information of each artifact to its acquisition methodology. In Guidi et al. (2009) a multiresolution proposal was presented: several acquisition methods and instruments were used, and the output texture resolution for each artifact was calculated as a function of the geometric resolution of the acquisition methodology employed. It is also important to fully understand the capabilities and performances of the employed instruments, otherwise the data integration will not lead to better results.
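As an illustration of such a target-based, best-fit registration, the sketch below estimates a similarity transform (scale, rotation, translation) between matched target coordinates with the closed-form Umeyama/Horn solution, and reports the target residuals, which indicate whether the required accuracy is preserved. Variable names and the toy data are illustrative assumptions.

```python
import numpy as np

def best_fit_similarity(src, dst):
    """src, dst: (N, 3) matched target coordinates in the two datasets.
    Returns s, R, t such that dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d                     # centered coordinates
    U, S, Vt = np.linalg.svd(B.T @ A / len(src))      # cross-covariance SVD
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # avoid reflection
    R = U @ D @ Vt                                    # best-fit rotation
    s = (S * np.diag(D)).sum() / A.var(axis=0).sum()  # scale factor
    t = mu_d - s * R @ mu_s                           # translation
    return s, R, t

# toy example: second dataset is the first one scaled by 2 and shifted by 1
src = np.random.rand(6, 3)
dst = 2 * src + 1
s, R, t = best_fit_similarity(src, dst)
residuals = np.linalg.norm(dst - (s * (R @ src.T).T + t), axis=1)
print("RMS residual on targets:", np.sqrt((residuals ** 2).mean()))
```

Checking the RMS residual against the project accuracy requirements is the practical test of whether two datasets at different resolutions can be merged without degrading the result.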

CONCLUSIONS
Considering the "Purpose and Efficiency principles" (Sevilla Principles, 2011), a CH geometric documentation and computer-based visualization should satisfy the project requirements using the most cost-effective and appropriate technique. Assuming this, the data fusion objective should be considered from the very first stage of the pipeline (i.e. the data acquisition step). At the processing stage, the main challenge in CH data fusion is nowadays the automatic registration of data coming from range- and image-based sensors. Several approaches have been presented, but none seems to be completely successful with real CH elements, at least not in an automatic way. An idea could be the use of edges as a cue to link the available datasets. Indeed, one of the disadvantages of range-based sensors (particularly ToF) is the lack of edge definition and the noise at object borders. Image processing algorithms, however, are able to detect and extract edges, so images can be used to filter range data: edges could be detected in the images and re-projected onto the raw range-based data in order to detect and delete non-significant geometric information, improve the quality of the collected data and achieve better geometric reconstructions (a speculative sketch is given below). Given the large abundance of data and sensors available nowadays, we can certainly foresee more and more research activities in the data fusion field, particularly for heritage applications. This also links to the big-data issue, i.e. the correct handling of large quantities of heterogeneous data.
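A speculative Python sketch of this edge-filtering idea follows: Canny edges extracted from an oriented image are dilated into a mask, range points are projected into the image, and points falling on edge pixels (where ToF data is typically noisy) are flagged for removal. The camera model, thresholds and halo size are assumptions for illustration only, not a published method.

```python
import cv2
import numpy as np

def flag_edge_points(points, gray_image, K, R, t, edge_halo_px=3):
    """Return a boolean mask of range points projecting onto image edges."""
    edges = cv2.Canny(gray_image, 50, 150)                 # 2D edge map
    kernel = np.ones((edge_halo_px, edge_halo_px), np.uint8)
    edges = cv2.dilate(edges, kernel)                      # tolerance halo
    cam = R @ points.T + t.reshape(3, 1)                   # world -> camera
    u = np.round(K[0, 0] * cam[0] / cam[2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[1] / cam[2] + K[1, 2]).astype(int)
    h, w = gray_image.shape
    visible = (cam[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    on_edge = np.zeros(points.shape[0], dtype=bool)
    on_edge[visible] = edges[v[visible], u[visible]] > 0
    return on_edge          # the filtered cloud is points[~on_edge]
```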