A COMPLETE FRAMEWORK OPERATING SPATIALLY-ORIENTED RTI IN A 3D/2D CULTURAL HERITAGE DOCUMENTATION AND ANALYSIS TOOL

Close-Range Photogrammetry (CRP) and Reflectance Transformation Imaging (RTI) are two of the most widely used image-based techniques for documenting and analyzing Cultural Heritage (CH) objects. Nevertheless, their potential impact in supporting the study and analysis of the conservation status of CH assets is reduced, as they remain mostly applied and analyzed separately. This is mostly because easy-to-use tools for the spatial registration of multimodal data, and features for their joint visualisation, are lacking. The aim of this paper is to describe a complete framework for effective data fusion and to present a user-friendly viewer enabling the joint visual analysis of 2D/3D data and RTI images. This contribution is framed by the on-going implementation of automatic multimodal registration (3D, 2D RGB and RTI) into a collaborative web platform (AIOLI) that enables the management of hybrid representations through an intuitive visualization framework and also supports semantic enrichment through spatialized 2D/3D annotations.


INTRODUCTION: FILLING THE GAP BETWEEN CRP AND RTI
Multimodal acquisitions are becoming commonplace in Digital Heritage studies, but they require efficient frameworks to fully exploit the analytic potential of the diverse, spatially registered results produced by imaging techniques. The photographic-based techniques Close-Range Photogrammetry (CRP) and Reflectance Transformation Imaging (RTI) (Ciortan et al., 2016) are widely used within the CH community, partly because several open-source packages are available to process and visualize these data types. Nevertheless, they are usually not used together, since there is a lack of integrated solutions able to bring different data types into the same working space and to analyze them visually and jointly. So-called multi-view RTI has been presented as a viable solution (Gunawardane et al., 2009), but both a proper acquisition methodology and accessible tools for processing and visualization are missing. This paper therefore proposes a methodological shift: the two media are not treated in isolation or side by side; instead, we present the first results of a combined approach linking CRP and RTI in a common framework. Starting from existing open-source solutions, we designed a computational workflow that merges and visualizes RTI images with 3D models (as well as the raw 2D images used to produce the 3D models). The main goal of this paper is thus to demonstrate the potential of an open platform integrating spatially-oriented and relightable images into a 3D environment able to visualize data produced with the multi-view reconstruction approach.

RELATED WORK
CRP and RTI are considered the dual low-cost techniques for 3D and 2D documentation in the ICT and CH literature, which broadly describes both the basic technologies and practical guidelines (Remondino et al., 2014, Ciortan et al., 2016) and the many applications tested (coins, rock paintings/inscriptions, painted canvas, etc.). Several works have also proposed comparisons to highlight advantages and weaknesses among competing technologies (Mathys et al., 2013). On the one hand, multi-view and image-based modeling (CRP and SfM) are periodically discussed with regard to their gains in completeness and/or accuracy in necessary critical overviews (Remondino et al., 2014), and they can nowadays be exploited as stable and reliable techniques. On the other hand, multi-light and RTI techniques have developed considerably from the original open-source code (Malzbender et al., 2001, Mudge et al., 2006), with numerous enhancements concerning the fitting and viewing side (Palma et al., 2010, Giachetti et al., 2017a), as well as support for multispectral data (Hanneken, 2014, Giachetti et al., 2017b) and new devices (Schuster et al., 2014).
A third part of the literature focuses on the technical fusion of both approaches, aiming either to enhance a 3D model with relightable reflectance texture mapping (Berrier et al., 2015) or to improve 3D reconstruction (Wu et al., 2011). Finally, a recent work tries to link RTI and 3D models from a methodological side (Shi et al., 2016), but a common and reliable spatial registration, needed to fully benefit from this type of data fusion and to support human-driven/data-driven analysis, is missing. Meanwhile, the issue of multi-view RTI has been discussed for a decade (Gunawardane et al., 2009) [...]

lenses and light sources) is a key feature for the quality of the final results. Although their major difference is that one is based on the motion of a light source and the other on the motion of a camera, a common data acquisition protocol could easily converge towards a combined approach, easing the subsequent data merging phase. Moreover, the camera parameters giving optimal image quality are well known to be very similar for the two techniques, even if the two processes are independent (e.g. lossless encoding, lowest ISO, sweet-spot aperture, sharpness and fixed white balance).
To combine CRP and RTI efficiently, data acquisition protocols have been evaluated in a wide diversity of acquisition scenarios, leading to the recommendation to include or anticipate the RTI views as a component of the photogrammetric sequence; at the same time, we consider it important to minimize as far as possible the contextual deviations of shooting conditions, parameters and equipment. In this context, the spatial registration of RTI is conditioned by the robustness of incremental approaches relying on: i) feature extraction and matching score; ii) the evaluation of intrinsic parameters; iii) the spatial overlap allowing an accurate estimation of camera poses. The Spatial Orientation of an RTI towards the respective portion of the 3D model, hereafter referred to as SO-RTI, benefits most from the overall overlap and consistency of the RTI picture with its closest neighbours in the set of images acquired and processed in the CRP pipeline. Beyond its photogrammetric meaning, this notion of overlap extends to all the elements composing the picture, including those listed below as causes of matching issues:
• The spatial resolution (i.e. the minimal distance on the object between two pixels), scaled by modifying the focal length and/or the distance to the object.
• The pixel resolution (i.e. the pixel count), set by the image dimensions and related to the sensor pixel density.
• The radiometric values, especially the radiance (hence the luminosity of pixels in the image), which affect feature detection and extraction.
• The temporal or contextual gaps (i.e. differences on the object itself or in its close acquisition environment), which also have to be considered (e.g. the highlight detection ball of H-RTI).
For this purpose, a benchmark composed of tens of data sets (see Table 1) from real CH case studies and experiments led by the MAP laboratory over the past years has been re-processed and evaluated using the photogrammetric-based registration described in the next section.
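As an illustration of the spatial resolution gap listed above, the following minimal sketch estimates the object-space footprint of one pixel under a simple pinhole approximation (footprint = pixel pitch * distance / focal length) and the resampling factor that would bring two acquisitions to a comparable spatial resolution. The function names, the pinhole simplification (which is less accurate at macro distances) and the numeric values are assumptions made for illustration; they are not part of the workflow described in this paper.

```python
def spatial_resolution_mm(pixel_pitch_mm, focal_mm, distance_mm):
    """Object-space size of one pixel, in mm, under a pinhole approximation."""
    if pixel_pitch_mm <= 0 or focal_mm <= 0 or distance_mm <= 0:
        raise ValueError("all arguments must be positive")
    return pixel_pitch_mm * distance_mm / focal_mm


def resampling_factor(res_a_mm, res_b_mm):
    """Factor by which image B should be resized to match the resolution of image A."""
    return res_b_mm / res_a_mm


# Hypothetical example: same sensor and distance, two different focal lengths.
crp_res = spatial_resolution_mm(pixel_pitch_mm=0.005, focal_mm=50, distance_mm=1000)
rti_res = spatial_resolution_mm(pixel_pitch_mm=0.005, focal_mm=25, distance_mm=1000)
factor = resampling_factor(crp_res, rti_res)  # > 1 means the RTI view must be upsampled
```

A factor far from 1 signals a resolution gap that is likely to degrade feature matching between the RTI view and its CRP neighbours.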

TOWARDS FULLY AUTOMATED AND VERSATILE CO-REGISTRATION
Our co-registration method is based on incremental processing, since image sets have to be merged over several iterations, by multiple users and/or at different times. As tie-point extraction and matching is the core and the main issue of photogrammetric-based 2D/3D registration, our pipeline is initialized by an enhancement of 2D matching based on best-neighbour pre-selection (see Figure 15), aiming to optimize the number and the quality of the features extracted. It is followed by a self-adaptive method for internal and external calibration that adapts to the complexity of the dataset. A complete description of the "Totally Automated Co-Registration and Orientation" photogrammetric-based workflow, called TACO, is given in a complementary publication (Pamart et al., 2019).
The global co-registration processing dedicated to RTI is structured as follows (see Figure 1). A first iteration (Iter0) aims to achieve a first photogrammetric result. At the current stage of development, this initial 3D scene, composed of calibrated/oriented pictures and a 3D point cloud, is set as a master acquisition and stays frozen. This implies that the RTI must overlap with the area covered by the CRP dataset initially processed (completion and/or densification of the scene is planned as future work). In the next iterations (Iter1 and following), RTI master views are extracted, matched and spatialized, trying two methods in sequence. Matching is performed on the albedo image, produced by exploiting the inverse lighting capacities of RTI. It has been chosen among other master images derived from RTI, such as Mean or Median composites, since it produced better results in feature extraction and matching in our tests, even if we noted a potential issue with some radiometric deviation.

Figure 1. Diagram of the iterative co-registration processing for SO-RTI scene

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W9, 2019 8th Intl. Workshop 3D-ARCH "3D Virtual Reconstruction and Visualization of Complex Architectures", 6-8 February 2019, Bergamo, Italy

The spatial registration is first attempted using a conventional Bundle-Block Adjustment (BBA) on the RTI image, to which metadata have been copied from the raw images so that the internal calibration from previous iterations can be re-used and/or refined according to EXIF. However, the performance and accuracy of a BBA aiming to retrieve both internal and external calibration for a single picture can be sub-optimal, or the BBA can fail, because of the aforementioned matching issues. In the case of rebellious RTI (i.e. the ones with lower overlap), we switch to a second approach, Spatial Resection (SR) using Direct Linear Transformation (Shih and Faig, 1988), which requires fewer matching points. Unfortunately, SR needs Ground Control Points (i.e. 2D and 3D coordinates), which are not always available and not suitable for automatic processing. Thus, an approximate but automatic solution has been implemented. Our GCP generation is based on 2D tie-points and the corresponding 3D coordinates extracted from depth maps computed in the initial iteration. Of course, the quality of the results is limited by the precision of the tie-points and the relative accuracy of the initial dense cloud. Nevertheless, this surrogate method gives robust results in terms of finding a homographic relation, which is sufficient for our visualization purposes (as shown in Figure 14).
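The SR fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TACO implementation: the function names are hypothetical, the depth map is assumed to store the Z coordinate in the camera frame of an already oriented image (with pose X_cam = R X_world + t and intrinsic matrix K), and the DLT is solved without coordinate normalization or outlier rejection, both of which a robust implementation would need.

```python
import numpy as np


def backproject(u, v, depth, K, R, t):
    """Surrogate GCP: lift pixel (u, v) of an oriented image to 3D using its depth."""
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # camera-frame ray with z = 1
    return R.T @ (depth * ray - t)


def dlt_resection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from at least 6 2D/3D correspondences."""
    pts3d, pts2d = np.asarray(pts3d, float), np.asarray(pts2d, float)
    n = len(pts3d)
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    Xh = np.hstack([pts3d, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = pts2d[i]
        A[2 * i, 0:4] = Xh[i]
        A[2 * i, 8:12] = -u * Xh[i]
        A[2 * i + 1, 4:8] = Xh[i]
        A[2 * i + 1, 8:12] = -v * Xh[i]
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)


def project(P, pts3d):
    """Project 3D points with a 3x4 projection matrix, returning pixel coordinates."""
    pts3d = np.asarray(pts3d, float)
    x = np.hstack([pts3d, np.ones((len(pts3d), 1))]) @ P.T
    return x[:, :2] / x[:, 2:3]


def rms_reprojection_error(P, pts3d, pts2d):
    """Root-mean-square reprojection residual, in pixels."""
    d = project(P, pts3d) - np.asarray(pts2d, float)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))
```

In such a scheme, tie-points matched between the RTI master view and an oriented CRP image would be lifted to 3D with backproject, and the resulting 2D/3D pairs passed to dlt_resection; the reprojection residual then gives a first indication of whether the estimated orientation is usable.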
Nonetheless, the merging of multimodal CH datasets composed of photogrammetric and RTI acquisitions can be achieved through different scenarios, depending on the type of data acquisition and the set-up used for each technique. The first and optimal scenario consists of an ad-hoc combined data acquisition in which the RTI has been acquired with an automated dome, thus using the same equipment (sensor and focal length) in a fixed context (see the data sets, results and illustrations of Tegulae, Manuscript and Vasarely). However, the RTI-to-CRP registration produced accurate results, even when slight changes in illumination occurred. As shown in Figure 2 [...]

Table 2. Results of orientations on multimodal dataset (legend: • Partially or oddly oriented • Not oriented)

A HYBRID 2D/3D VISUALIZATION FRAMEWORK
Any modern collaborative system for visual analysis should be designed as a web platform, so that multiple collaborators can work at the same time without data being replicated locally. This has been shown to be feasible even in cases where the typology and amount of data were massive and the timeframe very short (Apollonio et al., 2017).
In this paper, the data were integrated in the context of AIOLI, a web platform that allows users to navigate, visualize, interact with and annotate all these oriented images and 3D data in a viewer composed of multilayered cameras, light sources and 3D points. The 3D reconstruction produced with CRP is stored as a point cloud and acts as a medium to transfer the annotations made on a single image to all the images depicting the same portion of surface. To do this, once the registration and image processing are complete, an indexation step stores the 2D/3D projective relationship that links all the contents together in a single coherent 3D representation space (Manuel et al., 2016). Thus, for each image, an array is created to store the indexes of the 3D points corresponding to the projected pixels. This important step is more generally used to allow the automatic replication of annotations made on a specific resource (whether 2D or 3D) on all the others, but also to re-project some geometric descriptors onto images (Manuel et al., 2018). All these contents are associated with layers and semantic attributes that can be freely controlled by the user. AIOLI's viewer distinguishes original oriented images, tagged as Master, from different auxiliary images derived from calculation methods (normal maps, depth maps, curvature maps, uncorrelated images, etc.). Master images are used to simulate virtual cameras by copying their intrinsic and extrinsic parameters, each image being visualized by texture projection over the near plane of its camera's frustum. According to their needs, users can then switch from one texture to another and use it to make their annotations. The integration of RTI for navigation and interaction is nearly straightforward. The image can be visualized as a master image together with all the others, possibly choosing a clear representative image (see Figure 3).
When an image is selected for tracing and annotation purposes, the interface is essentially the same as the one used to visualize RGB images, with the addition of a button that enables the light direction variation, so that the best possible illumination can be chosen to draw or select a new annotation area (see Figure 4).
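The indexation step and the resulting annotation transfer can be illustrated with the following minimal sketch. It is not AIOLI's code: the function names are hypothetical, each image is assumed to come with a 3x4 projection matrix P = K [R | t] whose third row yields the camera-frame depth, visibility is resolved with a naive per-pixel depth test, and points are splatted to single pixels.

```python
import numpy as np


def build_index_map(points, P, width, height):
    """Per-pixel index of the nearest projected 3D point (-1 where none projects)."""
    index_map = np.full((height, width), -1, dtype=np.int64)
    nearest = np.full((height, width), np.inf)
    pts = np.asarray(points, float)
    x = np.hstack([pts, np.ones((len(pts), 1))]) @ P.T
    for i, (u, v, w) in enumerate(x):
        if w <= 0:  # point behind the camera
            continue
        col, row = int(round(u / w)), int(round(v / w))
        if 0 <= row < height and 0 <= col < width and w < nearest[row, col]:
            nearest[row, col] = w
            index_map[row, col] = i
    return index_map


def transfer_annotation(mask_src, index_src, index_dst):
    """Replicate a 2D annotation mask from a source image onto a destination image."""
    point_ids = index_src[mask_src & (index_src >= 0)]
    return np.isin(index_dst, point_ids)
```

Because the transfer goes through 3D point indexes rather than pixel coordinates, the same mechanism applies unchanged whether the source image is an RGB master or a spatially-oriented RTI.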

PROCESSING AND ENCODING RTI IMAGES FOR WEB VISUALIZATION
While the potential of collaborative web tools is clear, additional challenges arise, especially when complex data sets have to be handled by several people at the same time. An annotation project, possibly including multimodal data, can easily reach hundreds of items to handle. A main issue is dealing with geometric data, for which several solutions have been devised (Ponchio and , Mwalongo et al., 2016). For the handling of complex point clouds, AIOLI makes use of an octree-based rendering library (Schutz, 2014), which is able to deal even with billions of points.
RTI images encode a much larger quantity of information than RGB images, but even high-resolution images can be an issue: in the past, some solutions were devised (Palma et al., 2014) to provide easy web publication and visualization of RTI and high-resolution images. For the integration of RTI in AIOLI, we have adopted a very recent encoding of RTI data (Ponchio et al., 2018b), which is able to fulfill the usually hard task of offering high compression without sacrificing sampling accuracy. The adopted RTI representation combines Principal Component Analysis (PCA), used for data reduction, with an interpolation scheme. PCA is heavily used for the compression and transmission of Bidirectional Texture Functions (BTF) (Schwartz et al., 2013). In this context, the idea is to apply it to produce a compact representation of all the N images acquired under different lighting conditions. Then, these images are recombined using Radial Basis Functions (RBF) to obtain high visual fidelity. These two operations can be easily integrated into a weighted linear summation, making the visualization client simple and efficient to implement. The quality/size ratio obtained in this way is always higher than that of standard RTI techniques, such as PTM or HSH. Additionally, the appearance of highly specular materials is well preserved at the cost of a moderate file size (300-400 Kbytes).
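The way PCA reduction and RBF interpolation collapse into a single weighted linear summation can be sketched as follows. This is an illustrative reading of the principle only, not the Relight encoder: the function names, the Gaussian kernel and its width sigma are assumptions, and colour handling, quantization and tiling are left out. Each image is flattened to a row of pixel values, with one row per light direction.

```python
import numpy as np


def fit_pca(images, k):
    """Compress N relit images (N, P) into k coefficient planes plus a small basis."""
    X = np.asarray(images, float).T             # (P, N): one sample per pixel
    mean = X.mean(axis=0)                       # (N,) mean value per light
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                              # (k, N) principal directions
    coeffs = (X - mean) @ basis.T               # (P, k) per-pixel coefficients
    return mean, basis, coeffs


def rbf_weights(lights, query, sigma):
    """Interpolation weights of the N sampled lights for a new light direction."""
    lights = np.asarray(lights, float)
    d2 = ((lights[:, None, :] - lights[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    phi = np.exp(-((lights - query) ** 2).sum(-1) / (2 * sigma ** 2))
    return np.linalg.solve(Phi, phi)


def relight(model, lights, query, sigma=0.3):
    """Relit image as a weighted linear sum of the k coefficient planes."""
    mean, basis, coeffs = model
    w = rbf_weights(lights, np.asarray(query, float), sigma)
    # Interpolating the basis first leaves only k multiply-adds per pixel.
    return mean @ w + coeffs @ (basis @ w)
```

The viewer-side cost is what makes the scheme attractive for the web: the RBF weights depend only on the light direction, so they are computed once per frame, while the per-pixel work reduces to combining k stored planes.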
All the results presented above have been implemented in the Relight library (http://vcg.isti.cnr.it/relight/), which includes not only tools for creating an RTI image in any format, but also a JavaScript library for web visualization. This initial code was modified to be integrated within the AIOLI framework, so that images are encoded in a tiled format and refined progressively as bandwidth becomes available.

CURRENT STATUS AND RESULTS OF THE IMPLEMENTATION IN THE AIOLI PLATFORM
In addition to the methodological explanations above, this section describes and clarifies the technical aspects of the SO-RTI stages and framework currently implemented in AIOLI. Thanks to its flexible architecture, we have been able to easily implement this RTI-dedicated pipeline (presented in Figure 5). All of the processes are containerized with Docker (Merkel, 2014) and are triggered by the server like every other AIOLI process pipeline. This allows us to benefit from scalability while using the regular AIOLI communications to talk to the database and the project storage unit, so that no specific communication layer has to be built.

Figure 5. Diagram of the iterative co-registration process for SO-RTI scene

The framework currently requires the upload of raw RTI images and the corresponding LP files (which include calibrated light direction data), in order to control and automate the entire process. This includes the RTI encoding, which is performed without any user-driven or explicit ball and highlight detection. To simplify the first implementation of the workflow, we focused our experiments on this kind of scenario. In the future, a metadata management system (i.e. grouping by EXIF) will help to define the best strategy for incremental processing using the optimum or combined BBA/SR approach. The main difficulty is the integration of unprocessed Highlight-Based RTI, but thanks to the flexibility of the Relight library in terms of supported fitters (Ponchio et al., 2018a), importing processed data in RTI or PTM format will be an easy task.
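As a hedged illustration of the input expected at upload, the sketch below reads the light directions from an LP file. The layout assumed here (a first line giving the number of images, then one line per image with a file name followed by the three components of the light direction) is the one commonly produced by RTI tools; it is an assumption about the input, and the function name is hypothetical rather than part of AIOLI or Relight.

```python
def parse_lp(text):
    """Parse LP content into a list of (image_name, (lx, ly, lz)) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty LP content")
    count = int(lines[0])
    entries = []
    for ln in lines[1:1 + count]:
        # Split from the right so that file names containing spaces survive.
        name, x, y, z = ln.rsplit(None, 3)
        entries.append((name, (float(x), float(y), float(z))))
    if len(entries) != count:
        raise ValueError(f"expected {count} light entries, found {len(entries)}")
    return entries


# Hypothetical two-image example.
example = "2\nimg_01.jpg 0.0 0.0 1.0\nimg 02.jpg 0.5 0.5 0.7071\n"
lights = parse_lp(example)
```

Checking the declared count against the parsed entries is a cheap way to reject truncated uploads before the containerized processing starts.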
For further information on updates, upcoming feature releases or the conditions to join AIOLI's beta-testing program, visit the website (http://www.aioli.cloud/).

The first results show a robust and versatile co-registration process, which nevertheless remains perfectible and still has to be tested on: i) reproducibility, with complex datasets (scaling, spectral or temporal gaps and number of iterations); and ii) accuracy, which requires a shift from relative to absolute coordinates with known or measured positions. Not surprisingly, the best results are obtained with ad-hoc combined acquisitions, or at least with fixed set-up contexts, which reduce overall overlap and matching issues. The 2D matching approach seems robust enough even in the presence of some spatial or resolution gaps (e.g. distance, focal length or sensor), which are handled by an automatic increase of image resolution to optimize the tie-point extraction used by both spatial registration methods. The BBA (Figure 2) shows a decent residual error (below 1px), but its computation time can still be improved. As the automatic dense image matching is performed at medium density (i.e. 1pt for 16px), the alignment is accurate enough to satisfy visualization and annotation requirements. The second method, SR, applied when BBA fails, gives variable and perfectible results. It shows a "depth inconsistency", meaning that the focal length and/or the object distance is incorrectly approximated, while the point of view corresponds to a consistent homographic projection (see Figure 6). As for BBA, the correctness of SR registration depends less on the number of key-points used for resection than on their redundancy and spatial distribution. Obviously, the fewer the key-points and the lower their multiplicity, the higher the uncertainty of the computed calibration and orientation.
Overcoming this issue will be the focus of our future work, as other approaches (see Figure 7) and algorithms (Ke and Roumeliotis, 2017) could improve the spatial registration of rebellious resources (i.e. RTI or others).
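One plausible reading of the "depth inconsistency" reported for SR is the classical ambiguity between focal length and object distance on nearly planar objects. The sketch below, which uses a deliberately simplified one-dimensional pinhole model with hypothetical names and values, shows that scaling the focal length and the distance by the same factor leaves the projection of a flat object unchanged and shifts a low-relief object by less than a pixel. It is an illustration of why the homographic relation can remain consistent, not an analysis of the actual failure cases.

```python
import numpy as np


def project_x(f_px, distance, X, relief):
    """1D pinhole projection of points at lateral offset X and depth distance + relief."""
    return f_px * X / (distance + relief)


def max_shift_px(f_px, distance, scale, X, relief):
    """Largest image displacement when focal length and distance are both scaled."""
    reference = project_x(f_px, distance, X, relief)
    scaled = project_x(scale * f_px, scale * distance, X, relief)
    return float(np.max(np.abs(reference - scaled)))


# Hypothetical set-up: 100 mm wide object, 500 mm away, focal length of 5000 px.
X = np.linspace(-50.0, 50.0, 11)
flat_shift = max_shift_px(5000.0, 500.0, 1.5, X, np.zeros_like(X))
relief_shift = max_shift_px(5000.0, 500.0, 1.5, X, np.full_like(X, 2.0))
```

With such weak depth variation, 2D/3D correspondences constrain the ratio between focal length and distance far better than either quantity alone, which is consistent with the importance of key-point redundancy and spatial distribution noted above.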

CONCLUSION AND PERSPECTIVES
Multimodal acquisition (understood as the production not only of images and 3D data, but also of other advanced types of data such as multispectral images, HDR, RTI, etc.) is becoming a standard in the inspection and diagnostic analysis of works of art, whether small objects or architectural environments (Pamart et al., 2016). Nevertheless, there is a lack of tools that provide visualization and interaction modalities for this plurality of data. The presented work is a first step towards this objective, since an advanced type of acquisition (RTI images) is fully integrated with the usual 2D and 3D data. The current implementation of SO-RTI already contributes to blurring the lines between acquisition modalities. This paper presented a first effort towards the integration of different types of media in the context of a collaborative web platform. The integration has been tested on some real cases, showing how RTI can be easily integrated and can help the analysis and annotation work by enabling the transfer of quantitative and qualitative properties. From this first outcome, two main research directions towards a so-called high-level data fusion (Ramos and Remondino, 2015) could start from the promising results presented in this paper:
• The seamless integration of other multimodal data. In principle, any advanced image that could be encoded with an interpolation (e.g. time-lapse images) could be directly included in the Relight library. Another challenge is represented by hyper-spectral images, which are becoming more and more important in multimodal acquisitions. In this case, issues similar to those of RTI have to be taken into account: co-registration with RGB images, data compression, interaction in the context of AIOLI, and seamless remote navigation.
• A further direct contribution of RTI images to the registration and geometry calculation process. As already shown in previous works (Dellepiane et al., 2006), the estimated normals that can be extracted from RTI are coherent with the real shape of the object, but not accurate enough. If an accurate calibration of the acquisition device is performed (with a better estimation of light direction, uneven light coverage and white balance), moving closer to a photometric stereo approach, the extracted normals might be more accurate than the ones extracted by CRP. Hence, the point cloud could be "refined" in a second photogrammetric iteration (Giang et al., 2017), leading to a more accurate representation and re-projection of annotations. Conversely, RTI could also benefit from the photogrammetric fusion from a metric point of view, through the orthorectified or orthomosaic RTI foreseen in our on-going experiments (Fig. 13).

Convergent CRP with 50mm of Iter0, Close-up CRP with 60mm macro of Iter1 and two RTI with 24mm and 60mm registered with BBA
Figure 9. Multi-focal CRP blocks and RTI views oriented with BBA on Uffizi dataset.
CRP and RTI both with 35mm lens
Figure 10. Successful low spatial overlapping co-registration on low-textured Vasarely dataset.
CRP and RTI both with 60mm macro lens
Figure 11. Correct SO-RTI achieved with ad-hoc acquisition of Manuscript dataset.
CRP with iPad and 2 RTI with Nikon D800E+60mm macro lens
Figure 12. Consistent BBA-based registration of two RTI acquisitions on low-resolution overlapping of iSkull dataset.
Figure 13. Experiments on orthorectified RTI made on Uffizi case-study.
Convergent CRP with 35mm of Iter0, Orthomosaic CRP with 60mm macro of Iter1, 5 next iterations for BBA and RS based RTI's registration (highlight areas