3D LOW-COST ACQUISITION FOR THE KNOWLEDGE OF CULTURAL HERITAGE: THE CASE STUDY OF THE BUST OF SAN

The creation of three-dimensional models for the cataloguing and documentation of cultural heritage is today an emerging need in the cultural sphere and, above all, for museums. Cultural heritage is still catalogued and documented through descriptive files accompanied by photographic images, which fail to convey its spatial richness; this is possible only through 3D models. This essay proposes a digitization methodology based on low-cost, easy-to-use systems that can be employed even by operators who are not experts in surveying and photogrammetry. The case study of the statue of San Nicola da Tolentino, preserved at the Sant’Agostino complex in Bergamo, offered the possibility of comparing 3D models acquired with different digitization tools (professional, action and amateur cameras, and a smartphone) and processed with several image-based 3D reconstruction software packages and methods.


INTRODUCTION
The use of 3D digital platforms for the documentation, enhancement and communication of cultural heritage is a very recent practice and, perhaps, not yet widespread. Most of the works of art in heritage collections are catalogued through descriptive sheets accompanied by photographic images, which nevertheless fail to convey their spatial richness; only three-dimensional models can do so. This is due both to the difficulty of designing complex interfaces for the use, sharing and dissemination of 3D data, and to the difficulty of quickly creating virtual models that are also accurate and faithful in their geometry and colour. Videogrammetry, if operated with the strict application of a scientific method, even with instrumentation that is not particularly sophisticated, makes it possible to produce, quickly and simply, three-dimensional documentation of cultural heritage that is in line with the Italian national standards for the cataloguing of cultural patrimony (ICCD) and of adequate quality for online publication. Specifically, low-cost videogrammetry can be useful for creating 3D models of those artworks that are often neglected and wrongly defined as 'minor' because they are considered of lower value, and to which investments in digitization are more rarely devoted. It can also help to highlight the relationships with the historical context in which this cultural heritage was produced and to which it was related, as well as those with the physical 'container' (a building, an architectural complex or a territorial space) where it is hosted and preserved today. This work compares the results obtained with different acquisition systems (professional cameras, action cameras, amateur cameras and smartphones) and with different operating methods, employing several image-based 3D reconstruction software packages (3DFlow Zephyr, Agisoft Metashape and Pix4D Mapper Pro).
The objective is to provide a solution that strikes the right balance between a simplified practice and a quality sufficient for the purposes of the virtual museum.

THE BUST OF SAN NICOLA DA TOLENTINO IN THE OLD CHURCH OF SANT'AGOSTINO
The Sant'Agostino complex in Bergamo is an ancient convent whose foundation dates back to the end of the 13th century and which is today the seat of the local university. It stands on a plateau close to the Venetian Walls, whose layout it modified, near the gate of the same name on the road to Venice. In the past, this place had a strategic role in the defence of the city because it 'guarded' the easternmost part of the Bergamo fortifications, which was also the weakest, being located at the bottom of the hill and sloping down towards the plain (Cardaci & Versaci, 2016; Cardaci et al., 2019).
The convent was suppressed in the 18th century following Napoleon's descent into Italy and the proclamation of the Cisalpine Republic. The patrimony of the Augustinian order was confiscated and the complex was transformed into a barracks. The assets of the church (paintings, sacred furnishings, altars) were sold to noble and wealthy families, or donated to other churches (Damiani et al., 2016).
The project to reconstruct the decorative apparatus of the former church (today reused as the Aula Magna), undertaken on the occasion of the celebrations for the 50 years of the university's history, started a programme of initiatives aimed at enhancing the ancient monastic complex (fig. 1). In particular, the programme led to the setting up of the chapel dedicated to San Nicola da Tolentino, with the relocation of the works of art removed after its desecration in 1798.
The chapel originally housed, in addition to the altars that no longer exist, some canvases, including the altarpiece by Gian Giacomo Barbelli (1653), and, most notably, a large statue of the saint robed in a cloth dress. This study has therefore focused on the only sculpture surviving in the chapel (a prized work of art by Giovanni Antonio Sanz da Bergamo), although today only the bust remains. It consists of a carved and painted wooden head, with large eyes in coloured glass paste, resting on a papier-mâché bust supported by a wooden pedestal. In 1798, the sculpture was transferred to the church of Sant'Andrea, where it was forgotten in the basement after the construction of the new church. Rediscovered in 2016, it was restored at the behest of the University of Bergamo and is today displayed in the old church of Sant'Agostino (fig. 2). The chapel of San Nicola da Tolentino was the occasion to experiment with a methodological practice aimed at producing virtual artefacts to be used within an open and publicly accessible database. This is a first step towards the construction of a virtual museum intended to collect the contents of a systematic and multidisciplinary study, partly already underway, on the former convent, the fulcrum of the city's cultural identity. Today's reality capture technologies for the digital reconstruction of works of art, the creation of interactive teaching models, computerized films and animations, and the use of AR (augmented reality) to recreate objects in any place are tools that make it possible to reach an ever wider public (Cardaci et al., 2018).

METHODOLOGICAL PRACTICE
The survey of the sculpture was preceded by careful planning of the acquisition phases, which were performed with active and passive sensors. The bust was digitized with 3D laser scanning instruments to create an accurate reference model against which to compare the various photogrammetric artefacts, obtained from pictures taken with both professional and amateur cameras, thus reconstructing multiple image-based 3D virtual prototypes of different complexity and precision (Bolognesi et al., 2015; Brilakis et al., 2011). The survey project made it possible to find solutions to the problems linked to the 'measurement site', which was not a geomatics laboratory but the refurbished chapel of the ancient church. The site constrained the 3D laser scanning acquisitions, given the impossibility of removing or moving the statue from its support, and complicated the photogrammetric shots because of the varied lighting conditions. The activities, carried out directly inside the new Aula Magna at the times and under the conditions dictated by the use of the room, therefore required a series of particular measures to conclude the measurements quickly and so avoid problems due to unforeseen circumstances. It was possible to perform the survey in just a few hours only thanks to the preparation of a dense network of markers for georeferencing all the scans into a single reference system. It was decided to compare the models on the basis of common points with known coordinates, without delegating the registration of the acquisitions to ICP algorithms for shape registration.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W17, 2019 6th International Workshop LowCost 3D -Sensors, Algorithms, Applications, 2-3 December 2019, Strasbourg, France
A digital print of the markers was made on a rigid support and, in the laboratory, the coordinates of the references were determined by direct and angular readings and, for redundancy, by polar-distance readings, all taken from a base of known length; this made it possible to determine the coordinates of the points with millimetric accuracy. A subsequent check during the acquisitions made it possible to assess the loss of accuracy due to the inevitable deformations of the rigid support during transport, the variation in hygrometric conditions and the weight of the statue. These effects were, however, negligible and within the order of instrumental precision.
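As an illustration of how a marker position can be derived from a base of known length, the following minimal sketch computes a point by forward intersection (two observed angles) and checks it against a redundant polar reading (angle plus distance). The planar geometry, the function names and the numerical values are assumptions made for the example; the text does not specify the instrument or the adjustment actually used.

```python
import math

def forward_intersection(base_length, alpha_deg, beta_deg):
    """Locate point P from the angles observed at both ends of a known base.

    A is placed at (0, 0) and B at (base_length, 0). alpha is the angle at A
    between AB and AP; beta is the angle at B between BA and BP.
    """
    alpha = math.radians(alpha_deg)
    beta = math.radians(beta_deg)
    gamma = math.pi - alpha - beta  # angle at P, from the triangle angle sum
    if gamma <= 0:
        raise ValueError("the two rays do not intersect in front of the base")
    dist_ap = base_length * math.sin(beta) / math.sin(gamma)  # law of sines
    return dist_ap * math.cos(alpha), dist_ap * math.sin(alpha)

def polar_to_xy(distance, alpha_deg):
    """Redundant determination of P from A: one distance and one angle."""
    alpha = math.radians(alpha_deg)
    return distance * math.cos(alpha), distance * math.sin(alpha)

def discrepancy(p, q):
    """Planar distance between two determinations of the same point."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical readings: a 1000 mm base and a marker seen at 45 degrees
# from both ends, plus a polar distance reading of 707.1 mm from A.
p_angles = forward_intersection(1000.0, 45.0, 45.0)
p_polar = polar_to_xy(707.1, 45.0)
print(p_angles, p_polar, discrepancy(p_angles, p_polar))
```

Comparing the two determinations gives a simple check on each marker: a discrepancy well below one millimetre would be consistent with the accuracy reported above.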
The markers, different in shape and layout so that they could be automatically recognized by the different software packages, made up the GCPs (Ground Control Points) and the GCCs (Ground Control Constraints) of the system. In particular, the GCPs, four universal markers collimated manually, allowed both correct georeferencing and verification of the error; the GCCs were instead used as reference 'constraints' for the models within the various photogrammetric software packages (fig. 3).
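The verification of the error on the GCPs can be sketched as a comparison between the coordinates measured in the laboratory and those read on a reconstructed model. This is only an illustrative sketch: the marker names, the coordinates and the choice of the root mean square error as summary statistic are assumptions, not values or procedures taken from the survey.

```python
import math

def gcp_residuals(measured, reference):
    """Return the 3D residual of each marker and the overall RMSE.

    Both arguments map a marker name to an (x, y, z) tuple in the same
    reference system and in the same unit (millimetres in this example).
    """
    residuals = {
        name: math.dist(measured[name], ref) for name, ref in reference.items()
    }
    rmse = math.sqrt(sum(r * r for r in residuals.values()) / len(residuals))
    return residuals, rmse

# Hypothetical laboratory coordinates of four GCPs (mm).
reference = {
    "GCP1": (0.0, 0.0, 0.0),
    "GCP2": (400.0, 0.0, 0.0),
    "GCP3": (400.0, 300.0, 0.0),
    "GCP4": (0.0, 300.0, 0.0),
}
# Hypothetical coordinates of the same markers read on a photogrammetric model.
measured = {
    "GCP1": (0.3, -0.2, 0.1),
    "GCP2": (399.6, 0.1, 0.2),
    "GCP3": (400.2, 300.4, -0.1),
    "GCP4": (-0.1, 299.7, 0.3),
}

residuals, rmse = gcp_residuals(measured, reference)
for name, value in residuals.items():
    print(f"{name}: {value:.2f} mm")
print(f"RMSE: {rmse:.2f} mm")
```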
The acquisitions were carried out with the bust of San Nicola in an upright position, moving around it with the various instruments while taking care not to alter the lighting conditions on the statue, cast shadows, hit the object or change its position.

Laser scanning data processing
The survey began with the acquisition of the San Nicola da Tolentino bust with a phase-difference TLS instrument. Although less efficient and accurate than structured-light scanners, which are more appropriate for the survey of sculptures, it nevertheless provided a model of sufficient quality for the experimentation. Specifically, sixteen scans were performed with a Faro laser scanner positioned at a distance of a few meters from the object. Scans were made radially at two heights (8 + 8 scans) to maximize the coverage of the surface of interest. The resolution of the instrument, set to ½ (half of the maximum), ensured an average distance between points of less than 1 mm; the 'x-control' value was set to obtain strong noise reduction. After the individual scans were recorded, the point clouds were registered using the proprietary software. The scans were first brought together through a target-based pre-registration, which was followed by cloud cleaning.
In particular, all the points extraneous to the bust and pedestal geometry (portions of walls, pavement, ceiling, etc.) were eliminated, and the overlap of the areas common to several scans was then improved by means of ICP algorithms.
This gave a cloud of over 400,000 points, which was transformed into a continuous mesh model by subsequent processing in Geomagic Wrap. A qualitative evaluation showed noticeable noise on the dark parts of the papier-mâché bust, which was not present on the coloured wooden face; this suggested applying a dedicated noise reduction filter to these parts only (fig. 4). The resulting mesh was then subjected to the elimination of spikes and the closure of gaps.

The photogrammetric and videogrammetric acquisition
The bust of San Nicola da Tolentino was then the subject of a photographic campaign carried out first with a Canon EOS 5D Mark II professional camera (with a Canon 24 mm f/1.4 fixed lens), then with a GoPro Hero 4 action camera and, finally, with an Apple iPhone 7. In all cases, pictures were taken by moving the instruments on an ideal sphere of constant radius and directing the optical axis towards the centre of the statue. The first photographic survey campaign, made with professional instrumentation, followed a rigorous practice to obtain a photogrammetric 3D model of high metric and chromatic quality (Verhoeven, 2016; Torresani & Remondino, 2019). The shots were preceded by the calculation of both the depth of field and the hyperfocal distance, the control of the background noise and the compensation of the colour temperature of the ambient light using a colour checker (fig. 5).
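The preliminary calculation of the hyperfocal distance and of the depth of field can be sketched with the standard thin-lens formulas. The circle of confusion of 0.03 mm (a common convention for full-frame sensors), the f/16 aperture combined with the 24 mm lens, and the subject distance of 1 m are assumptions chosen for illustration; the text does not report the values actually used in the planning.

```python
def hyperfocal_mm(focal_mm, f_number, coc_mm=0.03):
    """Hyperfocal distance H = f^2 / (N * c) + f, in millimetres."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

def depth_of_field_mm(focal_mm, f_number, subject_mm, coc_mm=0.03):
    """Near and far limits of acceptable sharpness, in millimetres.

    The far limit is infinite when the subject is at or beyond the
    hyperfocal distance.
    """
    h = hyperfocal_mm(focal_mm, f_number, coc_mm)
    near = subject_mm * (h - focal_mm) / (h + subject_mm - 2 * focal_mm)
    if subject_mm >= h:
        return near, float("inf")
    far = subject_mm * (h - focal_mm) / (h - subject_mm)
    return near, far

# Assumed example: 24 mm lens at f/16, subject at 1 m.
h = hyperfocal_mm(24.0, 16.0)
near, far = depth_of_field_mm(24.0, 16.0, 1000.0)
print(f"hyperfocal distance: {h:.0f} mm")
print(f"sharp from {near:.0f} mm to {far:.0f} mm")
```

Under these assumptions the whole bust would fall comfortably within the sharp zone; opening the diaphragm shrinks that zone quickly, which is why the depth of field has to be checked before the campaign rather than afterwards.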
The shots were taken in AP (Aperture Priority) mode, setting a closed aperture and letting the camera determine the shutter time according to the amount of light measured by its exposure meter. Starburst effects are the result of light diffraction, that is, the slight bending of light as it passes through a small opening: with a small aperture, particularly at a short focal length, the light bends around the edges of the diaphragm blades and creates the 'star' look. Therefore, particular care was required in the choice of the diaphragm, which was set at f/16 (rather than closed further to f/22 or f/32) to reduce the 'star' effect (Liu, 2015), at the cost of a more limited depth of field which, in part, conditioned the shots. A set of 5 images was acquired with different exposures for the production of HDR images. The ability of High Dynamic Range (HDR) to record the full range of lighting in a scene has led [...]

The acquisitions with the action camera and the smartphone were very fast and were carried out in automatic mode (the only one available on these devices). As already mentioned, in this case too the photos were taken 'freehand' by moving the instruments around the statue. At the same time, video footage was also taken both in 'direct connection' and with a DJI Osmo Mobile triaxial gimbal stabilizer, to create a smoother movie without the vibrations caused by the operator (fig. 6). Both cameras were set to the maximum resolution possible (1920 × 1080 pixels) and a recording speed of 30 frames per second. Every precaution was taken to maintain a stable and constant grip and an optimal framing and focus (both of the subject and of the GCPs), and to respect as far as possible both the acquisition paths (circular, at different heights) and the duration of the shot, which was set at around 100 seconds for each video.
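The five-exposure bracket used for the HDR images can be illustrated with the relation between exposure value and shutter time at fixed aperture and ISO: each step of +1 EV doubles the exposure time. The 1 EV spacing and the metered time of 1/4 s are assumptions for the example, since the text does not state the bracketing step.

```python
def bracket_shutter_times(base_time_s, ev_offsets=(-2, -1, 0, 1, 2)):
    """Shutter times for an exposure bracket at fixed aperture and ISO.

    base_time_s is the time metered by the camera (0 EV); every +1 EV
    doubles the exposure time and every -1 EV halves it.
    """
    return [base_time_s * 2.0 ** ev for ev in ev_offsets]

# Assumed example: metered exposure of 1/4 s at f/16, five frames 1 EV apart.
for ev, t in zip((-2, -1, 0, 1, 2), bracket_shutter_times(0.25)):
    print(f"{ev:+d} EV: {t:.4f} s")
```

With times of this order a tripod is needed for every frame of the bracket, which is one reason why the rigorous campaign takes much longer than the freehand acquisitions.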

The photogrammetric data processing
The image-based 3D reconstruction technique, born from the combination of research in the field of computer vision and traditional photogrammetry, makes it possible to extract metric information from photographic images automatically, i.e. without the need for the operator to collimate the points (Guidi et al., 2015). Automatic photogrammetric processing (a simplified term for the process, sometimes improperly called photomodeling) takes place through four successive phases:
• Structure-from-motion (SfM) and Multiview Stereo Reconstruction (MVS): the geometry of the camera poses is reconstructed and a first sparse cloud of points is processed;
• Dense Point Cloud: the images are processed once again with the SfM and MVS algorithms indicated above conside-

The key-frame extraction and selection from videos
The ability to easily extract frames from a video is now supported by the majority of photogrammetry software. In most cases, however, the process is performed automatically by choosing the number of images to be extracted per unit of time (Alsadik et al., 2015; Xu et al., 2016). This makes it impossible to eliminate similar or poor-quality images, which add nothing to the creation of the 3D model. One of the three tested software packages (3DFlow Zephyr) offers the possibility of doing this in an 'intelligent' way: the algorithm first extracts all the frames, then deletes a part of them according to a similarity-based filter. The algorithm was found to be much more effective than temporal sampling, as it eliminated many frames when the camera was stopped or moving very slowly, and fewer when it moved faster. A series of tests carried out on various films of 40-50 seconds (consisting of about 1000 frames) showed that the algorithm works well with the similarity parameter set to a value between 30 and 40. Below the value 25, no filtering occurs, while for values above 50 fewer than 10% of the frames are preserved. Particular attention was paid to the quality of the images acquired (fig. 8).
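The difference between temporal sampling and similarity-based filtering can be sketched on a toy sequence. This is not the algorithm implemented in 3DFlow Zephyr, whose similarity measure is not described in the text: the mean absolute grey-level difference, the `min_change` threshold and the synthetic frames below are all assumptions made for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two frames given as flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_keyframes(frames, min_change):
    """Keep a frame only if it differs enough from the last frame kept."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) >= min_change:
            kept.append(i)
    return kept

def temporal_sampling(n_frames, step):
    """Keep one frame every `step` frames, regardless of content."""
    return list(range(0, n_frames, step))

# Synthetic video: each frame is 4 pixels whose grey level follows the
# camera position; the camera pauses at the start and again at position 30.
positions = [0, 0, 0, 0, 10, 20, 30, 30, 30, 40]
frames = [[p] * 4 for p in positions]

print(select_keyframes(frames, min_change=10))  # [0, 4, 5, 6, 9]
print(temporal_sampling(len(frames), step=2))   # [0, 2, 4, 6, 8]
```

Both strategies keep five frames here, but temporal sampling retains duplicates recorded while the camera was still (frames 0 and 2, 6 and 8) and misses position 20, whereas the similarity filter keeps exactly one frame per distinct viewpoint.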

Comparison of point clouds
The extraction of the frames from video precedes the photogrammetric workflow; for this reason, poor input can badly affect the alignment results. The selection of the photos ensures that blurred and redundant frames, which probably share few key points with each other, are discarded. The evaluation of image quality therefore influences the quality of the extracted frames. The test carried out with action camera and smartphone videos (with and without gimbal stabilizer) gave the results shown in the following picture (fig. 9). The points in the graph represent the image quality value. It appears that the stabilized smartphone acquisition produced the best results. The test confirmed that the action camera produced lower quality data (high noise value) than the smartphone, even with the electronic stabilizer. Video images were of poorer quality than images acquired with professional instrumentation but were nevertheless sufficient for low-cost photogrammetric processing. Visive qua-

The wrapping process was carried out using 3D Systems Geomagic Wrap software in standard conditions (no noise filter) for each of the 18 models created, as well as for the laser-scanning one, to avoid the filtering and hole-closing operations that SfM software might have applied to the point clouds. Each mesh was exported in .ply format and compared with the laser-scanning reference model to assess its quality and to highlight the qualitative and quantitative trend of the geometric differences. The maps created with the CloudCompare software show substantial differences between the models, both when the acquisition technique changes and when the software used to create them varies (fig. 12-13). In particular, we observed a remarkable variability in the number of reconstructed points. The comparison of the histograms indicates moderately reliable results for the models built from statically acquired images.
The HDR technique provided a fair benefit in terms of the number of points created when processing with 3DFlow Zephyr, while it was largely irrelevant in the other cases. However, it was always favourable from the point of view of texture quality, as it highlights details within a wider chromatic range and so makes the model easier to read overall. The quality of the videogrammetric results is much lower. In particular, all the different techniques showed remarkable differences, frequently exceeding 10 mm with respect to the reference model, a considerable value in the field of cultural heritage valorisation.
From the tests carried out, it was possible to deduce that stabilization with a low-cost triaxial gimbal can in certain cases offer considerable advantages, as it reduces the micro-blur of the frames, while it may be irrelevant or even detrimental in other cases, for example when the acquisition is made with the action camera. The Pix4D Mapper Pro software was not able to create the dense point cloud of the three videogrammetric models with the lowest frame quality, while it proved to be valid in the case of high-quality images. An additional comparative test between the results of a complete modelling with the examined software (from SfM to textured mesh) finally showed that, although the creation of the point cloud with Agisoft Metashape is generally very reliable and dense, the mesh-creation algorithm of 3DFlow Zephyr offers the most reliable result (fig. 14-15).
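The comparison of each model with the laser-scanning reference, as described in this section, amounts to computing for every point its distance from the nearest point of the reference and then summarising the distribution. The sketch below uses a brute-force nearest-neighbour search on tiny hypothetical clouds; it is not CloudCompare's implementation, and the 10 mm tolerance is taken from the threshold mentioned in the text only as an example.

```python
import math

def cloud_to_reference_distances(cloud, reference):
    """Distance from each point of `cloud` to its nearest reference point.

    Brute-force search, adequate only for small illustrative clouds.
    """
    return [min(math.dist(p, q) for q in reference) for p in cloud]

def summarize(distances, tolerance_mm):
    """Mean, RMS and maximum distance, plus the share above the tolerance."""
    n = len(distances)
    return {
        "mean": sum(distances) / n,
        "rms": math.sqrt(sum(d * d for d in distances) / n),
        "max": max(distances),
        "share_over_tolerance": sum(d > tolerance_mm for d in distances) / n,
    }

# Hypothetical coordinates in millimetres.
reference = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (10.0, 0.0, 12.0), (20.0, 3.0, 4.0)]

distances = cloud_to_reference_distances(model, reference)
print(distances)  # [1.0, 12.0, 5.0]
print(summarize(distances, tolerance_mm=10.0))
```

Reporting the share of points beyond a tolerance, alongside the mean, makes it easier to distinguish a model with a uniform small offset from one that is locally wrong by more than 10 mm.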

CONCLUSIONS
Video capture, while providing a lower resolution, guarantees a more fluid overlap between frames and a quick acquisition process, making some data sets much easier to obtain. However, the quality of a model based on the traditional photogrammetric technique, conducted in a rigorous way, is not at present attainable from video. It was also possible to deduce that the HDR technique, if correctly applied, can offer significant advantages in the creation of 3D models. The accuracy of the videogrammetric models, clearly lower than in the photogrammetric cases, is certainly influenced by the low-cost nature of the instrumentation used. The quality of the video frames is not comparable to that of SLR images, owing to the quality of the equipment but also to certain physical limits, for example the sensor size, the movement and the video noise produced by small sensors. However, the video acquisition phase took a few minutes, compared with the many hours of the SLR acquisition. This aspect offers food for thought and opens up new possibilities for a future in which technological development will likely bring increasingly efficient and compact sensors to the market at a lower cost, and will make it possible to survey assets so far documented with less advanced techniques.