UNDERWATER PHOTOGRAMMETRY: POTENTIALITIES AND PROBLEMS. RESULTS OF THE BENCHMARK SESSION OF THE 2019 SIFET CONGRESS

Benchmark sessions are an efficient tool for sharing best practices in surveying, data processing and presentation of results within a scientific community. The Italian Society of Photogrammetry and Topography (SIFET) introduced this type of session at its annual national conference in 2016. This article reports some considerations that emerged from the positive experience of the 2019 benchmark session, held in Venice, on the topic of underwater photogrammetry. In addition to some interesting results, it highlights the advantages of analysing, from the same starting dataset, the results obtained by different and heterogeneous groups in terms of training and of the software and hardware tools exploited.


INTRODUCTION
The promotion and testing of innovative methodologies must be one of the main purposes of every scientific society, both national and international, in order to assess the potential of new systems, highlight their problems, identify guidelines and develop good practices. The benchmark is one of the tools that has proven most suitable for these purposes. Once an area of analysis has been identified, this methodology proposes a single dataset, which is then processed by experts in the sector in order to scientifically investigate and test different techniques and methods of data processing. In the Geomatics sector this system is now well established and there are many examples of benchmarks, some proposed by the International Society of Photogrammetry and Remote Sensing (ISPRS) itself. See, for example, the ISPRS Test Project on Urban Classification, 3D Building Reconstruction and Semantic Labeling (wg4, 2018), the ISPRS / EuroSDR Benchmark for Multi-Platform Photogrammetry (Nex et al., 2015; Gerke et al., 2016) and the Benchmark on High Density Aerial Image Matching (Cavegn et al., 2014). For years, as a national landmark in the Geomatics world, the Italian Society of Photogrammetry and Topography (hereafter SIFET) has been offering a benchmark on various issues as part of its annual National Conference. SIFET is a free association of scholars, technicians, and public and private organisations involved in the acquisition, processing, management and dissemination of spatial information. Particular attention is given to the photogrammetric, topographical and geodetic methodologies and technologies related to these processes. The purpose of SIFET is the promotion, care, representation and dissemination of these methods and techniques. The first two SIFET benchmark editions focused on the photogrammetric use of images acquired by drone, which at the time represented an innovative and promising methodology.
The first edition ("On the use of UAV images for 3D reconstruction: a joint experience among users", Mancini et al., 2016) focused on the comparison between "professional" and "low cost" systems, with the problems related to the use of amateur and fish-eye cameras. The second one ("Photogrammetry with oblique images by Unmanned Aerial Vehicles: potentialities and problems"; Piras et al., 2017) focused on oblique images and the consequent variations of photographic scale, and therefore of GSD (Ground Sample Distance), which they introduce. In 2018, the SIFET Scientific Committee decided to change focus, and the third edition of the benchmark ("From point clouds to 3D / HBIM models - potentialities and problems") dealt with the creation of 3D / BIM models from two sets of point clouds: one obtained from a UAV survey, the other from terrestrial laser scanning (Scianna et al., 2018). Essentially, some datasets were provided to users, who could process them freely choosing methodology and software. The required products were explicitly stated in lists specifying the typology and file format of the results, which were then collected and compared. For the analysis of the data received, the benchmark working group used point clouds obtained by laser scanning, which constituted the reference data to quantify the level of accuracy achieved. This kind of test was addressed to scholars and researchers, in particular those with good experience in the sector, whose knowledge and skills made scientific validation possible through comparison of the strategies they adopted. However, benchmarks also acted as a promotional medium: they stimulated the interest of professionals in the sector towards a particular theme, bringing to light unknown aspects. Here, we discuss the results of the fourth benchmark edition, held on the occasion of the 64th SIFET National Conference in Venice (2019), in which the theme of underwater archaeology was analysed.
Underwater surveys require different techniques with respect to terrestrial surveys, partly because the application of active range-based sensors, such as laser scanners, is still under development, especially for sites of large dimensions. Documentation and survey of archaeological sites, both underwater and terrestrial, have undergone a clear change in the last two decades, and photogrammetry is now the most used documentation technique for surveying submerged sites, since it combines speed of execution, good metric precision and cost-effectiveness (McCarthy et al., 2014; Demesticha et al., 2014; Diamanti et al., 2015; Drap et al., 2015; Yamafune et al., 2016; Agrafiotis et al., 2017). This technique has replaced direct survey (traditional trilateration measurements), which is used today only for measuring GCPs (Ground Control Points) in order to obtain their coordinates and to scale and reference the virtual model in the correct position (Costa et al., 2015). Like every other photogrammetric application, underwater photogrammetry has benefited from the development of algorithms, methodologies and software products related to Structure from Motion (SfM) and Multi View Stereo (MVS) techniques (Remondino et al., 2012; Remondino et al., 2015; Troisi et al., 2015). Compared to the terrestrial case, underwater photogrammetry offers peculiar aspects of research and development of good practices, both in the topographical survey of GCPs and in frame acquisition, and in the processing phase for obtaining the photogrammetric model. The benchmark session whose results are reported in this article made it possible to investigate some of these aspects.

The original dataset
The dataset employed by the benchmark participants refers to a survey carried out in 2014 by Ca' Foscari University in collaboration with Soprintendenza del Mare di Palermo. The archaeological site was the third step of the project "The marble routes", conducted from 2013 to 2019 on eleven marble cargoes along the coast of Italy. The Marzamemi I shipwreck was discovered in 1958 by fishermen and was investigated in 1959 by Kapitaen and Gargallo. It lies on a rocky bottom seven metres deep and has been dated to the 3rd century AD thanks to the discovery of amphoras of the Kapitaen I and II type (Balletti et al., 2016; Beltrame et al., 2018). The Marzamemi I site was chosen for its characteristics: it consists of 14 blocks scattered over a large area, but the main cluster is composed of three columns, three large square blocks, four small parallelepiped blocks and one large irregular block, arranged on a seabed from 5 to 7 m deep and covering an area of 18 x 10 m (Figure 1). This small area requires only a reasonable number of images, which have been used for quick and varied analyses.

Figure 1. Squared blocks of the site
The marble blocks and the rocks of the bottom present the same texture and condition because of the algae that have proliferated on their surfaces, so the blocks had to be cleaned of vegetation and concretions to stand out better from the bottom. The site was then equipped with numbered black-and-white targets fixed on the upper side of some blocks for the topographical survey, which was computed with a 3D network using the "Direct Survey Method" technique (Rule, 1989), given the impossibility of employing electronic instruments. The data were processed with Site Surveyor software to obtain the xyz coordinates of the targets (Figure 2). The photogrammetric survey was performed with a Nikon D700 with a fixed 20 mm lens and a hemispherical dome. Because of the difficulty of maintaining a static position during the dive while acquiring the frames, it was necessary to set the ISO value to 1600, giving apertures from f/6.3 to f/13 and shutter speeds from 1/250 to 1/640 s. This setup led to higher noise in the acquired images than in traditional terrestrial acquisitions. The image size is 4256 x 2832 pixels and the pixel size is 8.46 x 8.46 μm; 221 frames were acquired in 9 nadiral strips and another 102 frames in 2 sets of radial strips at 45° and 90° around the blocks, to better record the lateral surfaces. The images were shot at 3.45 m from the bottom; consequently, the ground resolution is 1.13 mm/pix. To remove the blue cast characteristic of underwater images, a white balance was performed directly underwater with a medium grey panel. The images were processed with Agisoft Photoscan, and the coordinates obtained by trilateration were assigned to the corresponding targets and used both to roto-translate the model into the right position and to check the accuracy of the model.

The delivered dataset
For the participants in the benchmark, a folder with all the images was prepared and, in addition to the photogrammetric survey, our commission provided the coordinates of 5 GCPs surveyed by trilateration and the eidotype of the marble blocks with the arrangement of the GCPs. The eidotype also indicated some Check Points (CPs), whose coordinates were not communicated to the participants (Figure 3). As indicated by the benchmark committee, the works delivered and analysed include the following products:
• Descriptive report of the employed methodology, software, hardware and processing times;
• Estimation of the inner orientation and of the lens distortion parameters;
• Estimation of the coordinates of the CPs indicated in the eidotype;
• Dense point cloud;
• Orthophoto at 1:50 scale;
• Digital Elevation Model (DEM);
• Section of the model along an indicated interval.
The proposed dataset was distributed to the 14 participants that requested it. Of these, 9 participants delivered the mandatory products, which were then analysed further by the benchmark committee. The 9 participants had three different types of affiliation: 6 are university members, 2 are from research institutes and 1 is a professional. To improve the readability of the results, which are presented in analytical rather than aggregated form, each workgroup is labelled according to the "Gi" format, with i = workgroup number. Most workgroups used Agisoft's Metashape, with the exception of G2 and G7, who used MicMac and 3DFlow's 3DF Zephyr Aerial, respectively. Workgroups G7 and G9 adopted different processing strategies, whose results have been analysed separately. Workgroup G7 developed project G7a using only nadir images and project G7b using both nadir and radial images. Workgroup G9 developed project G9a by processing the dataset with default settings and "as-is" images, and project G9b by applying Lab Color Correction (LAB) image pretreatment filters with the image-enhancement tool (iMAREculture).

METHODS
Several analyses were conducted to investigate the products provided by the participants in the benchmark session. The analysis of the delivered products is particularly interesting as it shows how methodologies and tools can influence the results. As a first step, our commission catalogued the hardware and software used and, at the same time, the times declared for the various operations, such as bundle adjustment and generation of the dense cloud, the model and the orthophoto. Evaluating the performance of a given workstation is not an easy task, mainly because several variables can influence the hardware configuration of each machine. To overcome these problems and create a general classification of the employed machines, it was decided to use an online database. UserBenchmark is an online service that makes it possible either to benchmark one's personal computer through a downloadable application or to test the performance of a virtually assembled machine. The service relies on a database that collects information about each hardware component available on the market and its performance over time. Thanks to the information requested by the committee and provided by the participants, it was thus possible to benchmark the workstations used to process the provided dataset. The classification provided by this service was then compared with the processing time declared by each participant. It is interesting to note that between the best-performing solution and the worst there is a gap in processing time of around 15 hours. The number of participants is obviously not a reliable sample, and the settings of each processing approach are not considered; however, this comparison showed that, apart from the gap just mentioned, the mean processing time is comparable for all the other participants regardless of the hardware configuration employed.
As a second step, we compared the calibration parameters of the lenses and the estimated coordinates of the CPs. For the comparison of the dense point clouds, DEMs and orthophotos, we chose to use open source solutions. Point clouds were analysed in the CloudCompare software, while DEMs and orthoimages were analysed using the QGIS software. The various data processing methods generated point clouds that differ in continuity and completeness, although the bounding box had been defined by the commission and was therefore the same for everyone. The difference is most evident at the edges of the model. Consequently, we chose to limit the analysis to a rectangular portion in the central part of the model in order to perform a more homogeneous comparison (Figure 4). Considering that DEMs, and consequently also orthoimages, were generated using the point cloud as reference (or with an intermediate step generating the mesh), the same rectangular portion was also used for the analyses of these added-value products. As concerns dense point clouds and DEMs, we calculated the distances between the delivered products and the one processed by the committee, chosen as a reference. For point clouds, the CloudCompare software provided the tool for calculating the distance between clouds. Since the comparison between the coordinates of the CPs already provided information on the sign of the differences, clouds were compared by means of the absolute distance, calculated by the C2C algorithm using quadratic local modelling. In the case of orthophotos, qualitative assessments were made by analysing the readability of the details, the radiometric quality of the image and its completeness. Finally, we compared the cross-sections generated along the indicated interval. The cross-sections were generated with the terrain profile plugin available from the QGIS plugins repository, starting from the DEMs submitted by the participants.
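As an illustration of the cloud-to-cloud comparison described above, the following minimal sketch computes, for each point of a compared cloud, the unsigned distance to its nearest neighbour in the reference cloud. It is a simplification under stated assumptions: it does not reproduce the quadratic local modelling used in CloudCompare, it uses a brute-force search that would be too slow for real dense clouds, and the function names and toy coordinates are hypothetical, not benchmark data.

```python
import math


def c2c_distances(compared, reference):
    """For each point of `compared`, return the unsigned distance to the
    nearest point of `reference` (brute-force nearest neighbour)."""
    if not reference:
        raise ValueError("reference cloud is empty")
    return [min(math.dist(p, q) for q in reference) for p in compared]


def summarize(distances):
    """Mean and maximum of a list of distances."""
    return sum(distances) / len(distances), max(distances)


# Hypothetical toy clouds (metres), NOT taken from the benchmark dataset.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
compared = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.2), (0.5, 0.5, 0.0)]

distances = c2c_distances(compared, reference)
mean_d, max_d = summarize(distances)
print(f"mean = {mean_d:.3f} m, max = {max_d:.3f} m")
```

Because the distances are unsigned, this kind of comparison says how far a delivered cloud is from the reference but not in which direction, which is why the sign information is taken from the CP coordinate differences instead.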

RESULTS
The most used software was Agisoft Metashape. Some elaborations were performed with 3DF Zephyr (3DFlow) and with the open source software MicMac. As already reported, the characteristics of the hardware employed were very different and, despite the medium size of the dataset, they had a profound effect on the time required for processing. Inevitably, they also affected the choice of parameters to be calculated and the complexity of the processing steps.
The results analysed here allow interesting considerations regarding the effectiveness of applying filters to the images before photogrammetric processing, and the different results that can be reached by processing the nadiral images together with the radial ones or by processing them separately. The analysis of the interior orientation parameters and of the distortion coefficients provided did not show major differences related to software or processing strategies. In order to compare the calculated CP coordinates, the differences between those obtained in the reference photogrammetry project (ref) and those obtained by the workgroups (Gi) have been calculated.
$dX_i = X_{ref} - X_{G_i}$, $dY_i = Y_{ref} - Y_{G_i}$, $dZ_i = Z_{ref} - Z_{G_i}$

All workgroups, with the exception of G1, consistently achieved results showing the maximum variations for the Z coordinate. Interestingly, as regards G7, variations are smaller for project G7b, which also includes radial images, suggesting that these images play a role in tightening the model. Figures 5 to 7 show the 2-D distribution of C2C distances for some particular cases. In Figure 5, the dense cloud obtained by G2 with MicMac clearly shows small distances from the reference, although it has wider gaps than those of other workgroups. Figure 6 shows that processing only the nadir images for G7a led to deformations compared to the reference model. Finally, Figure 7 compares the reference cloud with G9b, which performed preliminary image filtering.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

The delivered DEMs were then analysed in the QGIS software. The first step consisted in selecting the previously determined area of analysis, the same one chosen for the point cloud analyses and reported in Figure 4. This operation was carried out on all the raster products of the participants. The comparison between the participants' DEMs and the reference one was achieved simply by using the raster calculator available in the software: the reference DEM was subtracted from each submitted DEM, and the results were then analysed. An example of this operation is reported in Figure 8. As can be seen from the reported images, discrepancies between the reference and the analysed data range from a few centimetres to more than one metre. This is evident on the edges and vertical surfaces of the marble pieces, whose reconstruction depends on the correct orientation and use of the provided set of oblique images.
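The DEM comparison described above amounts to a cell-by-cell subtraction of two rasters covering the same area. The sketch below shows the idea on small in-memory grids; it is not the QGIS raster calculator itself, it assumes both DEMs share the same extent and cell size, it represents no-data cells with `None`, and all names and elevation values are hypothetical.

```python
def dem_difference(dem, reference):
    """Cell-by-cell difference (dem - reference) for two grids of equal shape.
    Cells holding None (no data) in either grid stay None in the output."""
    if len(dem) != len(reference) or any(
        len(a) != len(b) for a, b in zip(dem, reference)
    ):
        raise ValueError("grids must have the same shape")
    return [
        [None if a is None or b is None else a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(dem, reference)
    ]


def abs_range(diff):
    """Smallest and largest absolute difference over the valid cells."""
    valid = [abs(v) for row in diff for v in row if v is not None]
    return min(valid), max(valid)


# Hypothetical elevations in metres (negative = below sea level), NOT benchmark data.
reference_dem = [[-6.0, -6.0, -5.5],
                 [-6.0, -5.0, -5.5]]
submitted_dem = [[-6.02, -5.97, None],
                 [-6.0, -4.2, -5.5]]

diff = dem_difference(submitted_dem, reference_dem)
smallest, largest = abs_range(diff)
print(f"absolute differences from {smallest:.2f} m to {largest:.2f} m")
```

In this toy case the large value sits on a single cell, mimicking the way discrepancies concentrate on edges and vertical faces, where a small planimetric shift produces a large height difference.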
The accuracy of the geometrical reconstruction of the surveyed object in the participants' datasets was also examined in more depth through the automatic creation of cross-sections. This operation was carried out with the terrain profile plugin available in QGIS, which directly interpolates the DEM and extracts continuous profiles. The cross-section position is reported in Figure 9 (above), along with an example of all the sections extracted and compared by the benchmark committee in Figure 9 (below). This analysis brought out more clearly information that was already partially visible in the DEM analyses, providing a better understanding of how the different choices and processing modalities adopted by the participants also influenced the accuracy of the geometrical reconstruction. However, despite some more evident discrepancies among the datasets in a few specific areas, the sections extracted from the different datasets confirm a general homogeneity among the results provided. Finally, a qualitative assessment was also performed on the orthoimages produced and delivered by the participants. In this case the analysis consisted only of a visual inspection of the different datasets, with particular attention to different features of the images. An example of this analysis is reported in Figure 10. First, it is interesting to note how the treatment of the radiometric information, e.g. the correction of the blue cast, strongly affects the final products (see the orthophotograph of G9, Figure 10, bottom left).
This tool is very useful when the images are noisy and have a marked colour cast. On this occasion we observed a good enhancement of contrast and of depth value, but on radial images the blue column of water over the blocks is still present. The simplest ways to remove it are to apply masks in the photogrammetric software or to eliminate the blue points from the point cloud before starting the meshing process.
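The second option mentioned above, removing blue points before meshing, can be sketched as a simple colour filter on a coloured point cloud. This is only an illustrative rule under stated assumptions: the blue-dominance criterion, the `margin` threshold and the sample points are hypothetical and would need tuning on real data; they are not the procedure used by any workgroup.

```python
def is_water_column(rgb, margin=40):
    """True when the blue channel exceeds both red and green by more than
    `margin` (hypothetical threshold on 8-bit channels)."""
    r, g, b = rgb
    return b - max(r, g) > margin


def drop_blue_points(points, margin=40):
    """Keep only points whose colour is not blue-dominant.
    Each point is ((x, y, z), (r, g, b))."""
    return [pt for pt in points if not is_water_column(pt[1], margin)]


# Hypothetical coloured points, NOT benchmark data.
points = [
    ((1.0, 2.0, -6.0), (120, 110, 100)),  # marble-like colour, kept
    ((1.1, 2.0, -4.0), (30, 60, 160)),    # strongly blue, treated as water column
    ((1.2, 2.1, -6.1), (90, 120, 130)),   # slight blue cast only, kept
]

kept = drop_blue_points(points)
print(f"{len(kept)} of {len(points)} points kept")
```

A margin that is too small would also discard genuine surface points that still carry a residual blue cast, so in practice such a filter is better combined with a check on point height or with manual inspection.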

[Table: CP coordinate differences, columns dX [m], dY [m], dZ [m] repeated for each group of results]
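The coordinate differences dX, dY and dZ between the reference project and a workgroup, as defined in the Results, can be computed per check point as in the following sketch. The per-axis RMSE is added here only as a convenient summary and is an assumption of this example, not a statistic reported by the committee; the CP names and coordinates are hypothetical.

```python
import math


def cp_differences(reference, estimated):
    """Return {cp_id: (dX, dY, dZ)} with d = reference - estimated,
    for the check points present in both dictionaries."""
    return {
        cp: tuple(r - e for r, e in zip(reference[cp], estimated[cp]))
        for cp in reference
        if cp in estimated
    }


def rmse_per_axis(diffs):
    """Root mean square of the differences along X, Y and Z."""
    n = len(diffs)
    return tuple(
        math.sqrt(sum(d[axis] ** 2 for d in diffs.values()) / n)
        for axis in range(3)
    )


# Hypothetical CP coordinates in metres, NOT benchmark data.
ref = {"CP1": (10.00, 5.00, -6.00), "CP2": (12.00, 7.00, -6.50)}
g_i = {"CP1": (9.99, 5.01, -6.03), "CP2": (12.01, 7.00, -6.54)}

diffs = cp_differences(ref, g_i)
rmse_x, rmse_y, rmse_z = rmse_per_axis(diffs)
print(f"RMSE X = {rmse_x:.3f} m, Y = {rmse_y:.3f} m, Z = {rmse_z:.3f} m")
```

Keeping the sign of each difference, rather than its absolute value, is what makes it possible to see systematic offsets such as a consistent shift along Z.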
It is also clear that the accuracy with which the DEMs were generated, and the approach used for their generation, strongly affect the final orthoimages.

Figure 8. Examples of DEM comparison. In this case two different datasets have been compared with a reference one.

Figure 9. Position of the generated cross-section (above) and example of the sections extracted and compared (below).
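The extraction of a cross-section from a DEM, as performed in the Results with the QGIS terrain profile plugin, can be illustrated by sampling the raster at regular steps along a line. The sketch below uses bilinear interpolation, which is an assumption of this example (the interpolation used by the plugin is not stated in the text); the grid, cell size and function names are hypothetical.

```python
import math


def bilinear(dem, col, row):
    """Bilinearly interpolate the grid `dem` (list of rows) at the
    fractional grid position (col, row)."""
    c0, r0 = int(math.floor(col)), int(math.floor(row))
    c1 = min(c0 + 1, len(dem[0]) - 1)  # clamp at the last column
    r1 = min(r0 + 1, len(dem) - 1)     # clamp at the last row
    fc, fr = col - c0, row - r0
    top = dem[r0][c0] * (1 - fc) + dem[r0][c1] * fc
    bottom = dem[r1][c0] * (1 - fc) + dem[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def profile(dem, start, end, cell_size, samples):
    """Sample `samples` evenly spaced elevations from `start` to `end`,
    both given as (col, row). Returns a list of (distance_m, z)."""
    if samples < 2:
        raise ValueError("at least two samples are required")
    (c_a, r_a), (c_b, r_b) = start, end
    length = math.hypot(c_b - c_a, r_b - r_a) * cell_size
    section = []
    for i in range(samples):
        t = i / (samples - 1)
        z = bilinear(dem, c_a + t * (c_b - c_a), r_a + t * (r_b - r_a))
        section.append((t * length, z))
    return section


# Hypothetical 2 x 4 DEM in metres with 0.5 m cells, NOT benchmark data.
dem = [[-7.0, -7.0, -6.0, -7.0],
       [-7.0, -6.0, -5.0, -7.0]]

section = profile(dem, start=(0, 0), end=(3, 0), cell_size=0.5, samples=7)
for distance, z in section:
    print(f"{distance:.2f} m  {z:.2f} m")
```

Running the same function with identical `start` and `end` on each submitted DEM yields profiles that share the same distance axis and can therefore be overlaid directly.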

CONCLUSIONS
Besides software and hardware, operating methodologies for photogrammetric data processing can differ also, and above all, in processing strategies. Given a rigorous theoretical approach, good processing practices often make it possible to achieve the best possible results from the same dataset. In this view, benchmark sessions have a fundamental role in the definition of such best practices. This specific benchmark, albeit with a limited number of different cases, also highlighted some interesting features. The availability of benchmark datasets concerning the processing of underwater data is still limited. For SIFET it was interesting to propose this kind of dataset for the first time and to evaluate how the participants involved processed the distributed data. Interest in this topic has grown in recent years, as proven by the creation of the ISPRS WG II/9: Underwater Data Acquisition and Processing. In the national scenario, although several research groups are working on these topics, there are few opportunities for discussion. The main idea of this benchmark was thus also to take a first step in fostering this debate between universities, research institutes and professionals. The idea was also to create a first occasion for discussion between these different entities and to evaluate how they deal with this type of data. The results achieved by the different participants were quite similar, but some considerations can nevertheless be reported. It is interesting, in the first instance, to note that the commercial and open source solutions used by the participants perform in a similar way, demonstrating the level of maturity reached by some photogrammetric software solutions also for the processing of this less conventional kind of dataset.