IMAGE PRE-PROCESSING STRATEGIES FOR ENHANCING PHOTOGRAMMETRIC 3D RECONSTRUCTION OF UNDERWATER SHIPWRECK DATASETS

Although underwater photogrammetry has become widely adopted, significant unresolved issues still deserve attention. This article focuses on the 3D model generation of underwater shipwrecks and explicitly addresses the problem of dealing with sub-optimal datasets. Even if defining best practices and standards for the acquisition phase appears to be crucial, the massive amount of data gathered so far by professionals and the scientific community all over the world cannot be ignored. The underlying idea is to achieve the best possible reconstruction results, even from sub-optimal or less-than-ideal image datasets. This work investigates different strategies and approaches for balancing the quality of the photogrammetric products without neglecting their reliability with respect to the surveyed object. The case study of this research is the Mandalay MHT, a 34 m long steel-hulled auxiliary schooner that sank in 1966 and now lies in Biscayne National Park (Florida, USA). The dataset was provided by the Submerged Resources Center (SRC) of the US National Park Service in order to develop an experimental image enhancement method supporting the virtualization and visualization of the generated products, as part of a sustainable, affordable, and reliable method of studying submerged artefacts and sites. The original images were processed using different image enhancement approaches, and the outputs were compared and analysed.


INTRODUCTION
Underwater environments have long received the attention of the scientific community across disciplines. With more than 70% of our planet's surface covered by water, the marine space is still largely unexplored in many respects. The importance of studying and preserving underwater heritage is stated in three important international conventions and charters: the 1982 UNCLOS (United Nations Convention on the Law of the Sea) 1, the 1996 ICOMOS (International Council on Monuments and Sites) Charter on the protection and management of Underwater Cultural Heritage 2, and the 2001 UNESCO (United Nations Educational, Scientific and Cultural Organization) Convention on the Protection of the Underwater Cultural Heritage 3. The 2001 Convention recommends in situ protection of ancient shipwrecks and submerged archaeological sites as the first option, before considering recovery. In marine archaeology, in fact, wooden or metallic artefacts are found in different states of conservation depending on the environment in which they are discovered (Bandiera et al., 2013), and their recovery is not always the best strategy for their preservation and conservation. Therefore, it is increasingly important to consider advanced technologies, such as photogrammetry and rapid mapping, for the documentation and virtualization of underwater CH (Cultural Heritage) for dissemination and visualization purposes. Underwater exploration is essentially interdisciplinary and thus requires strong collaboration between researchers in different fields (e.g. geology, biology, archaeology, engineering, geomatics). Due to the remoteness and limited accessibility of a typical underwater archaeological site, it is crucial to adopt 3D metric techniques for a correct and complete recording of the site and all its elements (Rissolo et al., 2016). Geomatics techniques can nowadays provide a wide range of tools and solutions for monitoring and documenting marine assets.
Consistent with emerging documentation requirements, the study of underwater CH via Geomatics employs Computer Vision techniques, such as Structure-from-Motion (SfM) photogrammetry, which domain experts can use for the remote or indirect study of inaccessible CH sites. Even if a precise survey is important for underwater archaeology, the survey itself must link archaeological knowledge to the surveyed geometry: computer science alone is not enough, and archaeological knowledge must always be part of the process (Drap et al., 2013). It is also important to provide access to unreachable underwater CH not only to scientists and researchers but also to the wider public, through immersive technologies and virtual visits (Skarlatos et al., 2016). The growing number of underwater photogrammetry applications in recent years makes it possible to virtually reconstruct the seafloor, shipwrecks, submerged structures and infrastructures, and therefore enables the study of marine environments and their submerged contents without divers or vehicles in the water. Three main topics can be identified among the issues connected to photogrammetric applications in underwater environments. The first concerns the generation of 3D models and related metric products, which depends on the best practices and standards adopted during the acquisition phase (e.g. overlap, camera calibration, radiometric correction, integration with data from different sources). The second is the correct georeferencing of the generated 3D model and related metric products, which is difficult (or impossible) because GNSS navigation systems cannot be relied upon underwater.
Last but not least, the third topic is related to dissemination and management: certain visualization platforms allow users to interact with the model, generate geometric sections, and share models in ways that facilitate communication within or between different user groups. This article focuses on the first of these points (3D model generation) and specifically addresses the problem of dealing with sub-optimal datasets. Although underwater photogrammetry has become widely adopted, significant unresolved issues, especially those related to the acquisition and processing phases of underwater imaging, still deserve attention. First of all, the documentation itself is not sufficient: it must be supported by careful georeferencing strategies based on a topographic survey, and kept consistent by using the same reference system (Balletti et al., 2015). Moreover, generating an accurate virtual 3D replica or twin of the surveyed object or site requires addressing specific issues, such as preserving consistent radiometry and avoiding blurry, low-contrast, or over/under-exposed images. Even if defining best practices and standards for the acquisition phase appears to be crucial, the massive amount of data gathered so far by professionals and the scientific community all over the world cannot be discarded. The underlying idea is to achieve the best possible reconstruction results, even from sub-optimal (turbid, blurry, low-contrast) datasets or less-than-ideal acquisition strategies. This work investigates different strategies and approaches for balancing the quality of the geometric and radiometric components of the photogrammetric products, without neglecting their reliability with respect to the surveyed object.
Different methods have therefore been tested to solve this issue within a comprehensive pre-processing approach. The resulting enhanced images were processed using different commercial and free software, and the outputs were compared and analysed. These procedures were tested on a shipwreck dataset from Biscayne National Park (Florida) provided by the Submerged Resources Center (SRC) of the US National Park Service. One of the final aims of this work is to provide a sound strategy and a useful pipeline for obtaining metrically controlled and radiometrically consistent products (both point clouds and textured meshes) for virtualization and visualization. These products can be used in online repositories and for VR and AR tours, providing a sustainable, affordable, and reliable way of studying submerged artefacts and sites.

DATA ACQUISITION
The case study of this research is the Mandalay MHT (Figure 1), a 34 m long steel-hulled auxiliary schooner that was refitted and renamed Mandalay in 1965 to be used as a luxury cruise ship. The vessel sank on Long Reef on 1 January 1966, at the end of a 10-day Bahamian cruise. The Mandalay now rests in very shallow water (maximum depth of 6 m) and is an outstanding snorkelling site located in Biscayne National Park (25° 26.530′ N, 80° 7.301′ W); Biscayne was established as a National Monument in 1968 and designated a National Park in 1980. The park is dedicated to the public enjoyment and preservation of cultural and natural resources, the protection of a rare combination of terrestrial and undersea life, and the preservation of a scenic subtropical setting 4. The data presented in this research were acquired using the SeaArray (Figure 2), a diver-operated tri-camera photogrammetric system designed by Marine Imaging Technologies in partnership with the National Park Service Submerged Resources Center. The current version of the system (V4) uses three Nikon Z7 mirrorless digital cameras (Figure 3 and Table 1) to capture 45,7 MP images.
Figure 2. The SeaArray V4, a diver-operated tri-camera photogrammetry system employed in this research. In this article, the front camera is referred to as SA2, while the rear ones are SA1 (left side) and SA3 (right side).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
Fixed aperture and focus were achieved using a 14 mm Rokinon Cine DS lens; custom housings, chassis and DPV mount were engineered by Marine Imaging Technologies. Each camera is always placed in the same position during all acquisitions. The camera housings were custom designed for the Nikon Z7 mirrorless cameras, with a Nikon FTZ adaptor and a Zen 170 mm glass dome port. Since severe edge softness was noticed in a preliminary setup, the system was refitted with Zen 230 mm glass dome ports to improve edge sharpness, also converting to a native Z-mount. To assign a correct scale to the reconstructed 3D model, a total of 3 scale bars were used (Figure 4). The distance between the two outer calibration targets of each scale bar is 0,842 m ± 0,001 m. The number of images acquired by each camera is reported in Table 2. Images were acquired at an average distance of 3 m from the object using the SeaArray system (Figure 5).

Camera    Nadiral    Oblique
SA1       2385       180
SA2       2347       211
SA3       2356       197
TOTAL     7088       588
Table 2. Description of the Mandalay dataset with the number of images acquired from each camera, by camera orientation.
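The scale assignment with calibrated bars can be illustrated with a short sketch. Assuming the coordinates of the two outer targets of each bar are available in the unscaled model (the coordinates below are hypothetical, not measured data), each bar gives a scale factor equal to the nominal length divided by its model length. Averaging these factors and checking the residual length error of each bar gives a quick consistency check. The stated tolerance of ± 0,001 m is a relative uncertainty of about 0,12 % of the bar length.

```python
import math
import statistics

# Nominal distance between the two outer targets of each scale bar (from the text).
NOMINAL_LENGTH_M = 0.842


def scale_from_bars(bars):
    """Estimate a global scale factor from scale bars.

    `bars` is a list of (p, q) pairs: XYZ coordinates of the two outer
    targets of one bar in the unscaled model. Returns the mean scale factor
    and, for each bar, the residual length error (in mm) after scaling.
    """
    lengths = [math.dist(p, q) for p, q in bars]
    factors = [NOMINAL_LENGTH_M / d for d in lengths]
    scale = statistics.mean(factors)
    residuals_mm = [1000.0 * (scale * d - NOMINAL_LENGTH_M) for d in lengths]
    return scale, residuals_mm


# Hypothetical model coordinates for three bars (illustration only).
bars = [
    ((0.0, 0.0, 0.0), (0.4210, 0.0, 0.0)),
    ((1.0, 1.0, 0.0), (1.0, 1.4212, 0.0)),
    ((2.0, 0.0, 0.5), (2.0, 0.0, 0.9208)),
]
scale, residuals = scale_from_bars(bars)
```

Residuals that are large compared with the bar tolerance would point to a target-marking problem or a deformed model rather than a simple scale error.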

Camera Calibration
Although it cannot strictly be considered a pre-processing step, camera calibration is essential for accurately estimating image locations and dimensions in object space (Shortis, 2015); in underwater photogrammetry, it is crucial to take into account the effects introduced by the water medium and the camera housings. During the camera calibration and image orientation step, it is crucial to verify the consistency of the focal length, use a separate set of calibration parameters for each camera, and appropriately place the scale bars or measured markers around the scene. If possible, a pre-calibration should be performed both out of the water and underwater, to compensate for the distortion introduced by the medium. Likewise, the cameras should be kept in the same position during the checkerboard and object acquisitions, to avoid changes in their relative orientation.

Single-camera calibration
In this study, a pre-calibration approach was followed using a 120x65 cm checkerboard test pattern 5. The aim was to estimate, for each camera, a preliminary set of intrinsics (F, Cx, Cy, K1, K2, K3, B1, B2, P1, P2) to be used as an initial guess for the subsequent self-calibration performed during the BBA (Bundle Block Adjustment). A total of 18 images were acquired for each camera mounted on the SeaArray, and this sub-dataset was used to calibrate the sensors before the actual processing step.
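The role of the listed intrinsics can be made concrete with a projection function. The sketch below assumes the Brown-Conrady frame-camera model as documented in the Agisoft Metashape user manual, with K4, P3 and P4 omitted. In that model, Cx and Cy are offsets from the image centre, B1 and B2 are affinity and skew terms, and P1 and P2 are the tangential coefficients in Metashape's ordering. The default image size of 8256 x 5504 px is an assumption based on the Z7's full-resolution output.

```python
def project_point(point_cam, f, cx, cy, k1=0.0, k2=0.0, k3=0.0,
                  b1=0.0, b2=0.0, p1=0.0, p2=0.0, width=8256, height=5504):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v).

    Assumed Metashape-style frame model: radial terms K1-K3, tangential
    terms P1-P2, affinity/skew B1-B2, principal point offset (cx, cy)
    measured from the image centre. Units: f, cx, cy in pixels.
    """
    X, Y, Z = point_cam
    x, y = X / Z, Y / Z                      # normalised image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + p1 * (r2 + 2 * x * x) + 2 * p2 * x * y
    yd = y * radial + p2 * (r2 + 2 * y * y) + 2 * p1 * x * y
    u = width * 0.5 + cx + xd * f + xd * b1 + yd * b2
    v = height * 0.5 + cy + yd * f
    return u, v
```

The pre-calibration values would enter such a model as starting values, which the bundle block adjustment then refines.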

Stereo-camera calibration: fixed baseline approach
A second approach used the Stereo Camera Calibrator app of MATLAB R2019a 6. The goal was to calculate, for the master-slave camera pairs SA2-SA1 and SA2-SA3, the relative distance to be used as a fixed baseline during the image orientation step (Figure 6). The origin of the coordinate system was set at the optical centre of camera SA2, and the stereo calibration of the pairs SA2-SA1 and SA2-SA3 provided the relative rotation and translation (baseline) of each slave camera. This procedure, however, led to sub-optimal results for three reasons. The panel was not large enough to be seen by both cameras of a pair while shooting simultaneously. The cameras were probably not acquiring images at exactly the same time. Finally, the checkerboard was acquired while moving. The offsets of each rear camera (SA1 and SA3, i.e. slaves) from the front one (SA2, i.e. master) were estimated with an overall mean error of 7,13 pixels. This was too high to use the resulting offset values as a fixed, known baseline between each camera pair. A further approach self-calibrated the same images as a rigid camera rig within a single chunk in Agisoft Metashape (version 1.5.2 build 7838 for Windows x64) 7. The software started from the relative positions of the slave cameras previously calculated with the calibration checkerboard, and adjusted them to minimize the reprojection error. The variance of the estimated positions, used as a realistic measure of the uncertainty of the recalculated XYZ offsets, was too high (Table 3). Therefore, a fixed, known baseline was not adopted in this study.
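Whether a rig is rigid enough to impose a fixed baseline can be checked by measuring how stable the slave-to-master offset is over many exposures. The sketch below is an illustrative, assumed procedure and is not Metashape's internal computation. It takes master and slave poses estimated independently at several epochs, written as world-to-camera rotation R and translation t. For each epoch it expresses the slave camera centre in the master camera frame. The spread of these offsets is a simple stand-in for the variance discussed above.

```python
import numpy as np


def camera_centre(R, t):
    """Camera centre for a world-to-camera pose x_cam = R @ X + t."""
    return -R.T @ t


def slave_offsets(master_poses, slave_poses):
    """Offset of the slave camera centre in the master camera frame, per epoch."""
    return np.array([
        Rm @ (camera_centre(Rs, ts) - camera_centre(Rm, tm))
        for (Rm, tm), (Rs, ts) in zip(master_poses, slave_poses)
    ])


def rig_stability(master_poses, slave_poses):
    """Mean offset, per-axis standard deviation and baseline statistics.

    Needs at least two epochs (sample standard deviation, ddof=1).
    """
    off = slave_offsets(master_poses, slave_poses)
    baselines = np.linalg.norm(off, axis=1)
    return {
        "mean_offset": off.mean(axis=0),
        "std_offset": off.std(axis=0, ddof=1),
        "mean_baseline": float(baselines.mean()),
        "std_baseline": float(baselines.std(ddof=1)),
    }
```

A rigid, synchronized rig should give standard deviations well below the required accuracy. Unsynchronized triggering or a flexing frame shows up directly as a large `std_offset`.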

Image enhancement filters
Underwater images usually suffer from lack of contrast, poor visibility and inconsistent radiometry. To address this issue, a pre-processing step can be run, with the resulting enhanced images used only for the tie-point extraction phase. Different enhancement algorithms can improve several aspects of underwater imagery, and some of them have already been benchmarked (Mangeruga et al., 2018). In this study, different filters were tested. The first image enhancement method used the ImageFilter module of Pillow 8 (the PIL fork) in Python. The Python Imaging Library provides various image filters, including edge enhancement filters implemented as the convolution of a specific kernel over the image. The two filters employed in this study were DETAIL and EDGE_ENHANCE. A third filter was the Wallis filter, an adaptive filter (a generalized variance-based image enhancement operator). It adjusts each pixel according to the local mean and standard deviation, producing edge crispening and local contrast enhancement (Wallis, 1976). This filter can be useful for images that contain both shadow and bright regions 9. Another filter, embedded in the free and open-source photogrammetric suite MicMac 10 (Rupnik et al., 2017), is the SFS option, used to augment the tie-point extraction phase. Some implementations of the SIFT algorithm are not fully invariant to radiometric translation and scaling. As a result, potential information can be lost (low-contrast areas in the images are treated as noise), and some tie-points cannot be extracted.
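Pillow exposes these filters as `ImageFilter.DETAIL` and `ImageFilter.EDGE_ENHANCE`, applied with `Image.filter()`. The sketch below re-implements the two 3x3 kernels in NumPy to make the underlying convolution explicit. The kernel weights and divisors reflect our reading of Pillow's source. The border handling (edge pixels copied unchanged) and the rounding are simplifications, so the output may differ slightly from Pillow's own.

```python
import numpy as np

# 3x3 kernels and divisors (kernel sums), as we understand Pillow defines
# ImageFilter.DETAIL and ImageFilter.EDGE_ENHANCE.
KERNELS = {
    "DETAIL": (np.array([[0, -1, 0], [-1, 10, -1], [0, -1, 0]], float), 6.0),
    "EDGE_ENHANCE": (np.array([[-1, -1, -1], [-1, 10, -1], [-1, -1, -1]], float), 2.0),
}


def apply_kernel(gray, name):
    """Apply a symmetric 3x3 enhancement kernel to an 8-bit single-band image.

    Interior pixels get the normalised weighted sum of their 3x3
    neighbourhood; border pixels are copied unchanged (a simplification).
    """
    kernel, scale = KERNELS[name]
    img = gray.astype(float)
    out = img.copy()
    h, w = img.shape
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            acc += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    out[1:-1, 1:-1] = acc / scale
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because each kernel sums to its divisor, flat areas stay unchanged and only local intensity differences are amplified. This is why such filters can also amplify noise in turbid, low-contrast frames.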
For the first test, a stereo pair was selected (Figure 7), and each of the previously described filters was applied to both the left and the right image. The Wallis filter was applied using five variables. The main ones are contrast 12 and brightness 13 (set to 0,50 and 1,00 respectively), which control the dynamic range (amount of enhancement) of the image so that it fits the target mean and standard deviation. The other variables are standard deviation 14, kernel size 15 and average 16 (set to 50, 99 and 127 respectively). The SFS filter was applied in MicMac via the command TestLib PrepSift. The effect of the algorithms on the stereo pair can be observed in Figure 8.
12 The parameter that controls the increase or decrease of the amount of enhancement (variance gain).
13 The parameter that controls the degree of brightness forcing.
14 Target value for the intensity standard deviation in the kernel window.
15 Size of the convolution kernel, expressed in pixels and linked to the image size.
16 Target value for the intensity average in the kernel window.
Following the software pipeline, features are first detected on both the left and the right image using the SIFT (Scale-Invariant Feature Transform) key-point detector and descriptor (Vedaldi, 2007). The results of feature detection are detailed in Table 4. Two approaches were then used to match the features. The first used the FLANN matcher 18 (Fast Library for Approximate Nearest Neighbors), which provides a set of algorithms optimized for fast nearest-neighbour search. When possible, a second method was also tested: robust matching based on a brute-force approach. This method compares the descriptor of each feature in the first set with all features in the second set, using the sum of the absolute differences between descriptor elements. After the matches are computed, they are refined via an LMSE (Least Mean Square Error) approach. Unfortunately, the computation could not be performed for some of the stereo pairs (SFS and Wallis). The results of feature matching are reported in Table 5. However, this preliminary evaluation lacked a ground truth, which made it difficult to determine which of the analysed methods is best for the selected dataset. The results might hold only for the selected stereo pairs and could differ in a BBA involving several pictures pre-processed with the same filter. For this reason, feature detection, matching and image alignment were also performed with Agisoft Metashape on a sub-dataset of 32 images of the mast step. The results are reported in Table 6.
Table 6. Numbers of tie-points and reprojection error at different quality levels of alignment for each of the employed image-enhancement filters.
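The brute-force matching described above, in which each descriptor is compared with every descriptor in the other set using the sum of absolute differences, can be sketched in NumPy. The nearest-neighbour ratio test used here to reject ambiguous matches is an added assumption and is not stated in the text. The LMSE refinement step is not reproduced. The sketch builds a full N x M x D distance array, so it only suits small illustrative inputs. Real SIFT sets call for chunking or a FLANN index.

```python
import numpy as np


def match_l1(desc1, desc2, ratio=0.8):
    """Brute-force descriptor matching with the sum of absolute differences.

    For each descriptor in desc1, the L1 distance to every descriptor in
    desc2 is computed. The nearest neighbour is accepted only if it is
    clearly closer than the second nearest (ratio test, an assumption).
    Returns a list of (index_in_desc1, index_in_desc2, distance) tuples.
    """
    d1 = np.asarray(desc1, dtype=float)
    d2 = np.asarray(desc2, dtype=float)
    dist = np.abs(d1[:, None, :] - d2[None, :, :]).sum(axis=2)
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best = order[0]
        if len(order) > 1 and row[best] >= ratio * row[order[1]]:
            continue  # ambiguous: second-best candidate is nearly as close
        matches.append((i, int(best), float(row[best])))
    return matches
```

With OpenCV, a comparable setup would typically use a brute-force matcher with an L1 norm. The NumPy version simply exposes the distance computation the text describes.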
As can be observed, the Wallis filter performed better than the other filters at every level of image downscaling (original size; downscaled by a factor of 4; downscaled by a factor of 16). In the low-quality alignment, Wallis filtering gives a consistent gain in the number of matched tie-points compared with the original images. At higher-quality alignment, the gain is less relevant. Since tie-points are matched according to the detected features, it might be useful to upscale source images to localize tie-points more accurately. However, a higher number of tie-points does not necessarily correspond to a correct alignment (Calantropio et al., 2018). Filters should therefore be used only when consistent results cannot be obtained because of the poor quality of the original images, as they might introduce an undesired level of noise (and a higher reprojection error). These exploratory considerations will serve as the basis for an extensive benchmark (cloud-to-cloud distance, density and roughness analysis) for an in-depth assessment of the presented filters. Further issues can also interfere with a correct 3D scene reconstruction and should be considered, especially in shallow or very shallow water. One example is the removal of caustics, which is not addressed in this study.
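A Wallis filter driven by the five variables listed earlier (contrast 0,50, brightness 1,00, target standard deviation 50, kernel size 99, target average 127) can be sketched as follows. It uses one common formulation: g = (f − m)·c·s_t / (c·s + (1 − c)·s_t) + b·m_t + (1 − b)·m. Here m and s are the local mean and standard deviation over the kernel window, c is the contrast, b the brightness, and m_t and s_t are the target mean and standard deviation. Mapping the paper's parameters onto this formula is an assumption, and the implementation the authors actually used may differ. Local statistics are computed with an integral image and reflect padding.

```python
import numpy as np


def box_mean(img, k):
    """Mean over a k x k window (k odd), via an integral image with reflect padding."""
    r = k // 2
    p = np.pad(img, r, mode="reflect")
    c = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)


def wallis(gray, contrast=0.5, brightness=1.0, target_std=50.0,
           kernel=99, target_mean=127.0):
    """Wallis filter on an 8-bit single-band image (one common formulation)."""
    if kernel % 2 == 0:
        raise ValueError("kernel size must be odd")
    f = gray.astype(float)
    m = box_mean(f, kernel)                                   # local mean
    s = np.sqrt(np.maximum(box_mean(f * f, kernel) - m * m, 0.0))  # local std
    # Gain pulls the local standard deviation towards target_std.
    gain = contrast * target_std / (contrast * s + (1 - contrast) * target_std + 1e-9)
    # Offset blends the local mean with the target mean (brightness forcing).
    g = (f - m) * gain + brightness * target_mean + (1 - brightness) * m
    return np.clip(np.rint(g), 0, 255).astype(np.uint8)
```

In this formulation, c = 0,50 gives a gain of at most 1. The filter then mainly attenuates high-variance regions relative to low-variance ones, while b = 1,00 forces every window towards the target mean. Values of c closer to 1 amplify local contrast more aggressively.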

Image masking
Distortion is high towards the edges of the sensor, and the waterproof housing adds to this, so blurring may affect the image edges. These blurring effects must be avoided because they might reduce the quality of the calibration or increase the reprojection error in some parts of the sensor. Masking, i.e. reducing the image area used for key-point extraction and tie-point detection, can solve this issue. However, this procedure could cause problems with the minimum overlap required between images. In the Mandalay dataset, this was easily solved thanks to the high redundancy of the acquired data. It is difficult for a scuba diver to follow a pre-defined acquisition trajectory. Operating in an underwater environment might also pose significant hazards, especially if complex tasks must be carried out at the same time.
For this reason, the presented procedure is viable under the assumption that acquiring more than the essential number of images can compensate for less robust, poorly designed, or hard-to-achieve acquisition geometries. First, all 7.676 images of the Mandalay dataset were aligned at their original size, without masking. As a result, only 5.936 of 7.676 images were aligned, with 5.359.128 of 6.818.404 tie-points and an RMS reprojection error of 1,70 pix. Since these orientation results were not satisfactory, all images were masked so that only their central part was kept. The mask is a circle centred on the image centre with a radius of 2300 px (inscribed in a square of 4600x4600 px), as shown in Figure 10. The masked images were aligned after being downscaled by a factor of 4. A total of 6.080 of 7.676 images were aligned, with 5.791.409 of 6.877.323 tie-points and an RMS reprojection error of 0,87 pix. The results, presented in Table 7, were obtained with Agisoft Metashape (version 1.5.2 build 7838 for Windows x64), setting the key-point limit at 40.000 and the tie-point limit at 10.000. Even though the masked images were downsampled, the total number of aligned images, tie-points and projections was slightly higher than for the unmasked dataset processed at full resolution. Notably, this approach also halved the RMS reprojection error, because the outer part of the sensor, which had higher residuals, was not taken into account (Figure 11). The alignment phase is therefore considerably improved, at a lower cost in processing time and computing resources.
22 https://nikonimglib.com/nvnxi/onlinehelp/en/index.html
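A circular mask like the one described (centred on the image, radius 2300 px) can be generated with a few lines of NumPy. In the sketch, 255 marks pixels kept for key-point extraction and 0 marks excluded pixels. It assumes the photogrammetric software treats black mask areas as excluded, so check the convention of the software in use. Writing the masks to disk is left out.

```python
import numpy as np


def circular_mask(height, width, radius):
    """uint8 mask: 255 inside a circle centred on the image, 0 elsewhere."""
    yy, xx = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0   # centre between pixel centres
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return np.where(inside, 255, 0).astype(np.uint8)


def kept_fraction(mask):
    """Share of pixels left available for key-point extraction."""
    return float((mask > 0).mean())
```

Assuming a full-resolution Z7 frame of 8256 x 5504 px, `circular_mask(5504, 8256, 2300)` keeps roughly 37 % of the pixels. This shows how much the high redundancy of the dataset is being relied on to preserve overlap.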
Circular masks of different sizes were tested. They gave very similar or slightly worse results than the analysed one (4600x4600 pixels), which proved to be the best compromise: the smallest image area discarded, the highest number of aligned images, the highest number of detected tie-points and the lowest reprojection error.
Figure 10. Circular mask adopted; the darker area is not used during sensor calibration or during the feature detection (key-points) and matching (tie-points) steps.
Table 7. Results of image alignment and camera self-calibration for the same dataset, with and without masks.
Figure 11. Comparison of the image residuals on the SA2 camera after calibration. On the left, image residuals using the full image; on the right, image residuals excluding the parts close to the edges of the sensor.
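The halving of the RMS reprojection error can be explained by splitting the residuals by sensor zone. The sketch assumes that per-observation image coordinates and reprojection residuals can be exported from the adjustment. The data in the test are hypothetical. It computes the RMS error, defined here as the square root of the mean squared residual length (a common definition), for all observations, for those inside the mask circle, and for those outside it.

```python
import numpy as np


def rms_by_zone(points, residuals, center, radius):
    """Return RMS reprojection error overall, inside and outside a circle.

    points: (N, 2) image coordinates of the observations, in pixels.
    residuals: (N, 2) reprojection residuals (observed minus projected), in pixels.
    """
    pts = np.asarray(points, dtype=float)
    res = np.asarray(residuals, dtype=float)
    sq = (res ** 2).sum(axis=1)  # squared residual length per observation
    inside = np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1) <= radius

    def rms(values):
        return float(np.sqrt(values.mean())) if values.size else float("nan")

    return rms(sq), rms(sq[inside]), rms(sq[~inside])
```

If the outer zone has much larger residuals than the inner one, masking it out lowers the overall RMS even before the calibration is re-estimated.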

Radiometric correction
Because underwater images may suffer from severe chromatic aberration, image pre-processing procedures must be taken into account to enhance the quality of the generated 2D and 3D products (Neyer et al., 2019). As is well known, colours are progressively lost with depth, because the distance light penetrates underwater depends on its wavelength. However, the Mandalay wreck lies at most 6 m below the surface, with images acquired at about 3 m from the surface. For this reason, no advanced radiometric correction was performed, and the colours were adjusted via white balancing. Images were initially acquired in RAW format (.NEF) and corrected using Nikon ViewNX-i 22. The white balance was adjusted based on the RGB values of a sample white area (5x5 average) selected on the checkerboard (Figure 12). The authors are aware that the white of the checkerboard cannot be considered "pure white" and may lead to sub-optimal radiometric correction. However, the acquisition was not originally planned with radiometric correction in mind. As stated in the introduction, this article addresses the problem of dealing with sub-optimal datasets. In future acquisitions, a calibrated neutral white card and a grey card will be used for white balance and exposure setting. The RAW images were then batch-corrected, applying the white balance obtained from the checkerboard sub-dataset to the whole scene. The 3D models without and with white balancing are compared in Figure 13. The corrected radiometry is also a prerequisite for generating orthophotos (Figure 14), which can support a more rigorous production of archaeological documentation.
Figure 12. The employed checkerboard test pattern (50 mm squares -CamAlign -CHB -SXW -V7.1) before (top) and after (bottom) white balancing. The RGB and luminance histograms are shown in the top right corner, both before and after the correction.
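The white-balancing principle used here, sampling a 5x5 patch on a nominally white area and rescaling the channels so that the patch becomes neutral, can be sketched on an 8-bit RGB array. This is a simplified illustration. The authors applied the correction to RAW files in ViewNX-i, which works on linear sensor data before rendering. The function names and the choice of green as the reference channel are assumptions.

```python
import numpy as np


def white_balance_gains(image, row, col, half=2):
    """Per-channel gains from the mean RGB of a (2*half+1)^2 patch.

    half=2 gives the 5x5 patch used in the text. The patch is assumed to be
    neutral, so the gains scale R and B to match the G channel.
    """
    patch = image[row - half:row + half + 1, col - half:col + half + 1, :3].astype(float)
    mean = patch.reshape(-1, 3).mean(axis=0)
    return mean[1] / mean


def apply_gains(image, gains):
    """Apply per-channel gains to an 8-bit RGB image and clip to [0, 255]."""
    out = image[..., :3].astype(float) * gains
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Computing the gains once on the checkerboard frames and applying them to every image matches the batch approach described above. It is only valid while depth and illumination stay roughly constant, as in this shallow site.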

CONCLUSIONS
Underwater objects and heritage also need to be studied by non-diving experts (historians, architects, chemical engineers, preservationists). Because conservation in situ is often the only option, there is a growing need to document and map archaeological evidence of Underwater CH. This includes shipwreck sites of different ages, submerged coastal villages, and cargoes of architectural construction materials that sank during transportation. This article presented an overall analysis of several aspects of pre-processing and proposed different strategies for enhancing the photogrammetric 3D reconstruction of underwater shipwreck datasets. The Mandalay dataset, together with other Biscayne datasets, was derived from an early iteration of the SeaArray multi-camera photogrammetry system. The authors experienced some successes, but also some setbacks, with the camera array. Although an ad-hoc acquisition strategy was designed for this large-scale shipwreck, some issues remain to be resolved in future studies. One configuration to test in the future is tilting some of the cameras to obtain a convergent/oblique multi-camera configuration, in addition to the nadiral orientation of the master camera, for a better 3D model reconstruction. The correct estimation of known baselines between cameras will also help improve the geometric correctness of the 3D models, which so far relies only on scale assignment with calibrated scale bars. Several considerations must be addressed before this procedure can be applied in the future. These include ensuring that the cameras are synchronized, comparing calibration in and out of water, reducing the variance as much as possible (by at least one order of magnitude), and evaluating whether the SeaArray structure is rigid enough to impose fixed baselines. The checkerboard employed was not large enough for the 14 mm focal length lens used.
Optimal conditions would require calibrating the cameras at the same distance at which the object is surveyed (typically 2 to 4 meters off the bottom). This issue revealed the need for a larger calibration grid, which is currently in production for a spring deployment as a baseline for the 2020 work. However, the model must be assigned not only a correct scale but also a coherent georeferencing. Future studies will therefore be devoted to experimenting with underwater positioning systems, and different calibration procedures will undergo further tests, both in and out of water.
Figure 13. 3D model (triangulated mesh) of the Mandalay textured without white balancing (top) and after radiometric correction (bottom).
Figure 14. On the left, orthophoto of the Mandalay (70% transparency) superimposed on its hillshade DSM to make the 3D features more evident; on the right, archaeological drawing of the site (image courtesy of the National Park Service). Several differences can be noticed between the 2020 orthophoto and the archaeological survey (produced several years earlier).