EVALUATION OF VISION-BASED LOCALIZATION AND MAPPING TECHNIQUES IN A SUBSEA METROLOGY SCENARIO

Metrology is fundamental in all applications that need to qualify, verify and validate measured data according to standards or, in other words, to assess their compliance with predefined tolerances. At sea, metrology is commonly associated with the process of measuring underwater structures, mainly pipeline elements widely used in the offshore industry. Subsea operations are very expensive; optimizing time and cost is a core factor driving innovation in the subsea metrology industry. In this study, the authors investigate the use of state-of-the-art vision-based algorithms, i.e. ORB-SLAM2 and Visual Odometry, as a navigation tool to assist and control a Remotely Operated Vehicle (ROV) while performing subsea metrology operations. In particular, the manuscript focuses on methods for assessing the accuracy of both the trajectory and the tie points provided by the tested approaches, and on evaluating whether the preliminary real-time reconstruction meets the tolerances defined in typical subsea metrology scenarios.


INTRODUCTION
In subsea metrology, high accuracy 3D measurements are needed to inspect or assist the correct assembly of parts such as pipes, rigs and the like, as well as to reverse engineer and manufacture parts that may need replacement. Besides the high accuracy requirements (in order for the part to fit the engineering structure it is made for), a very important aspect is the capability to provide preliminary measurements in real time (such as the bending of a pipe, the extent of a damaged part, deformation, etc.), and to make sure that all required parts of the object have been acquired by the metrology system (on-line full coverage verification). Other use cases may include the repeated monitoring over time of objects of different nature (organic or man-made) on the seafloor. A localization system that allows the ROV to localize itself in the marine environment, and with respect to the object to be monitored, is desirable to increase the effectiveness of operations. While above the water these tasks can be accomplished in real time in most situations, guaranteeing easy revisiting and remeasurement, precise localization under the water remains an expensive and complex activity. The lack of a high accuracy subsea global positioning system, like the GNSS available above the water, together with the complexity of the physical environment itself, keeps geo-referenced or even locally referenced 3D measurements an open issue. Acoustic positioning systems deployed on the seabed in the form of a network of Long BaseLine (LBL) transponders are the current industry standard for navigation, positioning and metrology applications underwater. These systems are expensive not only because of the cost of the sensors themselves, but also because they require specialized teams and time-consuming installation and initialization procedures before they are ready to be used. Moreover, real-time centimetre accuracy positioning can be obtained only in confined areas, within the network of transponders, thus making a systematic mapping of larger areas time-consuming and ineffective.
On the other hand, vision-based localization techniques, such as visual odometry and Simultaneous Localization And Mapping (SLAM), are receiving more and more attention because of their significantly lower cost with respect to acoustic positioning methods (Eustice et al., 2008; Kim and Eustice, 2009; Duarte et al., 2016; Ferrera et al., 2019). Moreover, open source frameworks are becoming publicly available, making the integration of such technology much easier. Originally developed for real-time robot navigation (Hidalgo and Bräunl, 2015), SLAM and visual odometry techniques are gradually being introduced in mobile mapping and surveying above the water as well (Nocerino et al., 2017; Lehtola et al., 2017; Tucci et al., 2018). They can be used as a stand-alone alternative for limited areas or as an additional technology, integrated with current state-of-the-art navigation technology.

Motivations and aim
In this study the authors investigate the use of vision-based real-time techniques, namely SLAM and Visual Odometry, as a navigation tool to assist and control a Remotely Operated Vehicle (ROV) while performing inspection and monitoring tasks underwater. In particular, the paper focuses on methods able to assess the accuracy of both the trajectory and the 3D tie points used in the image orientation process, and to evaluate whether the preliminary real-time reconstruction meets the tolerances defined in typical subsea metrology surveys. Although the use of real-time techniques for underwater navigation and mapping is not recent, few studies have evaluated the accuracy of such methods in a real environment using a certified ground truth benchmark surveyed with high accuracy techniques. Studies and benchmark datasets for the evaluation of different SLAM techniques exist above the water (Sturm et al., 2012; Geiger et al., 2013). Using such datasets, issues such as trajectory or scale drift have been reported in the literature, together with metric accuracy evaluations (Mur-Artal et al., 2015; Mur-Artal and Tardós, 2017; Yuan et al., 2017).
Under the water, because of the complexity of the environment and the resulting difficulty of obtaining very accurate 3D measurements, comparative studies have focused on simulated trajectories (Duarte et al., 2016) or on comparing real-time results with photogrammetrically derived trajectories obtained offline, in post processing, using for example the BINGO software or Agisoft Photoscan (Drap et al., 2015; Nawaf et al., 2018) or the Colmap structure from motion application (Ferrera et al., 2019). A real accuracy assessment of visual odometry and SLAM techniques, evaluated against a ground truth surveyed using an independent and more accurate method, is still missing. This study presents a first evaluation of the open source implementation ORB-SLAM2 (Mur-Artal et al., 2015) together with a variant of the visual odometry approach developed by the authors (Drap et al., 2015; Nawaf et al., 2018). The COMEX underwater test field is used to provide qualitative and quantitative measures. The ORUS3D, a trifocal sensor system rated to 3000 m depth and developed by COMEX SA, is used in this test field to capture synchronized imagery and raw inertial sensor data.

SUBSEA METROLOGY
Metrology is defined by the International Bureau of Weights and Measures (BIPM) as the 'science of measurement, embracing both experimental and theoretical determinations at any level of uncertainty in any field of science and technology.' 1 Metrology aims at qualifying, verifying and validating measured data according to accepted standards. Consequently, a key aspect in metrology is traceability, i.e. the 'property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty' (VIM3 2.41). Traceability requires the definition of references, allowing for the assessment of measurement uncertainty and the comparison of different measurement results, under the assumption that they are traceable to the same reference. Metrology is fundamental in industry, where it serves the purpose of ensuring the quality and accuracy of manufactured parts and components against standards developed at different levels, from international and national standards to industry-specific ones or even standards customized for internal purposes.
1 https://www.bipm.org/en/worldwide-metrology/
Figure 1. A pictorial representation of subsea structures whose relative positions are measured through subsea metrology techniques (Bai and Bai, 2010).
In the underwater environment, metrology commonly refers to the process of acquiring accurate and traceable dimensional measurements of subsea structures (Figure 1), widely used by offshore, marine and underwater engineering companies (Bai and Bai, 2010; IMCA, 2017). Subsea structures are mainly pipeline interconnections, joining subsea assets from the hydrocarbon reservoir to processing and storage facilities. The pipeline connectors are called hubs or flanges; the pipeline elements that separate the hubs are called spools, when they run parallel (i.e., horizontally) to the seabed, and jumpers, if they are vertical. The objective of subsea metrology is to determine accurately (Jørgensen et al., 2015; IMCA, 2017):
• horizontal position and depth of the hubs;
• hub-to-hub slant and horizontal distances (also called baseline);
• hub-to-hub relative heading and attitude;
• spool azimuth (i.e., the bearing of the spool from the hub) and angle of approach (difference between the spool azimuth and hub headings);
• seabed profile along the structure route.
Typical subsea metrology tolerances (Table 1) are defined according to the permissible hub misalignment, taking into account several factors which include stress analysis, fabrication tolerance and possible deformation resulting from deployment operations. Because of the high daily costs of operations at sea, limiting the time for subsea surveying is one of the main factors driving innovation in the subsea metrology industry. Spools and jumpers are the last elements to be fabricated and installed; they require not only the relative positioning and orientation between the hubs but also the 3D topography of the seabed, so that the shape and size of jumpers and spools can be adjusted accordingly. For this reason, 3D real-time measurement techniques are of key importance to optimize the fabrication processes.
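As an illustration of the quantities listed above, the following minimal sketch derives hub-to-hub distances, spool azimuth and angle of approach from two hub positions and headings. The function name, the (east, north, depth) coordinate layout and the convention of headings measured clockwise from north are assumptions made for the example and are not taken from the cited guidelines.

```python
import math


def hub_to_hub_metrology(hub_a, hub_b, heading_a_deg, heading_b_deg):
    """Illustrative hub-to-hub quantities (hypothetical helper).

    hub_a, hub_b: (east, north, depth) in metres (assumed layout).
    heading_*_deg: hub headings in degrees, clockwise from north (assumed).
    """
    d_east = hub_b[0] - hub_a[0]
    d_north = hub_b[1] - hub_a[1]
    d_depth = hub_b[2] - hub_a[2]

    # Horizontal (baseline) and slant distances between the two hubs.
    horizontal = math.hypot(d_east, d_north)
    slant = math.sqrt(horizontal ** 2 + d_depth ** 2)

    # Bearing of the spool as seen from hub A, in [0, 360).
    azimuth = math.degrees(math.atan2(d_east, d_north)) % 360.0

    def wrap(angle_deg):
        # Wrap an angular difference into [-180, 180).
        return (angle_deg + 180.0) % 360.0 - 180.0

    return {
        "horizontal_distance": horizontal,
        "slant_distance": slant,
        "spool_azimuth": azimuth,
        "relative_heading": wrap(heading_b_deg - heading_a_deg),
        "angle_of_approach_a": wrap(azimuth - heading_a_deg),
    }
```

Wrapping the angular differences keeps headings on either side of north (for example 350 and 10 degrees) from producing a spurious difference close to 360 degrees.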

SLAM, STRUCTURE FROM MOTION AND VISUAL ODOMETRY
SLAM and Structure from Motion (SfM) both aim at estimating the pose of the agent (the robot for SLAM and the camera for SfM) and at reconstructing (or mapping) the environment (or the 'structure').
The two classes of algorithms were originally developed in two different communities, robotics and computer vision respectively (Saputra et al., 2018). According to Davison (2015), before his seminal work on SLAM with a single camera (Davison, 2003), the mobile robotics community had almost completely abandoned pure vision-based navigation approaches and the computer vision community had been almost completely uninterested in real-time and robotics applications.
In a SLAM-based approach, data coming from different sensors, or modules, are fused together to estimate the system position and attitude (the state vector) and to build the map of the environment. Crucial for SLAM approaches is the identification of the so-called 'loop closure', i.e. the detection of a previously mapped place and the consequent relocalization of the system within the already measured environment. The 'loop closure' reduces the drift accumulated in the SLAM solution over time (Newman & Ho, 2005). At the early stage of development, the main difference between visual SLAM and SfM was that the former was mainly developed for real-time (or on-line) computation while the latter was traditionally performed off-line, meaning that all the data and measurements (i.e. images) are provided and (post-) processed together. Saputra et al. (2018) considered MonoSLAM (Davison, 2003) the first approach to bring the general SLAM problem from the robotics community into pure vision.
However, it should be noted that SLAM can also be formulated as a full or off-line problem, in which the whole trajectory and the map are estimated from all the sensor data and measurements. In contrast, online SLAM incrementally updates both the agent pose and the map with the most recent estimates from the sensors.
The two approaches differ in the estimation techniques implemented (Bresson et al., 2017): filter-based approaches (such as the Kalman filter) are most suitable for iterative, real-time implementation; optimization-based methods (bundle adjustment, BA, or graph-based SLAM) are usually adopted for solving the full SLAM problem. The feature measurements are integrated by estimating the probability distribution in filter-based approaches or through optimization in BA (Saputra et al., 2018).
Visual odometry (VO) consists in estimating the motion of a single camera or a stereo system from visual input (images or video frames) alone (Nistér et al., 2004). The main differences between SLAM and VO are explained in Scaramuzza & Fraundorfer (2011) and are summarised here. While SLAM and visual SLAM aim to obtain a global and consistent estimate of trajectory and map, VO is mainly devoted to recovering the path incrementally, potentially optimizing only over the last n poses (also called windowed bundle adjustment). VO can be implemented as a step of a complete SLAM algorithm, where loop closure and possibly a global optimization step are also performed. Visual SLAM is potentially more accurate than VO, because more constraints are enforced on the path; however, this does not ensure higher robustness, since outliers not detected in the loop closure can critically affect the correctness of the map.
Under the water, SLAM techniques have been used to fuse inertial and acoustic positioning systems, in particular in the subsea metrology industry (IMCA, 2017), or for autonomous underwater robot navigation and localization using imaging sonar and visual sensors. Early methods focused on acoustic images (Fusiello et al., 1999; Castellani et al., 2005; Roman, 2005; Clark et al., 2008; Ribas et al., 2009), later moving to vision-based methods (Eustice et al., 2008; Kim and Eustice, 2009; Duarte et al., 2016; Ferrera et al., 2019). A more comprehensive review of the different techniques used for localization and mapping can be found in Paull et al. (2014) and Hidalgo and Bräunl (2015).

COMEX 3D underwater reference test-field
With the aim of evaluating the performance of vision-based techniques in a subsea metrological context, a high accuracy underwater 3D reference test-field was recently set up in the COMEX test pool (Figure 2 a, b). The test-field consists of 200 optical targets placed over a 30 m long transect, comprising two opposite walls facing each other and a rectilinear pool floor section in between. The width of the transect is 1.2 m while the maximum depth difference is about 1.4 m. The targets' coordinates were determined through multi-lateration and triangulation using a laser tracker after emptying the pool. The laser tracker Spherically Mounted Retroreflector (SMR) was placed tangent to the circular photogrammetric target at four points (Figure 2 c), and the centre of the photogrammetric target was then determined through a best-fit circle. The multi-station measurements were then adjusted through least-squares procedures, providing standard deviations of the 3D reference coordinates below 0.5 mm over the 30 m length. The reference coordinate system is set with the X axis pointing along the main direction of the test-field, Z vertical and Y according to the right-hand rule convention.
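The best-fit circle step can be sketched as follows. The text does not state which fitting method was used, so this example assumes a simple algebraic least-squares fit (often called the Kåsa fit) and, for brevity, works in 2D coordinates within the plane of the target; the function name is hypothetical.

```python
import numpy as np


def fit_circle_2d(points):
    """Algebraic least-squares circle fit (assumed method, 2D only).

    points: (N, 2) array of SMR centre positions expressed in the
    target plane, N >= 3. Returns (centre, radius).
    """
    pts = np.asarray(points, dtype=float)
    # A circle satisfies x^2 + y^2 = 2*cx*x + 2*cy*y + c,
    # with c = r^2 - cx^2 - cy^2, which is linear in (cx, cy, c).
    design = np.column_stack([2.0 * pts[:, 0], 2.0 * pts[:, 1], np.ones(len(pts))])
    rhs = (pts ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    radius = np.sqrt(c + cx ** 2 + cy ** 2)
    return np.array([cx, cy]), radius
```

With four tangency points the fit has one redundant observation, so the residuals give a basic check on the quality of the individual SMR placements.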
Figure 3. The ORUS 3D 3Kv system mounted on a Mid-Size observation class ROV skid.

ORUS 3D underwater photogrammetry system
The ORUS 3D (Figure 3) is an underwater system specifically designed for photogrammetric measurements, i.e. its design, materials and calibration procedures are optimized to guarantee high absolute accuracy, taking into account the refractive effects of water as well as thermal and pressure influences. The ORUS 3D is composed of three main parts: 1) an Embedded Processing Unit (EPU) managing the synchronization and real-time raw data processing of the connected sensors; 2) a cluster of sensors including, among others, three industrial global shutter cameras (one high resolution, HR, and two low resolution, LR), four LED-based strobes, an attitude/heading reference system, an underwater altimeter (acoustic range finder), and temperature and pressure sensors; 3) a Surface Control Unit Computer (SCUC), connected to the EPU via the umbilical of the ROV, managing the remote control of the system parameters and displaying the real-time visual-inertial odometry performed on board the EPU, with real-time 3D measurement capabilities.
The version tested in this study is the 3Kv, rated to a depth of 3000 m, with camera pressure housings made of titanium and dome ports made of optical glass.
The current photogrammetric processing relies on ad-hoc developed procedures of calibration and image triangulation with and without inertial sensor integration in the bundle adjustment.
Based on the camera calibration, images are rectified to provide distortion-free images that follow a single-viewpoint pinhole camera model. This pre-processing step makes the images easier and more flexible to use across platforms and software. The ORUS 3D system is being certified by Bureau Veritas Marine & Offshore for subsea metrology inspections. The COMEX pool test-field was used during the certification tests, providing an accuracy of 3D coordinates better than 1 cm over a 30 m length (RMSE X < 6 mm, RMSE Y < 3 mm, RMSE Z < 2 mm) with a single photogrammetric strip.
The system is manufactured in three different versions according to the depth rating (1000/3000/6000 m). The 3000 m (3Kv) system used in this study is designed to be installed on a skid for ROVs of Mid-Size class and above.

Image dataset characteristics
The image dataset used in this experiment consists of a sequence of LR stereo-rectified full HD images at 10 Hz, extracted from a session acquired with an ORUS3D 3Kv on the COMEX test-field. To reduce the risk of failures in the image orientation due to the almost featureless surface texture of the pool floor, some metallic plates with a contrasting random pattern were installed on the pool floor (Figure 4 b, c). The image acquisition was carried out with the ORUS 3D set to slightly negative buoyancy, suspended from a floating platform and manoeuvred by a diver (Figure 4 a). The images were acquired at night to test the system in lighting conditions similar to those found at typical operating depths.
Approximately two thirds of the image sequence record the vertical walls of the pool facing each other, and the remaining third records the rectilinear part. Table 2 summarizes the main acquisition parameters.

Reference 3D trajectory, angles and mesh
A reference trajectory was generated by orienting the images in the Agisoft Metashape application, using the full dataset (including the HR camera); a bundle adjustment was then run, constraining the solution with the inertial sensor data and the coded target coordinates as soft constraints (COMEX ORUS 3D software). A dense point cloud, obtained through dense image matching techniques, and a mesh were then generated (Figure 5). A cloud-to-mesh distance check was performed to verify the consistency between the reconstructed mesh and the reference coded target coordinates. The RMS of the distances was below 1 mm. The left and right camera exterior orientations (trajectories and angles), the coded targets and the mesh were used as reference for comparing the real-time visual odometry and SLAM methods, which were processed offline in this experiment.
Figure 5. A panoramic view of the pool transect (a) and the reference mesh built using full resolution images with superimposed coordinate reference system (c)

ORB-SLAM2 mono and stereo test
The monocular and the stereo pipelines of ORB-SLAM2 were used to obtain the estimated device trajectory and the mapping of the environment. ORB-SLAM2 was developed on top of ORB-SLAM, adding support for stereo and RGB-D configurations. The system requires as input a visual vocabulary, used to speed up feature matching and loop detection, and a configuration file containing the camera calibration and the parameters for the ORB extraction. During the tests, the default parameters were used. The input images were downsampled by a factor of 2 (half width and height). As output the system provides the estimated structure of the environment (a sparse point cloud of 3D tie points) and the estimated poses of the keyframes. The keyframes are a subset of the input images on which the mapping and the BA optimisation are performed; this is done to ensure the real-time behaviour of the system.

Visual odometry with windowed bundle adjustment
The visual odometry method tested in this study is a mono variant of the method originally developed in Drap et al. (2015) and further improved in Nawaf et al. (2018) with the addition of a windowed bundle adjustment. The method is composed of two steps: first, a relative pose estimation is performed on each new image following multiple view geometry fundamentals; second, a structure and motion bundle adjustment is applied to a set of images defined by a sliding window that selects the last n images. The implementation was tested using the Python scripting API of the Agisoft Metashape application. A visual odometry procedure was simulated so that images are processed in sequence, as if they were acquired in real time. The influence of the window size on the accuracy of the estimated trajectory was studied, and window sizes of 3 and 4 are reported. Furthermore, the effect of using downsampled images on the accuracy was investigated. In this study a downsampling factor of 2 (half width and height) is reported. Calibration parameters were kept fixed in the bundle adjustment procedure.
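The sliding-window selection described above can be sketched as follows. This is only an illustration of which images would enter the windowed bundle adjustment as each new image arrives; it does not use the Agisoft Metashape API, and the function name is hypothetical.

```python
def sliding_windows(num_images, window_size):
    """Yield, for each newly processed image, the indices of the images
    that would take part in the windowed bundle adjustment.

    The window holds the last `window_size` images, or fewer at the
    start of the sequence.
    """
    for newest in range(num_images):
        start = max(0, newest - window_size + 1)
        yield list(range(start, newest + 1))


# Example: a sequence of 5 images with a window size of 3.
for window in sliding_windows(5, 3):
    print(window)
```

A larger window adds constraints to each adjustment at the price of more computation per image, which is the trade-off behind testing window sizes of 3 and 4.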

RESULTS
Camera positions available in both the reference trajectory and the trajectories obtained with ORB-SLAM2 and the visual odometry methods were used to compute a best-fitting similarity transform (in the least-squares sense) with respect to the reference trajectory. The transformation was then applied to calculate new camera positions and angles in the reference coordinate system and, consequently, translation and angular errors as differences from the reference values. The procedure is common practice in surveying disciplines and corresponds to the absolute trajectory error (ATE) presented in Sturm et al. (2012). Using the same transformation, the 3D coordinates of the tie points were also brought into the reference coordinate system and compared against the mesh. In order to highlight the drift as a function of distance, another comparison, called "drift analysis", was performed through a local alignment. The first third of the sequence, corresponding to the vertical wall of the pool and preceding the linear section of the transect, was used to compute a similarity transform; the same translation and angular errors as for the global alignment are then reported.
The method is similar to the relative pose error (RPE) presented in Sturm et al. (2012) but, in our opinion, is better suited for estimating the angular and translation errors of methods that rely on bundle adjustment, as it is less sensitive to the choice of the reference camera used for the relative pose error estimation. Indeed, even if the real-time process is based on a sequential estimation of the trajectory and map, their global integrity could be preserved even if a few images were not properly oriented. This is the case, for example, for camera positions and orientations estimated from images containing only a number of image observations close to the minimum needed for a resection or relative orientation and, at the same time, a few outliers or wrong matches.
It is worth noting that the scale factor was always estimated, except for the stereo version of ORB-SLAM2. Table 3 reports the RMS of the errors for the global and local "drift" versions.
Figure 6 shows planimetric OXY and vertical OXZ orthographic views of the drift analysis residuals for the different algorithms.
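The alignment and error computation described in this section can be sketched as follows. The text only specifies a least-squares similarity transform; the closed-form solution of Umeyama (1991) used here is an assumption, as are the function names and parameters. Only translation errors are covered, angular errors are left out for brevity.

```python
import numpy as np


def fit_similarity(src, dst, with_scale=True):
    """Least-squares similarity transform mapping src onto dst
    (closed-form Umeyama solution, assumed for this sketch).

    src, dst: (N, 3) arrays of corresponding camera positions.
    Returns scale s, rotation R (3x3) and translation t (3,) such
    that dst is approximately s * R @ src + t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centred point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    if with_scale:
        var_src = (src_c ** 2).sum() / len(src)
        s = np.trace(np.diag(D) @ S) / var_src
    else:
        s = 1.0  # e.g. a stereo trajectory that is already metric
    t = mu_dst - s * R @ mu_src
    return s, R, t


def trajectory_rmse(est, ref, fit_slice=slice(None), with_scale=True):
    """RMS of the position errors after aligning est to ref.

    fit_slice selects the poses used to estimate the transform; the
    error is always evaluated over the whole trajectory.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    s, R, t = fit_similarity(est[fit_slice], ref[fit_slice], with_scale)
    aligned = s * est @ R.T + t
    errors = np.linalg.norm(aligned - ref, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Under these assumptions, `trajectory_rmse(est, ref)` corresponds to the global alignment, `trajectory_rmse(est, ref, fit_slice=slice(0, len(est) // 3))` to the drift analysis aligned on the first third of the sequence, and `with_scale=False` to the stereo case in which no scale factor is estimated.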

DISCUSSION AND CONCLUSIONS
The paper reported the results of a preliminary accuracy assessment carried out to verify whether real-time algorithms may be suited for subsea metrology purposes. Tests were run using default parameters for the ORB-SLAM2 algorithm, while for the windowed bundle adjustment visual odometry approach the influence of the window size was also reported. The tested algorithms showed very promising results, with trajectories differing by only a few centimetres from the reference one. Looking at the difference maps of the 3D tie points against the reference mesh and at the angular error table, it is worth noting that, at the moment, a distance tolerance of 10 cm and an angular tolerance of 1 degree may potentially be met only for transects below 30 m (under the assumption of a correct external scaling, except for ORB-SLAM2 stereo, which already provides scaled measurements). Further tests are necessary to understand the repeatability of the results through several repeated runs of the algorithms and using different image acquisitions of the same transect.
Figure 2. Underwater 3D test field set at COMEX testing pool (a,b) consisting of 200 coded targets measured with a laser tracker and spherically mounted retroreflector (c).

Figure 4. Diver manoeuvring the system during the acquisition of the image dataset used in the presented underwater experiments (a) and an example of the rectified image pairs depicting the pool floor with contrast plates and targets (c).

Figure 7. 3D tie points colour coded according to their Euclidean distance from the reference mesh for the global best fit test: a) ORB-SLAM2 mono, b) ORB-SLAM2 stereo and c) visual odometry WS 4.

Table 2: Main information for the image dataset used in the presented underwater experiments