MULTI-CAMERA SYSTEM CALIBRATION OF A LOW-COST REMOTELY OPERATED VEHICLE FOR UNDERWATER CAVE EXPLORATION

Exploration, documentation and mapping of underwater environments are among the biggest open challenges for science and engineering. Humans are not naturally designed to operate in water and, despite the enormous technological advances that nowadays offer unprecedented opportunities, diving and working underwater is still very dangerous, especially in confined spaces such as underwater caves. Great research efforts are currently devoted to underwater autonomous navigation, but available solutions still mainly rely on complex and expensive systems, due to the difficulty of adapting localization and mapping sensors and algorithms designed for terrestrial or aerial applications. However, small and affordable underwater remotely operated vehicles (ROVs) are available, which offer good opportunities for underwater exploration and mapping. This paper focuses on the development of a small, low-cost ROV designed for 3D mapping of underwater environments, such as caves. The system is based on a commercially available vehicle, the BlueRov2, and relies on the use of up to 12 action cameras (GoPro) mounted on it. A trifocal camera system for underwater real-time visual odometry can also be included. The work describes the photogrammetric procedure developed for the synchronization and calibration of the GoPro cameras and provides a thorough analysis of the achievable results.


INTRODUCTION
The ability to acquire and record precise, dense and georeferenced 3D information is a constant requirement for a variety of applications, ranging from civil engineering and construction to cultural heritage, from environment to industry. When the area to be mapped is vast, mobile mapping systems (MMSs) are the optimal solution, allowing 3D metric information of the environment to be derived while moving through it. Traditionally, MMSs are mounted on vehicles, i.e. cars and vans, but also planes and boats (Petrie, 2010). These systems are usually expensive and complex (Ellum and El-Sheimy, 2002), and are not suitable for mapping indoor and underground spaces. To overcome these limitations, portable MMSs have been developed both in the research and commercial domains and are becoming more and more popular for applications in scenarios characterised by harsh conditions (Lehtola et al., 2017; Nocerino et al., 2017; Tucci et al., 2018). While the latter systems are designed to be carried around by an operator, 'terrestrial' robotic platforms (Nüchter et al., 2013; Ziparo et al., 2013) and autonomous aerial micro vehicles (Cieslewski et al., 2017) are under study with the aim of surveying complex scenes autonomously and, consequently, assuring the safety of human operators. While there is undoubtedly still room for improvement towards more efficient 3D mapping and modelling approaches for 'terrestrial' scenarios, the challenges are even greater when it comes to the underwater environment. It is estimated that about 95% of the ocean is unexplored (NOAA, 2018), and underwater caves probably represent the most fascinating and dangerous type of underwater exploration. Deep water applications commonly entail the use of expensive, large and complex remotely operated (ROV) or autonomous underwater (AUV) vehicles. Recently, smaller and low-cost systems have started to appear (e.g., BlueRov2, OpenROV, Sibiu Nano, etc.), revolutionizing the world of underwater investigation and opening unprecedented opportunities for research in underwater caves, an environment which can be highly dangerous even for well-trained divers.

Table 1 (recoverable entries). Model: BlueRov2. Thrusters: 6 x Blue Robotics T200.

Moving from the last considerations, this contribution presents the development of a small, low-cost ROV for photogrammetry-based 3D mapping of underwater caves. The system is a modified version of the commercially available BlueRov2 (Table 1), equipped with a trifocal camera system (Nawaf et al., 2018) and with the possibility of mounting up to 12 GoPro action cameras (Figure 1). The paper focuses on the methodology developed for the synchronization and calibration of the GoPro multi-camera system. In the current version, eight GoPro cameras, two Hero 5 Black and six Hero 6 Black, are mounted on the BlueRov2. The manuscript is organised as follows. First, an overview of related works is provided; then, the developed approach is described and an in-depth analysis of the achieved results is presented. The paper concludes by summarising the main outcomes; further steps will be required to combine the data from the GoPro cameras with the trifocal camera system and the inertial navigation unit embedded onboard the BlueRov2.

RELATED WORKS
Action cameras have become very popular for photogrammetric applications thanks to their flexibility, light weight, robustness and low cost. Ballarin et al. (2015) used a GoPro Hero3 Black Edition on a small aerial drone to survey an archaeological site. The accuracy of orthophotos produced with the GoPro images was tested against check points measured with an RTK-GNSS approach. Hastedt et al. (2016) investigated the reliability and accuracy of the GoPro Hero4 for unmanned aerial vehicle (UAV) based photogrammetry, using different software packages, calibration setups, acquisition modes (i.e., single images vs video) and resolutions.
A five-head GoPro system for indoor mapping purposes was proposed by Teo (2015). The five cameras were separately calibrated, their lever-arms and boresight angles estimated and the video streams synchronised using an external timer. Koehl et al. (2016) presented a multi-camera system with four GoPro Hero 4 cameras for outdoor mobile mapping applications. A remote controller was used for the system synchronization, which did not provide a very accurate synchronization.
Schmidt & Rzhanov (2012) employed a stereo-camera system based on two GoPro Hero2 cameras for seafloor bathymetry measurements. The cameras, set in video mode, were synchronised using a synchronization cable. A similar stereo-camera system, mounted on a micro-ROV and combined with a sonar scanner, was proposed by Nelson et al. (2014) for surface reconstruction of an underwater archaeological site.

DEVELOPED APPROACH
The approach developed for the multi-camera system calibration, i.e. the computation of relative orientation (RORE) parameters between the cameras, is intended to be flexible. It follows the guiding principle of developing a procedure applicable in different environments, without the need for a dedicated facility.
The main steps are summarised in Figure 2 and detailed in the following sections. The developed ROV system is designed to work in harsh underwater environments, characterised by narrow spaces and low light conditions. Consequently, the main idea is that the system is calibrated 'in-air' before the underwater mission and the computed RORE parameters are employed in the processing of the underwater dataset in two ways. The simplest consists in exploiting the RORE to infer a scale constraint between the camera centres. The second aims to speed up the processing by computing the approximate exterior orientation parameters of the slave cameras from the exterior orientation of the camera selected as reference.
For this specific application, where a high number of action cameras are used and the system is designed to operate underwater, a hardware-based synchronization approach is not feasible. Consequently, all the cameras are set in video mode and a post-processing software synchronization method is used, based on a well-recognisable common event visible in the single video streams.

Scale definition and reference measurements
The multi-camera system calibration approach relies on the use of reference measurements for the definition of RORE parameters between the several cameras that compose the system.
Coded targets, arranged on planar sheets of known dimensions, are attached to a rigid structure; while the current experiment was performed in a basement (Figure 3), the procedure is easily exploitable in different closed spaces, like for example a small room or the back of a van.
In this study, a professional grade digital single lens reflex (DSLR) camera (Nikon D810) equipped with a 24 mm lens is used to measure the coordinates of 135 coded targets, with a final estimated accuracy of 0.5 mm. The self-calibrating image network, comprising 190 images, is shown in Figure 3, together with the calibration environment.

GoPro cameras set-up, video processing and frame extraction
To reproduce the dark lighting conditions expected in real environments, the BlueRov2 lights are kept turned off and the GoPro camera settings are configured to avoid motion blur in the video frames (minimum exposure time equal to 1/120 seconds and 400 as minimum ISO). Following the outcomes of the experiments presented by Hastedt et al. (2016), the videos are recorded with the GoPro built-in fisheye distortion correction enabled (FOV linear), thus producing frames that can be approximated by a central projective model. This choice is supported by the evidence that underwater, when flat ports are used, as for the GoPro cameras, the field of view (FOV) is limited by total internal reflection (Menna et al., 2016). The video resolution is set at 2.7K (2704 pixel x 1520 pixel) @ 30 frames per second (fps), corresponding to an actual frame rate of 29.97 fps.
The frames are extracted in the lossless png format from the original video stream using the FFmpeg free software (FFmpeg Development Team, 2010). The png frames are then converted to jpg at the highest possible quality, and the following exif tags are added using ExifTool (Harvey, 2003): Make, Model, FocalPlaneXResolution, FocalPlaneYResolution, FocalLength, FocalLengthIn35mmFormat. Based on these parameters, photogrammetric software applications can automatically recognise images coming from different cameras and estimate the initial value for the principal distance.

Table 2. Results of self-calibrating bundle adjustment for the eight GoPro cameras. Interior orientation and additional parameters (correction coefficients) are reported along with internal assessment in image space (root mean square, RMS, of re-projection error) and object space (root mean square error, RMSE, on check points).
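The frame extraction and tagging steps can be sketched as command builders, shown here only as an illustration. The file names, the output pattern and the tag values are placeholders and not the settings used in the experiments; the helper names `ffmpeg_extract_cmd` and `exiftool_tag_cmd` are hypothetical. The sketch only assembles argument lists for the `ffmpeg` and `exiftool` command-line tools, which are assumed to be installed, and leaves their execution (for example with `subprocess.run`) to the reader.

```python
from pathlib import Path


def ffmpeg_extract_cmd(video, out_dir, fps=None):
    """Build an FFmpeg command that writes the frames of `video` as PNG files.

    If `fps` is given, the fps video filter resamples the stream to that
    rate (e.g. fps=1 for one frame per second); otherwise the frames are
    written at the rate of the input stream.
    """
    cmd = ["ffmpeg", "-i", str(video)]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]
    # Numbered output pattern: frame_000001.png, frame_000002.png, ...
    cmd.append(str(Path(out_dir) / "frame_%06d.png"))
    return cmd


def exiftool_tag_cmd(image, tags):
    """Build an ExifTool command that writes the given tags into `image`."""
    cmd = ["exiftool", "-overwrite_original"]
    cmd += [f"-{name}={value}" for name, value in tags.items()]
    cmd.append(str(image))
    return cmd


# Placeholder values, for illustration only.
example_tags = {"Make": "GoPro", "Model": "HERO6 Black", "FocalLength": 3.0}
print(ffmpeg_extract_cmd("cam1.mp4", "frames", fps=1))
print(exiftool_tag_cmd("frames/frame_000001.jpg", example_tags))
```

Writing a distinct Make/Model pair per camera is what allows the photogrammetric software to group the frames by camera, as described above.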

Single camera calibration
The eight GoPro cameras are calibrated separately; a typical image network configuration is shown in Figure 4. The volumetric calibration testfield is composed of a vertical wall intersecting the floor and ceiling.
From each video stream, the key frames are extracted, assuring a good image quality and including images rolled around the optical axis. Agisoft PhotoScan is used for the image processing.
Table 2 summarises the results of the calibration processing, where only the significant additional parameters are retained, and Figure 5 shows the distortion maps for the different cameras. Interestingly, the affinity term B1 is one order of magnitude higher for the two GoPro Hero 5 cameras than for the GoPro Hero 6 cameras. The distortion maps feature very peculiar trends, with the two GoPro Hero 5 cameras showing a more pronounced vertical pattern.

Figure 5. Distortion maps (difference in mm between ideal and actual distorted pixel position) for the eight GoPro cameras. Panels are labelled by camera number and model (e.g. 7-GoPro6, 8-GoPro6).

EIGHT GOPRO CAMERA SYSTEM CALIBRATION
Once the single cameras are calibrated, they are mounted on the BlueRov2, which is positioned on a cart to avoid touching the cameras. After the system synchronization (section 5.1), video streams are acquired while the cart is moved in the calibration environment (Figure 6 a and b).

Synchronization
For the synchronization of the eight video streams, the system is put in complete darkness and a flashlight is switched on and off (four times in this experiment). The frames are extracted at the original frame rate and, for each frame, the median intensity value is computed. The video streams are automatically synchronized by locating the peak (minimum and maximum) values in the time series of the median intensity difference (Figure 6c). The synchronization procedure with the flashlight is repeated at the end of the acquisition to verify that the synchronization is preserved. With the adopted approach, the maximum synchronization error can be equal to one frame, i.e. 1/29.97 ≈ 0.033 seconds, the inverse of the actual frame rate.
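A minimal sketch of this synchronization idea is given below, assuming that the per-frame median intensities have already been computed for each video stream. The function names and the fixed `threshold` used to pick the peaks of the difference series are assumptions made for the illustration; the peak detection criterion actually used in the experiments is not specified here.

```python
import numpy as np

FRAME_RATE = 29.97                  # actual frame rate of the video streams
MAX_SYNC_ERROR_S = 1 / FRAME_RATE   # one frame, about 0.033 s


def flash_events(median_intensity, threshold):
    """Locate flash-on and flash-off frames in one video stream.

    The frame-to-frame difference of the median intensity has a positive
    peak when the light is switched on and a negative peak when it is
    switched off. `threshold` is an assumed, user-chosen peak height.
    """
    diff = np.diff(np.asarray(median_intensity, dtype=float))
    on = np.flatnonzero(diff > threshold) + 1    # first bright frame
    off = np.flatnonzero(diff < -threshold) + 1  # first dark frame again
    return on, off


def frame_offsets(series_by_camera, threshold, reference=0):
    """Offset in frames of each stream with respect to the reference stream,
    measured on the first flash-on event."""
    first_on = [flash_events(s, threshold)[0][0] for s in series_by_camera]
    return [int(f - first_on[reference]) for f in first_on]


# Synthetic example: the second camera started recording 3 frames earlier,
# so the same flash appears 3 frames later in its stream.
cam_a = [5] * 10 + [200] * 5 + [5] * 10
cam_b = [5] * 13 + [200] * 5 + [5] * 7
print(frame_offsets([cam_a, cam_b], threshold=50))
print(round(MAX_SYNC_ERROR_S, 4))
```

Repeating the same computation on the flash events recorded at the end of the acquisition gives a simple check that the offsets have not drifted.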

Relative orientation
RORE parameters are computed for all possible camera pairs at each instant of time, or epoch, t, over the total test duration T. According to equation 1, for a system of n = 8 cameras, the total number of RORE for each epoch t is 28:

$$N_{RORE} = \binom{n}{2} = \frac{n(n-1)}{2} = \frac{8 \cdot 7}{2} = 28 \qquad (1)$$

The camera with the smallest set of RORE standard deviations (std) or median absolute deviations (MAD) is selected as reference.
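The pair count and the choice of the reference camera can be illustrated with the short sketch below. The selection rule coded here, i.e. taking the camera whose worst baseline std over all its pairs is the smallest, is only one possible reading of the criterion and is an assumption; the function names and the synthetic numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def camera_pairs(n_cameras):
    """All unordered camera pairs (i, j), with cameras numbered from 1."""
    return list(combinations(range(1, n_cameras + 1), 2))


def select_reference(baselines_by_pair, n_cameras):
    """Pick the camera whose pairs show the smallest worst-case std.

    baselines_by_pair maps each pair (i, j) to the time series of the
    baseline length between cameras i and j (one value per epoch).
    """
    worst = {}
    for cam in range(1, n_cameras + 1):
        stds = [np.std(values, ddof=1)
                for pair, values in baselines_by_pair.items() if cam in pair]
        worst[cam] = max(stds)
    return min(worst, key=worst.get)


print(len(camera_pairs(8)))  # 28 pairs for the eight-camera system

# Synthetic baselines in mm for three cameras: the pair (2, 3) is noisy,
# so camera 1 is the most stable choice.
series = {
    (1, 2): [100.0, 100.1, 99.9],
    (1, 3): [200.0, 200.1, 199.9],
    (2, 3): [150.0, 153.0, 147.0],
}
print(select_reference(series, 3))
```

Replacing `np.std` with a MAD estimator gives the robust variant of the same rule.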
For RORE computation, the definition and notation presented in Menna et al. (2013) are here adopted.
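In the following equations, the letters G, R and S indicate, respectively, the global (or world), reference camera and slave camera coordinate systems. The superscripts specify the frame in which the quantity is defined, e.g. $P^G$ is the position vector of a generic point P known in the global frame. The notation $M_i^j$ expresses the transformation from the coordinate system {i} to the coordinate system {j}. According to this rule, $M_G^S$ is the rotation matrix from the global to the slave camera frame and $O_S^G$ represents the origin of the slave camera frame expressed in the global reference frame. Following this notation, the coordinates of the point $P^G$ known in the global frame can be expressed in the slave camera reference frame as a function of the exterior orientation parameters of the slave camera:

$$P^S = M_G^S \left( P^G - O_S^G \right) \qquad (2)$$

$$M_G^S = \begin{bmatrix} c\varphi\, c\kappa & c\omega\, s\kappa + s\omega\, s\varphi\, c\kappa & s\omega\, s\kappa - c\omega\, s\varphi\, c\kappa \\ -c\varphi\, s\kappa & c\omega\, c\kappa - s\omega\, s\varphi\, s\kappa & s\omega\, c\kappa + c\omega\, s\varphi\, s\kappa \\ s\varphi & -s\omega\, c\varphi & c\omega\, c\varphi \end{bmatrix} \qquad (3)$$

$$O_S^G = \begin{bmatrix} X_{0S}^G & Y_{0S}^G & Z_{0S}^G \end{bmatrix}^T \qquad (4)$$

where c and s denote the cosine and sine functions, and (3) and (4) are respectively the rotation matrix, containing the Euler angles ω, φ, κ, and the position vector of the slave camera perspective centre in the global frame. Performing the RORE between the two cameras is equivalent to re-orienting the global reference system to be coincident with the reference camera frame. This transformation is expressed as:

$$P^S = M_R^S \left( P^R - O_S^R \right) \qquad (5)$$

$$M_R^S = M_G^S \left( M_G^R \right)^T \qquad (6)$$

$$O_S^R = M_G^R \left( O_S^G - O_R^G \right) \qquad (7)$$

where (6) and (7) are respectively the rotation matrix (i.e., orientation or boresight angles) and the coordinates of the slave camera in the reference camera coordinate frame. In other words, equation (7) represents the components of the baseline (or lever arm) between the two cameras in the coordinate system centred on the reference camera. From the above transformation, the exterior orientation parameters of the chosen reference camera become null, and the exterior orientation of the slave camera with respect to the reference camera, i.e. the RORE, is obtained.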
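The RORE computation can be sketched numerically as follows, assuming the omega-phi-kappa rotation convention of equation (3) and angles in radians. The function names are hypothetical and the numerical values are arbitrary; the second function shows the inverse use of the RORE, i.e. predicting the exterior orientation of a slave camera from that of the reference camera.

```python
import numpy as np


def rotation_matrix(omega, phi, kappa):
    """Rotation from the global frame to a camera frame (equation 3)."""
    so, co = np.sin(omega), np.cos(omega)
    sp, cp = np.sin(phi), np.cos(phi)
    sk, ck = np.sin(kappa), np.cos(kappa)
    return np.array([
        [cp * ck, co * sk + so * sp * ck, so * sk - co * sp * ck],
        [-cp * sk, co * ck - so * sp * sk, so * ck + co * sp * sk],
        [sp, -so * cp, co * cp],
    ])


def relative_orientation(M_G_R, O_R_G, M_G_S, O_S_G):
    """RORE of the slave camera with respect to the reference camera."""
    M_R_S = M_G_S @ M_G_R.T           # equation (6)
    O_S_R = M_G_R @ (O_S_G - O_R_G)   # equation (7), baseline components
    return M_R_S, O_S_R


def slave_from_reference(M_G_R, O_R_G, M_R_S, O_S_R):
    """Exterior orientation of the slave camera predicted from the
    reference camera and a known RORE (inverse of equations 6 and 7)."""
    M_G_S = M_R_S @ M_G_R
    O_S_G = O_R_G + M_G_R.T @ O_S_R
    return M_G_S, O_S_G


# Arbitrary exterior orientations (metres, radians) for one epoch.
M_R = rotation_matrix(0.1, -0.2, 0.3)
O_R = np.array([1.0, 2.0, 3.0])
M_S = rotation_matrix(-0.3, 0.05, 1.0)
O_S = np.array([1.2, 2.1, 2.9])

M_RS, O_SR = relative_orientation(M_R, O_R, M_S, O_S)
print("baseline length [m]:", np.linalg.norm(O_SR))
```

The baseline length is invariant to the choice of frame, so the norm of the vector from equation (7) equals the distance between the two perspective centres in the global frame; this is the quantity that can serve as a scale constraint in the underwater processing.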
At each epoch t, the RORE of the eight GoPro cameras mounted on the BlueRov2 are estimated from the exterior orientation parameters. The mean and median values are then computed from the whole time series of duration T.
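As an illustration of why both classical and robust statistics are of interest, the sketch below summarises a synthetic baseline time series containing one outlier. The function name is hypothetical, and the MAD is computed here without any scaling factor, which is an assumption since the exact definition adopted is not stated.

```python
import numpy as np


def summarise(series):
    """Mean, std, median and MAD of a RORE time series (one value per epoch)."""
    x = np.asarray(series, dtype=float)
    median = np.median(x)
    return {
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),                    # sample std
        "median": float(median),
        "mad": float(np.median(np.abs(x - median))),    # unscaled MAD
    }


# Synthetic baseline values in mm; the last epoch is an outlier.
stats = summarise([100.0, 101.0, 99.0, 100.0, 130.0])
print(stats)
```

The single outlier shifts the mean and inflates the std, while the median and the MAD remain close to the values of the undisturbed epochs.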
Starting from the last identified synchronization event (section 5.1), one frame per second is extracted from each camera video stream. All the extracted frames are processed together in Agisoft PhotoScan, enforcing a camera-variant bundle adjustment, where a different set of calibration parameters is defined for each GoPro. Three different approaches are tested:
1. the pre-computed camera calibration parameters (section 4.3) are used as initial values for a self-calibrating bundle adjustment, i.e. interior and exterior orientation parameters are estimated for the complete camera network shown in Figure 6 a and b;
2. the pre-computed camera calibration parameters (section 4.3) are kept fixed in the processing, i.e. only the exterior orientation parameters are estimated from the complete camera network shown in Figure 6 a and b;
3. as 2, i.e. only the exterior orientation parameters are estimated, but from a small subset of images (white rectangle in Figure 6 b), with the aim of simulating the camera calibration procedure in a small environment.
The coordinates of the coded targets previously measured with the reference photogrammetric system (section 4.1) are used to perform, for the three approaches described above:
a. a free network adjustment, where the coded targets are used to define the scale;
b. a constrained adjustment.
The results of the different processes are reported in terms of root mean square error (RMSE) on check points, and maximum value of the standard deviation (std) and median absolute deviation (MAD) of the baseline (lever arm) and Euler (boresight) angles from the RORE between each camera pair (Table 3). The RMSE on the check points improves for the constrained solution, especially in the case of the small image subset. The use of a robust statistic such as the MAD is highly significant for the omega angle when the self-calibration adjustment is performed. Figure 7 shows the mean and median baseline values with the associated std and MAD (error scale = 100). The robust statistical estimators show that the effect of residual outliers can be further mitigated, especially for approaches 1 and 3. The absolute difference between the three different approaches, computed for the constrained adjustment solutions, is also reported, taking as reference the fixed calibration full dataset (Figure 8). It shows that the fixed-calibration full and subset approaches provide more consistent results, with differences below 2 mm.

Figure 8. Absolute difference (in mm) between the three different approaches, taking as reference the fixed calibration full dataset. The results are reported for the constrained solution.

CONCLUSIONS AND OUTLOOK
The paper presented an in-depth analysis of the calibration of a multi-camera system, composed of eight action cameras (GoPro) mounted on a low-cost ROV. The system is designed as an MMS for the photogrammetric survey of underwater caves. The multi-camera system calibration approach entails a three-step procedure, consisting of (i) the measurement of a reference calibration environment, (ii) the estimation of interior orientation parameters for each single camera, and (iii) the computation of the RORE (lever arm and boresight angles) between the multiple cameras.
The calibration results of the different GoPro cameras show that the interior orientation parameters are significantly different, while similar patterns can be identified in cameras of the same model.
The pre-calibration of the single cameras provides results with smaller std (and MAD) in the estimation of the RORE. Normal and robust statistical parameters show that the effect of residual outliers can be further mitigated. Decreasing the number of images and reducing the space for the calibration leads to a difference in the RORE baseline of less than 2 mm. The next steps will involve the inclusion of the trifocal sensor and the inertial navigation unit in the system calibration, and tests in the underwater environment.

Figure 1. The BlueRov2 fully equipped for underwater cave exploration.

Figure 2. Flow-chart of the implemented procedure for the computation of RORE between the GoPro cameras.

Figure 3. Top (a) and lateral views (b, c) of the test area with the reference camera network.

Figure 6. Camera network of the eight GoPro camera system calibration (a, b). The white rectangle in b indicates the subset selected for calibration approach 3 in section 5.2. (c) Automatic synchronization method based on the peak values of the median intensity values: four separate events are evident for the eight different cameras, represented by the different colours.

Figure 7. Mean and median baseline (lever arm) values (in mm) with the associated std and MAD (error scale = 100). The results are reported for the constrained solution.

Table 3. Results of multi-camera system calibration.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-1, 2018. ISPRS TC I Mid-term Symposium "Innovative Sensing - From Sensors to Methods and Applications", 10-12 October 2018, Karlsruhe, Germany. This contribution has been peer-reviewed.