GEOMETRICAL CALIBRATION FOR THE PANROVER: A STEREO OMNIDIRECTIONAL SYSTEM FOR PLANETARY ROVER

A novel panoramic stereo imaging system is proposed in this paper. The system is able to perform 360° stereoscopic vision, useful for rover autonomous driving, and to simultaneously capture a high-resolution stereo scene. The core of the concept is a novel "bifocal panoramic lens" (BPL) based on the hyper hemispheric model (Pernechele et al 2016). The BPL records a panoramic field of view (FoV) and, simultaneously, an area (belonging to the panoramic FoV) with a given degree of magnification, using a single image sensor. This strategy makes it possible to avoid rotational mechanisms. Two BPLs placed along a vertical baseline (a system called PANROVER) allow the surrounding environment to be monitored in stereoscopic (3D) mode while simultaneously capturing high-resolution stereoscopic images to analyse scientific cases, making it a new paradigm in the planetary rover framework. Unlike most Mars systems, which rely on rotational mechanisms to acquire panoramic images (mosaicked on ground), the PANROVER contains no moving components and can retrieve high-rate stereo images of the context panorama. The scope of this work is the geometric calibration of the panoramic acquisition system using omnidirectional calibration methods (Scaramuzza et al 2006) based on the Zhang calibration grid. The procedures are applied to obtain well-rectified, synchronized stereo images available for 3D reconstruction. We also applied a Zhang chessboard-based approach during the STC/SIMBIO-SYS stereo camera calibration (Simioni et al 2014, 2017). In this case the targets of the calibration are the stereo heads (the BPLs) of the PANROVER, with the aim of extracting the intrinsic parameters of the optical systems. Unlike previous pipelines, the extrinsic parameters are also estimated from the same data bench.


INTRODUCTION
The major tasks of planetary rover cameras are to assess the traversability of the near-field terrain surrounding the rover, avoiding possible hazards (Gosh et al 2017, Wang et al 2019), and to record images for scientific purposes. Performing both tasks requires a set of optical cameras. The paradigm of modern planetary rovers is represented by those already landed on Mars, such as the Mars Exploration Rovers (MERs) (Maki et al 2003) and the Mars Science Laboratory (MSL) rover (Curiosity) (Maki et al 2012). The proposed stereo system (hereafter PANROVER) may achieve comparable performance with fewer image sensors, yielding a lighter and less complex system. For instance, it uses only 2 image detectors to reach optical performance comparable to that of the MER and MSL rover stereo cameras, which use 8 and 13 imaging detectors, respectively.
The PANROVER is in fact characterised by the wide field of view that panoramic optical systems can provide, a feature that has made 360° cameras widely used in applications such as surveillance, robotic vision, navigation and military systems (Fowski et al 1995). In addition to the context stereo images, it optimizes its FoV with a limited high-resolution channel obtained from its blind spot in the zenithal direction. A deeper description of the PANROVER instrument from an optical and stereo point of view is given in Section 2. Section 3 describes the method applied for intrinsic and extrinsic calibration, and Section 4 reports the results.

THE INSTRUMENT
The BPL is a specific application of the HH (hyper hemispherical lenses) already designed by our Institute (Pernechele et al 2016). The HH belong to the class of very wide-angle lenses. This class was historically underestimated because of its distortion aberration, which arises because the chief-ray angles on the object side are not preserved through the optics preceding the aperture stop. In the last decade, the advent of low-cost, large-area digital sensors and the increased speed of digital processing have made these lenses more widespread, since the images can be processed easily, often in real time, providing the user with undistorted (unwarped) products. The best-known very wide-angle lens is the "fish eye" (1923), able to acquire a field of view close to the hemispheric range (180° in zenith angle and 360° in azimuth). The omnidirectional lens (Kleinschmidt et al 1911) is a less used class of very wide-angle lens. These cameras extend the zenith-angle coverage beyond the hemispherical field, acquiring both above and below the horizon. A known limit of these lenses is the blind spot around the optical axis (near-null zenith angle), which gives the acquired images the well-known "donut shape". The HH were designed to merge the capabilities of fisheye and omnidirectional lenses, with a FoV greater than 180° and without the donut shape. Figure 1 shows an example of the resulting images (note that the zenithal region is not blind). We have already applied this approach to space applications (Pernechele et al 2018).

The bifocal panoramic lens
An application of the HH idea is a novel "bifocal panoramic lens" (BPL), proposed for planetary rovers and used to create the stereo acquisition system called PANROVER. The PANROVER system is based on a vertical baseline hosting two BPLs. Each of the two ultra-wide field-of-view objectives is composed of two optical components: a catadioptric element (with a reflective concave surface) for the panoramic field (PF), and a secondary lens (fore optics) for the frontal field (FF).
Despite the use of a semi-reflecting surface in the optical train, these fore optics cover an off-boresight angle of 135°, which supplies the panoramic acquisitions. Unlike classical omnidirectional cameras, the central blind zone on the detector (20° around the boresight) is exploited by the fore optics, which have a higher magnification than the panoramic path. This provides a high-resolution view limited to the zenith direction, which can however be easily corrected with a flat folding mirror, as shown in Figure 2.

Figure 2 Bifocal lens scheme
The lenses have been mounted in the OAPD laboratories on the first breadboard of the stereo setup. PANROVER will be able to perform 360° stereoscopic vision (through the PFs) and to capture high-resolution stereo scenes (through the FFs), simultaneously and without any moving part.
In the design of our breadboard, a 5 MP, 2/3-inch format image sensor (CMOS Sony IMX264) was chosen. The pixel pitch is 3.45 μm and the readout speed allows frame rates up to 35 fps. The first breadboard and an example of an image acquired simultaneously in FF and PF are shown in Figure 3.

Optical performance
The PF of the realized BPL is 360° × 105°, while its FF is a 20° round field. The angular resolution of the PF is in the range 1-3 mrad/px (because of its extremely large FoV, the lens is anamorphic), while the resolution of the FF (with a frontal-optics magnification of 3×) is about 0.3 mrad/px. These parameters should be compared with those of the planetary rovers currently working on Mars, the MERs (Maki et al 2003) and the MSL rover (Maki et al 2012). Both payloads include engineering and scientific cameras. The FoV and iFoV of their acquisition systems are reported in Table 1.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)

Stereo setup
Other panoramic stereo systems have been proposed for planetary landing environments. A remarkable example is the double panoramic system for a rover platform proposed by Huan et al (2012), in which two coaxial annular lenses share the same sensor on a rigid 100 mm baseline. That system requires a precise mounting alignment (to bring the two optical paths onto the same detector) and lacks the flexibility of the PANROVER, whose mounting does not need any optical precision and can be refined, once the planetary target is reached, by edge-based imaging methods.
The PANROVER stereo system is realized by placing two panoramic lenses aboard a rover (see Figure 4) along a vertical baseline, making it possible to observe the entire environment surrounding the vehicle in stereoscopic mode. The two BPLs are indicated in Figure 4 and are hereafter referred to as the UP and DW channels.

Figure 4 Stereo baseline scheme
As demonstrated by Liu et al (2015), algorithms exist that automatically generate high-resolution digital elevation models (DEMs) from the images of the rover's navigation cameras (Navcam). Our team, which developed a stereo processing tool able to generate DTMs from orbital data (3DPD; Simioni et al 2017), will extend its application by introducing obstacle maps that consider all the factors obstructing the rover traverse, such as slope, aspect and elevation differences. The application of the 3DPD pipeline and the optimisation of its strategies (such as similarity functions, continuity limits and oriented performance tuning) depend on the results of the on-ground calibration of the intrinsic and extrinsic parameters defined by these first breadboard tests. Thanks to its long baseline, the BPL channels are expected to retrieve local DTMs with a vertical precision better than 20 mm at a distance of 2 m.
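The sub-20 mm figure depends jointly on baseline, angular resolution and distance. As a rough illustration only, the textbook first-order stereo error model can be sketched as below. It is not the 3DPD error model; the baselines are hypothetical (the breadboard baseline is not stated here), a 1 px matching error is assumed, and the approximation breaks down for lines of sight nearly parallel to the vertical baseline, where the parallax vanishes.

```python
def range_precision(distance_m, baseline_m, ifov_rad, matching_px=1.0):
    """First-order range uncertainty (metres) of a two-view stereo pair.

    Assumes the line of sight is roughly perpendicular to the baseline, so the
    parallax angle is about baseline / distance, and that the disparity is
    measured with an error of `matching_px` pixels of angular size `ifov_rad`.
    """
    if distance_m <= 0 or baseline_m <= 0 or ifov_rad <= 0 or matching_px <= 0:
        raise ValueError("all inputs must be positive")
    angular_error = matching_px * ifov_rad
    # parallax ~ b / z  =>  |d(parallax)| = b / z**2 * dz  =>  dz = z**2 * d(parallax) / b
    return distance_m ** 2 * angular_error / baseline_m


# Hypothetical baselines, PF iFoV of 2 mrad/px, target at 2 m, 1 px matching error
for baseline in (0.3, 0.5, 1.0):
    sigma = range_precision(2.0, baseline, 2e-3)
    print(f"baseline {baseline:.1f} m -> range precision {1000 * sigma:.1f} mm")
```

The quadratic growth with distance and the inverse dependence on baseline are what make the long vertical baseline attractive for near-field DTMs.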

CALIBRATION
From a calibration point of view, the PANROVER is a non-central projection system: it does not have a single effective viewpoint (see Figure 6) or projection centre (Baker et al 1998). This characteristic prevents the use of a common 360° perspective model. As demonstrated by the calibration results, the generic Omnidirectional Toolbox proposed by Scaramuzza et al (2006) models the cameras designed for the PANROVER well. That calibration is limited to a single acquisition system; we introduced improvements to extend it to the stereo system and obtain the stereo extrinsic parameters. Using multiple images of a well-known target, the pipeline defines: 1. the intrinsic parameters (the model of the camera); 2. the target parameters (the position and attitude of the target in the world space); 3. the extrinsic parameters (the model of the acquisition baseline).
The intrinsic parameters are described in the next sections. For N target images, the target parameters are the 6×N parameters that define, for each target image, the 3 attitude angles and the 3 position coordinates of the target in the camera reference system. The extrinsic parameters are 6 and, as is common in stereo systems, describe the roto-translation between the reference systems of the two cameras. Section 3.1 introduces the single effective viewpoint constraint on camera models, and Section 3.2 clarifies the model and the intrinsic parameters proposed for the BPL camera. Section 3.3 describes the intrinsic and target calibration for a single optical head of the PANROVER, and the extrinsic calibration method is reported in Section 3.4.
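The target parameters can be stored as one 6-vector per image and turned into a roto-translation when needed. The sketch below assumes a rotation-vector (Rodrigues) parametrization for the 3 attitude angles; the paper does not state which angle convention the pipeline uses, and the names `rotation_from_vector` and `pose_matrix` and the parameter layout are illustrative.

```python
import numpy as np


def rotation_from_vector(rvec):
    """Rodrigues formula: rotation vector (unit axis * angle, radians) -> 3x3 matrix."""
    rvec = np.asarray(rvec, dtype=float)
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def pose_matrix(params6):
    """Hypothetical layout [rx, ry, rz, tx, ty, tz] -> 4x4 homogeneous roto-translation."""
    params6 = np.asarray(params6, dtype=float)
    T = np.eye(4)
    T[:3, :3] = rotation_from_vector(params6[:3])
    T[:3, 3] = params6[3:6]
    return T


# N target images give an (N, 6) array of target parameters; the stereo
# extrinsics add one more 6-vector relating the two camera frames.
target_params = np.zeros((3, 6))
target_params[:, 5] = [1.0, 1.5, 2.0]  # three targets straight ahead along z
poses = [pose_matrix(p) for p in target_params]
print(len(poses), target_params.size)  # 3 poses, 6 * N = 18 target parameters
```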

Single effective viewpoint constraints
As described in the previous section, the PANROVER is not a central system, which means that it cannot be modelled as a single-viewpoint projective system associated with a mirror. Central and non-central designs are shown in Figure 5. For a catadioptric camera to be a central system, the following arrangements must be satisfied: for a hyperbolic mirror, the camera optical centre has to coincide with the focus of the hyperbola; for a parabolic mirror, the lens must be orthographic. Cameras using fisheye lenses are not, in general, central systems, but they approximate the single viewpoint property very well.
Figure 5 (a) A non-central camera-mirror assembly (i.e. without a single effective viewpoint), where the optical rays coming from the camera and reflected by the mirror surface do not intersect in a unique point. (b) A central camera, where the single effective viewpoint property is perfectly verified. In both cases a non-orthographic projection is used to model the image plane formation.
In the case of the BPL, the non-central configuration requires finding the relation between a given 2D pixel of the image plane and the 3D direction coming from the effective viewpoint of the equivalent mirror surface. This can be done with the Omnidirectional Toolbox (OCAMCALIB) under limited assumptions.
Unlike earlier toolboxes, the Omnidirectional Toolbox does not use any specific model of the omnidirectional sensor. It only assumes that the imaging function can be described by a Taylor series expansion whose coefficients are estimated by solving a four-step least-squares linear minimization problem, followed by a non-linear refinement based on the maximum likelihood criterion.
The method models a camera system by combining (as proposed in Baker et al 1998) an orthographic camera with a radially symmetric mirror.

Camera model
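The proposed model associates the coordinates $\mathbf{u}' = (u', v')^T$ in the sensor reference system (Figure 6a) with the coordinates $\mathbf{u}'' = (u'', v'')^T$ in a reference system coaxial with the equivalent mirror and centred in $(x_c, y_c)$ (hereafter called the "centre of the camera") through an affine transformation:

$$\begin{bmatrix} u' \\ v' \end{bmatrix} = \begin{bmatrix} c & d \\ e & 1 \end{bmatrix} \begin{bmatrix} u'' \\ v'' \end{bmatrix} + \begin{bmatrix} x_c \\ y_c \end{bmatrix} \qquad (1)$$

where the $2 \times 2$ matrix is an affine transformation that approximates possible off-axis misalignments between the orthographic projection and the symmetry axis of the equivalent mirror.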
The optical ray associated with a pixel, i.e. the direction along which a 3D point $\mathbf{X}$ (expressed in homogeneous coordinates) is observed, can be expressed in the coaxial reference system by the versor:

$$\mathbf{p} = \frac{\left[u'',\ v'',\ g(u'', v'')\right]^T}{\left\| \left[u'',\ v'',\ g(u'', v'')\right] \right\|} \qquad (2)$$

where $g$ is a rotationally symmetric non-linear function representing the mirror. In the case of non-central cameras (see Figure 6b) this definition is not associated with a central point. In the case of central cameras (see Figure 6c) the versor also satisfies $\lambda\,\mathbf{p} = P\,\mathbf{X}$ with $\lambda > 0$, where $P$ is the projection matrix, centred, for catadioptric systems, in the focus of the parabolic or hyperbolic mirror. For the omnidirectional camera the function is assumed to be rotationally symmetric with respect to the sensor normal axis. Under this assumption Equation (2) can be rewritten as:

$$\mathbf{p} = \left[\frac{u''}{\rho}\cos\theta(\rho),\ \frac{v''}{\rho}\cos\theta(\rho),\ \sin\theta(\rho)\right]^T, \qquad \rho = \sqrt{u''^2 + v''^2}$$

where $\rho$ is the distance in pixels of the projected point from the camera centre and $\theta$ is the zenith angle of the chief ray with respect to the horizon plane, equal to -90° along the optical axis (zenith), 0° at the horizon, and positive in the hyper-hemispheric extension of the PF.
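To make the camera model concrete, a pixel can be mapped back to its unit ray by inverting the affine transformation of Equation (1) and applying the rotationally symmetric ray definition, written in terms of ρ and the chief-ray angle θ(ρ). This is a sketch under stated assumptions: the affine matrix is taken in the [[c, d], [e, 1]] form, the axis orientation (θ = -90° on the optical axis giving a ray along -z) is an assumed convention, and the linear θ(ρ) law in the usage lines is purely hypothetical.

```python
import numpy as np


def pixel_to_ray(u, v, centre, affine, theta_of_rho):
    """Back-project a sensor pixel (u', v') to a unit ray in the coaxial frame.

    centre       -- (x_c, y_c), image of the mirror symmetry axis, in pixels
    affine       -- 2x2 matrix [[c, d], [e, 1]] of the affine transformation
    theta_of_rho -- chief-ray angle above the horizon plane (radians) vs rho (pixels)
    """
    sensor = np.array([u, v], dtype=float) - np.asarray(centre, dtype=float)
    # invert u' = A u'' + centre  =>  u'' = A^-1 (u' - centre)
    u2, v2 = np.linalg.solve(np.asarray(affine, dtype=float), sensor)
    rho = np.hypot(u2, v2)
    theta = theta_of_rho(rho)
    if rho > 0.0:
        cos_az, sin_az = u2 / rho, v2 / rho
    else:
        cos_az, sin_az = 1.0, 0.0  # azimuth undefined on the axis
    return np.array([np.cos(theta) * cos_az, np.cos(theta) * sin_az, np.sin(theta)])


def theta_lin(rho):
    """Hypothetical chief-ray law: -90 deg on the axis, +0.1 deg per pixel."""
    return np.deg2rad(-90.0 + 0.1 * rho)


ray = pixel_to_ray(740.0, 512.0, centre=(640.0, 512.0), affine=np.eye(2), theta_of_rho=theta_lin)
print(ray, np.linalg.norm(ray))  # pixel 100 px from the centre -> theta = -80 deg
```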
We propose two different definitions of the function $\theta(\rho)$. Both use an $N$-order polynomial, hereafter indicated as

$$f(\rho) = \sum_{i=0}^{N} a_i\,\rho^i$$

or as FPF (Forward Projective Function). The first solution is a simple "direct model", where the chief ray is defined by the polynomial itself:

$$\theta(\rho) = f(\rho) \qquad (3)$$

A more "physical" approach defines the chief ray as:

$$\theta(\rho) = \arctan\frac{f(\rho)}{\rho} \qquad (4)$$

The direct model returns a more constrained camera model: its simplicity allows, for instance, the monotonicity of the IFoV (the simple derivative of the function) to be imposed as a constraint. On the other hand, the physical approach reaches the best performance but does not allow any additional constraint to be imposed. As described in Section 4, this leads to an incorrect interpretation of the camera geometry in the outer regions of the PF. The main equations of the two models, including the solver used in the non-linear regression at the different stages of the calibration, are reported in Table 2.

Model | Chief ray $\theta(\rho)$ | IFoV $d\theta/d\rho$ | Solver
Direct | $f(\rho)$ | $f'(\rho)$ | …
Physical | $\arctan\left(f(\rho)/\rho\right)$ | $\dfrac{\rho f'(\rho) - f(\rho)}{\rho^2 + f(\rho)^2}$ | …
Table 2 Main equations of the simple and physical models developed
Table 2 also reports the IFoV as the derivative of the zenith angle with respect to the pixel distance from the centre. Note that while the direct model assumes a polynomial form for the chief ray, in the physical model the IFoV is a rational function whose denominator has twice the degree of the polynomial.
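The two chief-ray definitions and their IFoVs can be compared numerically. The sketch assumes the physical model θ = arctan(f(ρ)/ρ), the form consistent with a rational IFoV whose denominator has degree 2N and which tends to zero as ρ grows; the FPF coefficients are hypothetical. `is_monotonic` illustrates the kind of check that could be imposed on the direct model's IFoV.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly


def fpf(coeffs, rho):
    """Forward projective function f(rho) = sum_i a_i * rho**i, coeffs[i] = a_i."""
    return npoly.polyval(rho, coeffs)


def dfpf(coeffs, rho):
    """Derivative f'(rho) of the FPF."""
    return npoly.polyval(rho, npoly.polyder(coeffs))


def chief_ray_direct(coeffs, rho):
    return fpf(coeffs, rho)                     # theta = f(rho)


def ifov_direct(coeffs, rho):
    return dfpf(coeffs, rho)                    # polynomial of degree N - 1


def chief_ray_physical(coeffs, rho):
    return np.arctan2(fpf(coeffs, rho), rho)    # equals arctan(f / rho) for rho > 0


def ifov_physical(coeffs, rho):
    f, df = fpf(coeffs, rho), dfpf(coeffs, rho)
    return (rho * df - f) / (rho ** 2 + f ** 2)  # denominator of degree 2N


def is_monotonic(values):
    """True if a sampled curve (e.g. the IFoV) never changes slope sign."""
    d = np.diff(np.asarray(values, dtype=float))
    return bool(np.all(d >= 0) or np.all(d <= 0))


coeffs = [-200.0, 0.0, 1e-3]                    # hypothetical FPF, pixels
rho = np.linspace(0.0, 1000.0, 11)
print(np.degrees(chief_ray_physical(coeffs, rho)))
print(1e3 * ifov_physical(coeffs, rho))         # mrad/px
```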

Single camera calibration pipeline
During the calibration procedure, a planar chessboard pattern of known geometry is shown at different, unknown positions. For each chessboard acquisition the system estimates 6 parameters (hereafter "target parameters"): 3 angles representing the orientation of the chessboard in 3D space and 3 coordinates for its position with respect to the camera. Since the square size of the target is known ($s$ = 4.2 cm in our case), the target parameters consist, for each acquisition $k$, of the rotation matrix $R_k$ and translation vector $\mathbf{t}_k$ that allow the 3D coordinates of each corner of the chessboard to be calculated as:

$$\mathbf{X}^{(k)}_{ij} = R_k \begin{bmatrix} i\,s \\ j\,s \\ 0 \end{bmatrix} + \mathbf{t}_k, \qquad i = 0, \dots, n_h - 1, \quad j = 0, \dots, n_v - 1 \qquad (5)$$

where $n_h$ and $n_v$ are the horizontal and vertical numbers of corners considered. The process consists of: 1. detection of all the corners of the chessboards; 2. first estimation of the target parameters; 3. estimation of the image projection function; 4. fine tuning; 5. centre detection; 6. non-linear refinement. Corners are detected following the approach of Rufli et al (2008), with a satisfactory detection rate of 95%. Figure 7 shows an example of the detection results. The image also shows the camera centre and the two FoV channels of the BPL: the panoramic one, where the target appears, and the frontal field (in green), covered during the calibration.
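Generating the 3D corner coordinates of one acquisition from its rotation and translation takes only a few lines. The sketch assumes corners indexed from zero, with i along the horizontal direction, and a 4.2 cm square; `n_h`, `n_v` and the helper names are illustrative.

```python
import numpy as np


def chessboard_corners(n_h, n_v, square_m=0.042):
    """Corner grid in the target frame (Z = 0), one row per corner, i varying fastest."""
    j, i = np.mgrid[0:n_v, 0:n_h]
    return np.column_stack([i.ravel() * square_m,
                            j.ravel() * square_m,
                            np.zeros(n_h * n_v)])


def corners_in_camera(R, t, n_h, n_v, square_m=0.042):
    """X = R [i s, j s, 0]^T + t for every corner of one acquisition."""
    grid = chessboard_corners(n_h, n_v, square_m)
    return grid @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)


# Hypothetical 6 x 4 corner grid, target 1 m along the camera z axis
X = corners_in_camera(np.eye(3), [0.0, 0.0, 1.0], n_h=6, n_v=4)
print(X.shape, X[-1])
```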
Once the corners of the chessboard are detected, the first step of the process estimates the angular target parameters by minimizing the residuals via SVD (Singular Value Decomposition), exploiting the orthogonality of the target grid. The positional parameters are estimated up to one coordinate, which is defined in the next step.
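The linear step can be illustrated with the formulation of Scaramuzza et al (2006): because the model is radially symmetric, each image point is parallel to the (x, y) part of its corner in the camera frame, which gives one homogeneous linear equation per corner in r11, r12, r21, r22, t1, t2, solved by SVD. This is a sketch of that formulation, not the pipeline's code; the image coordinates are assumed to be already centred on the camera centre, and the function name is illustrative.

```python
import numpy as np


def partial_pose_from_radial_constraint(img_pts, target_pts):
    """Estimate h = [r11, r12, r21, r22, t1, t2] up to scale for one chessboard.

    img_pts    -- (M, 2) camera-centred image coordinates (u'', v'') of the corners
    target_pts -- (M, 2) planar target coordinates (X, Y) of the same corners (Z = 0)

    For a radially symmetric model, (u'', v'') is parallel to the (x, y) part of
    the corner in the camera frame, so v''*x - u''*y = 0 for every corner, which
    is linear in h.
    """
    img_pts = np.asarray(img_pts, dtype=float)
    target_pts = np.asarray(target_pts, dtype=float)
    u, v = img_pts[:, 0], img_pts[:, 1]
    X, Y = target_pts[:, 0], target_pts[:, 1]
    M = np.column_stack([v * X, v * Y, -u * X, -u * Y, v, -u])
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]  # right singular vector of the smallest singular value (unit norm)
```

The result is defined only up to scale and sign, which must be fixed in subsequent steps (not shown).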
Apart from the last coordinate, this first step thus defines, for each image, the roto-translation from the target reference system (shown in the example of Figure 7) to the camera reference system. The following step, which estimates the polynomial parameters of the radial function (defined in Equations (3) and (4)), is a simple least-squares solution of an overdetermined system (obtained using the pseudoinverse). On one side, it defines the best polynomial solution and the affine transformation (defined in Equation (1)); on the other, it provides the last position parameter of each chessboard in the camera space. A linear refinement can then be applied to reprocess both target and intrinsic parameters.
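The second linear step can be sketched for the physical model, where the third component of the ray is the FPF itself, [u'', v'', f(ρ)]. Assuming the rotation and the in-plane translation of the chessboard are already known in metric scale (completing the rotation from the previous step is not shown), the parallelism between this ray and the corner position yields equations that are linear in a_0…a_N and t3, solved here with the pseudoinverse. Function and argument names are illustrative.

```python
import numpy as np


def estimate_fpf_and_t3(img_pts, target_pts, R, t12, degree):
    """Linear least-squares estimate of FPF coefficients a_0..a_N and of t3.

    Assumes the rotation R (3x3) and the in-plane translation t12 = (t1, t2)
    of the chessboard are already known in metric scale.
    """
    img_pts = np.asarray(img_pts, dtype=float)
    target_pts = np.asarray(target_pts, dtype=float)
    R = np.asarray(R, dtype=float)
    u, v = img_pts[:, 0], img_pts[:, 1]
    X, Y = target_pts[:, 0], target_pts[:, 1]
    rho = np.hypot(u, v)
    x = R[0, 0] * X + R[0, 1] * Y + t12[0]
    y = R[1, 0] * X + R[1, 1] * Y + t12[1]
    w = R[2, 0] * X + R[2, 1] * Y                          # corner z without t3
    powers = np.vander(rho, degree + 1, increasing=True)   # rho**0 ... rho**N
    # [u'', v'', f(rho)] parallel to [x, y, w + t3] gives, per corner:
    #   v''*(w + t3) - f(rho)*y = 0   and   f(rho)*x - u''*(w + t3) = 0
    A = np.vstack([np.column_stack([-y[:, None] * powers, v]),
                   np.column_stack([x[:, None] * powers, -u])])
    b = np.concatenate([-v * w, u * w])
    solution = np.linalg.pinv(A) @ b                       # overdetermined system
    return solution[:-1], solution[-1]
```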
The model strongly depends on the position of the centre of the omnidirectional image (Figure 6a) on the focal plane. An incorrect definition of this centre corresponds to a misalignment of the symmetry axis of the mirror function and increases the reprojection error. For this reason, at this point the centre is iteratively corrected by globally minimizing the Sum of Squared Reprojection Errors (SSRE). The linear solution given by the previous steps is obtained by minimizing an algebraic distance, which is not physically meaningful; we therefore refine it through maximum likelihood inference. The final non-linear refinement is based on the Levenberg-Marquardt algorithm (implemented by the Matlab function lsqnonlin). Starting from the estimated solution, the algorithm refines target and intrinsic parameters separately, in order to speed up convergence: the first step refines the target parameters, ignoring the intrinsic ones; the second step uses the target parameters just estimated and refines the intrinsic ones.
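The iterative centre correction can be pictured as a coarse-to-fine search that keeps the candidate centre with the lowest SSRE. In the sketch, `cost` is a hypothetical callable standing in for a full re-run of the linear calibration with the trial centre, and the window size, number of steps and levels are arbitrary choices; the toolbox's actual search strategy may differ.

```python
import numpy as np


def refine_centre(cost, centre0, half_width=20.0, steps=5, levels=4):
    """Coarse-to-fine grid search of the image centre (x_c, y_c).

    cost(xc, yc) is a hypothetical callable that recalibrates with the trial
    centre and returns the sum of squared reprojection errors (SSRE).
    """
    best = np.asarray(centre0, dtype=float)
    for _ in range(levels):
        offsets = np.linspace(-half_width, half_width, 2 * steps + 1)
        candidates = [(best[0] + dx, best[1] + dy) for dx in offsets for dy in offsets]
        values = [cost(xc, yc) for xc, yc in candidates]
        best = np.asarray(candidates[int(np.argmin(values))])
        half_width /= steps  # next window spans one grid step on each side
    return best


def toy_ssre(xc, yc):
    """Stand-in for the SSRE: a bowl with its minimum at a hypothetical centre."""
    return (xc - 512.3) ** 2 + 2.0 * (yc - 400.7) ** 2


print(refine_centre(toy_ssre, (500.0, 410.0)))
```

In Python, the subsequent Levenberg-Marquardt refinement that the pipeline runs with lsqnonlin could be reproduced with `scipy.optimize.least_squares(..., method='lm')`.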

Stereo camera extrinsic parameters
Once the target and intrinsic parameters are calibrated for both stereo channels, a two-step process is used to estimate the extrinsic parameters. The first step defines the common roto-translation between the 3D corner coordinates of the targets estimated by the calibration of one channel and those estimated by the other. The second step is a non-linear refinement of all the target and extrinsic parameters. The aim of the extrinsic calibration is to define the rotation $R$ and translation $\mathbf{t}$ that move from the DW reference system to the UP one:

$$\mathbf{X}^{UP}_{k} = R\,\mathbf{X}^{DW}_{k} + \mathbf{t} \qquad (6)$$

for each pair of synchronous images $k$. Defining $\bar{\mathbf{X}}$ as the corner coordinates normalized by their mean, for both channels a first solution for $R$ can be evaluated following the least-squares rigid motion method; the minimum rotation matrix, evaluated by singular value decomposition, is:

$$H = \sum_{k} \bar{\mathbf{X}}^{DW}_{k} \left(\bar{\mathbf{X}}^{UP}_{k}\right)^T = U\,\Sigma\,V^T, \qquad R = V\,\mathrm{diag}\left(1,\ 1,\ \det(V U^T)\right) U^T$$

The translation is then derived from Equation (6), applied to the mean coordinates of the two sets.
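The SVD-based least-squares rigid motion can be sketched as follows, with `P` and `Q` holding the corner coordinates of the synchronous chessboards as estimated by one channel and by the other. This is the standard Kabsch/Arun solution with a reflection guard, given as an illustration under that assumption rather than as the pipeline's code.

```python
import numpy as np


def rigid_motion(P, Q):
    """Least-squares R, t such that Q ~ R @ P + t, rows being corresponding 3D points."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - p_mean, Q - q_mean             # coordinates normalized by the mean
    H = P0.T @ Q0                               # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # closest proper rotation (no reflection)
    t = q_mean - R @ p_mean                     # translation from the mean coordinates
    return R, t
```

Because this first estimate only minimizes the 3D distance between the two corner sets, it serves as the starting point for the non-linear refinement described next.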
At this level, the extrinsic parameters represent the solution that minimizes the 3D error between the target parameters estimated by the UP and DW calibrations. Moving both channels into the same reference system also means replacing the chessboards evaluated by the two channels (which derive from different calibrations) with a mean set. This first step does not take into account the residuals of the projections on the images, which, as described in the following section, increase. The rotation and translation derived are therefore not an optimal solution, but a good starting point for a non-linear regression method. The second step considers the extrinsic and target parameters as the state vector of a non-linear system, with the aim of minimizing the residuals by correcting both the roto-translation between the UP and DW systems and the orientation and position of the chessboard corners.

RESULTS
This section reports the results of the calibration campaign carried out in the OAPD laboratory: the data bench characteristics (Section 4.1), the intrinsic calibration results for both the direct and physical models (Section 4.2), and the measurement of the extrinsic and target parameters (Section 4.3).

Calibration data bench
The geometrical calibration of the PANROVER stereo system and of its BPLs is based on 132 images, each acquired with the target in a different position and orientation, covering the whole PF of the two cameras. The images are divided into 69 for the UP channel and 63 for the DW channel. Most images were acquired with a single channel active; 23 were acquired simultaneously by both channels, providing the geometrical information for the extrinsic calibration of the stereo baseline.

Intrinsic calibration
The results of the intrinsic calibration for both cameras are reported in Table 3 for the direct model and in Table 4 for the physical model.
Table 4 Calibration results for the physical model.
Comparing the DW and UP optical heads, the centre positions of the cameras differ because of the different mounting of the dioptric core of the BPL on the sensor systems. In all cases the affine transformation is approximately the identity, meaning that the camera centre is well detected and the mounting is correctly aligned: a misalignment between the projection axis and the symmetry axis, or an off-axis configuration, would be compensated by a correction of the affine transformation, which is negligible for both cameras. A function that allows the direct and physical models to be compared directly is the chief-ray zenith angle, which associates with each pixel the corresponding angle with respect to the horizon (equator). It is derived from the equations reported in Table 2 and shown in Figure 9.
Figure 9 Angle of the optical ray as a function of the distance from the center of the camera model for the optical head and both the models considered (direct and physical).
The figure shows the chief-ray function for the two models considered.
The zenith direction corresponds to -90° and the horizon to 0°, while positive angles represent the hyper-hemispheric extension of the PF.
Transparent regions show the limits of the acquired data: in the red/blue regions no tie points were detected. In these regions the models therefore have no physical meaning and are only unconstrained extensions of the model. The differences between the direct and physical models in this plot are due to the a priori constraints (see Section 3.2) imposed in the definition of the direct model. The constraint is clearly visible in Figure 10.
Figure 10 IFoV as a function of the distance from the center of the camera model for the optical head and both the models considered.
In both cases the IFoV lies between 2 and 4 mrad. In the physical model the IFoV (which has no constraints) has a maximum at the end of the range of detected tie points (around 800 px, corresponding to the outer limit of the PF FoV in Figure 7). The decrease of the curves is due to the fact that, as shown in Table 2 and anticipated in Section 3.2, the physical model yields an IFoV that converges to zero for an infinite distance from the centre. The direct model, on the other hand, presents a monotonic IFoV, as expected. For each target considered, the mean and standard deviation of the residuals due to the model were calculated. The results show that the direct model gives a mean error of 2.34 px with a std of 1.04 px, while the physical one (despite the presence of the non-physical maximum) is limited to a mean of 0.69 px and a std of 0.47 px.

Target and extrinsic calibration
The performance of the target and extrinsic calibrations can be assessed by analyzing the residuals of the projection of the 3D corner coordinates through the model defined by the intrinsic parameters.
The standard deviations of the residuals in the different phases of the process are reported in Table 5.
Table 5 Mean value of the standard deviation of the residuals in the three pipeline phases.
Given the results of the previous section, only the physical-model results are reported here. A more detailed plot of the residuals for each of the 19 chessboards considered is shown in Figure 11.
Figure 11 Residual std associated with each chessboard acquired for the UP and DW channels in the three phases of the calibration.
The distribution of the reprojection error of the tie points is shown in Figures 11 and 12. In Figure 12 each tie point of the calibration is depicted with a spot whose size is proportional to its error, with two example images plotted as background.
The spots have zero radius for a 0.1 px error and reach a maximum size of 10 px when the error reaches 2.96 px. The color scale spans the same range, from the best reprojection (green) to the worst (red).

Figure 12 Distribution of the reprojection error for the UP (a) and DW (b) channels.
Except for a high-error peak in the UP channel, due to a single chessboard image, the reprojection errors have the same distribution in both channels. The outermost region of the PF has an incorrect projection, probably due to the maximum in the IFoV shown in Figure 10. Imposing the monotonicity of the IFoV (even if limited to the PF) as a constraint in the regression should solve this problem.

CONCLUSION
In this paper, we presented a novel stereo set-up based on hyper hemispherical lenses for planetary rovers, together with the pipeline proposed for its geometrical calibration. The procedure is an extension of the calibration tool for a single central camera. The user is only asked to collect a few images of a chessboard and to click on its corner points. The technique does not use any specific model of the omnidirectional sensor and allows the intrinsic and extrinsic parameters of the stereo setup to be defined.