A GRAPH BASED BUNDLE ADJUSTMENT FOR INS-CAMERA CALIBRATION

Abstract. In this paper, we present a graph based approach for performing the system calibration of a sensor suite containing a fixed mounted camera and an inertial navigation system. The aim of the presented work is to obtain accurate direct georeferencing of camera images collected with small unmanned aerial systems. Prerequisite for using the pose measurements from the inertial navigation system as exterior orientation for the camera is the knowledge of the static offsets between these devices. Furthermore, the intrinsic parameters of the camera obtained in a laboratory tend to deviate slightly from the values during flights. This induces an in-flight calibration of the intrinsic camera parameters in addition to the mounting offsets between the two devices. The optimization of these values can be done by introducing them as parameters into a bundle adjustment process. We show how to solve this by exploiting a graph optimization framework, which is designed for the least square optimization of general error functions.


INTRODUCTION
There is ongoing research regarding the photogrammetric usage of small unmanned aerial systems (UAS) with payload capabilities of a few kilograms.These platforms are usually equipped with a GPS corrected inertial navigation system (INS) and a camera (Eisenbeiss, 2008).The exterior orientations of the aerial images are determinable by the usage of a bundle adjustment (BA) software.Prerequisites are a sufficient number of ground control points (GCP) and high image overlaps between adjacent flight strips.The observations of the GCPs have to be identified in the images as input for the BA.This makes clear that the BA is a time consuming post processing step, which has to be performed after the measurement flight is completed.
In contrast to this development, time-critical surveillance and rescue tasks have a high demand for a more flexible flight planning and the direct determination of object positions (Schikora et al., 2010).Despite the great development in visual navigation (Engel et al., 2012), the usage of a high-precision INS leads to the most accurate pose information in outdoor scenarios at realtime.The utilization of the measured positions and attitudes for the exterior camera orientations eliminates the need for the time consuming post processing in form of a BA to obtain the camera poses.Required is the knowledge of the static coordinate system transformation between the rigid mounted sensors.More precisely the position offset (lever-arm) and the angle misalignments (boresight) have to be estimated (Figure 1).This is known as INS-camera calibration.The most accurate calibration procedures estimate these parameters with an extended Kalman filter (Weiss and Achtelik, 2012) or integrate them as unknowns in a BA (Pinto and Forlani, 2002).
Presently, no freely available BA package is able to perform an INS-camera calibration and the adaptation of the BA implementations is an error prone and time consuming task.In contrast to this, the general graph optimization framework (g 2 o) is directly designed for the least square optimization of general error functions (Kuemmerle and Grisetti, 2011).The problem has to be embedded in a graph by representing the parameters to be optimized as vertices and the observations between them as edges.Further requirements are the definition of error functions for the In this paper, we present a graph based BA approach for the INScamera calibration.The paper starts with an overview on related research areas.It follows a description of the system setup which is focused on in this work.After the general definition of the problem, it is restated into a graph structure and the design of the error functions is described.Finally the approach is analyzed in numerical studies and the achieved results are discussed.

RELATED WORK
A related process of the INS-camera calibration is the so called hand-eye calibration.Given a camera mounted on a robot arm, the rigid-body transformation between the coordinate systems of these devices is estimated.As a result, measurements from the acquired images can be transformed in the robot arm coordinate system.This is necessary to interact with objects recognized and located in the images.The calibration is done by estimating the rigid body transformation from corresponding poses.A direct solution can be computed by firstly optimizing the rotational part and solving the equations for the translation afterwards (Tsai and Lenz, 1989).In contrast, it was shown that the nonlinear optimization for rotation and translation at the same time leads to more robust results in case of noise and measurement errors (Horaud and Dornaika, 1995).The motion of the robot arm is typically obtained from encoders, whereas nearly all approaches determine the camera movement by observing a calibration pattern.In contrast to that, the camera poses can be determined by a structure-from-motion approach.As a drawback, the camera movement can only be estimated up to a similarity transformation.This leads to an unknown scale, which as well has to be estimated during the calibration process.The obtained results are not as accurate as the calibrations from methods using camera calibration patterns (Andreff et al., 2001).However the approach has the advantage of being feasible without any additional equipment and therefore allows a recalibration during operation.
The algorithms developed for the hand-eye calibration had a big influence on another related problem: The calibration between an inertial measurement unit (IMU) and a camera.These devices can be combined to a vision-aided inertial navigation system.Measurements of the IMU in form of rotational velocities and linear accelerations can be integrated to determine the positions, velocities and attitudes of the device.During this process small estimation errors are summed up over time, which should be corrected by an aided sensor.Typically this is achieved by exploiting GPS measurements, which is not possible in space, underwater or indoor applications.An alternative is the usage of a camera in addition to the IMU.Prerequisite for exploiting camera based corrections is the knowledge of the transformation between the two devices.These offsets can be computed with modified hand-eye calibration algorithms (Lobo and Dias, 2007).The fact that consecutive measurements from shaft encoders are uncorrelated in contrast to data from an IMU results in different noise characteristics.Considering these time correlations leads to a higher accuracy of the estimations (Mirzaei and Roumeliotis, 2008).The authors utilize an extended Kalman filter (EKF) for estimating the pose transformation, which facilitates the computation of the covariance for the estimated parameters as an indicator of the achieved accuracy.In (Weiss and Achtelik, 2012) the authors describe an approach for navigating an UAS based on measurements from an IMU and a camera.By not using a calibration pattern, they realize the online estimation of the mounting parameters between these sensors with an EKF formulation.
To calibrate the transformation between a GPS-aided INS and a camera, the algorithms for the IMU-camera calibration can be used.This is based on the fact that an IMU is a fundamental component of an INS.The only prerequisite for applying an IMUcamera calibration is that the raw measurements of the IMU are accessible.However this is not the predominantly used method to perform a system calibration between an INS and a camera.The commercially available INS provide an integrated filtering process, exploiting the GPS measurements to correct the IMU estimations.In conjunction with GPS correction signals from ground control stations an absolute accuracy in a range of few centimeters for the position and a few hundredths of a degree for the attitude is achievable.Thus the INS provides a reliable stand alone source describing its own movement.This leads to estimation of the rigid-body transformation between the INS and the camera with methods similar to the hand-eye calibration.In a first step the camera movement is calculated with a structure from motion (SFM) approach and refined in a BA procedure.The observations of ground control points are used to scale the 3D-model to real world coordinates.In a second step, the transformation between the two devices is estimated by relating these absolute camera poses to time synchronized measurements from the INS.This widely used approach is known as two step procedure (Cramer et al., 2000).The advantage is that each bundle adjustment package can be used without modifications.On the other hand, the integration of the mounting parameters as variables to optimize in the BA is possible.This approach is known as single-step calibration and induces a simpler calibration of the mounting offsets due to more flexible flight courses (Pinto and Forlani, 2002).The simultaneous optimization of the rigid-body transformation between the devices and the intrinsic camera parameters should consider correlations between these values.An analysis of the flight pattern influence on the calibration parameters is discussed in (Kersting et al., 2011).The authors also state that at least one GCP is needed for the estimation of the vertical lever-arm and that the addition of multiple GCP does not improve the estimation results identifiable.The integrated sensor orientation has been examined and discussed widely within the OEEPE test (Heipke et al., 2002).They conclude that direct georeferencing, even though it does not achieve the accuracies of the classical bundle adjustment, is a serious alternative for many applications.
The investigations for the INS-camera calibration performed in the last decades are targeted at manned aircrafts equipped with high-precision INS and aerial cameras at high altitude.In contrast to this our goal is the direct georeferencing with small UAS.In this work, we investigate how the calibration process can be performed with an open source toolkit for general graph optimization to bypass the need for a commercial BA package.

SYSTEM SETUP
The objective of this research is the estimation of the rigid body transformation between a camera and a high-precision INS.The latter is based on fiber optic gyroscopes, which have a stability up to some hundredths of a degree per hour.In combination with real time kinematic enhanced GPS measurements very accurate pose information are generated.An absolute accuracy of ±2cm in the position and 0.01 • for the attitude angles are achieved in our system setup.This pose information shall be exploited for the determination of the exterior orientation of an optical camera.Precondition for this are a rigid mounting and a hardware synchronization between the two devices.If these conditions are satisfied, the mounting offsets (Figure 2) can be estimated by processing data from a measurement flight.Nearly every outdoor application utilizing a camera mounted on an aerial platform induces that the camera focus has been set to infinity.Furthermore, wide angle lenses are frequently the right choice for a given task.The minimal distance for a focused image and the big opening angle lead to an impractical size for the calibration pattern and make standard camera calibration procedures laborious.An easier estimation of these parameters can be achieved by performing an in-flight calibration, which uses a structure from motion (SFM) approach on the images acquired during a flight.Another advantage of this is that the intrinsic camera parameters are estimated during real conditions, which tends to differ slightly in comparison to a laboratory calibration due to climate and temperature changes (Jacobsen, 2001).
The INS measures the attitude as Euler angles, namely yaw, pitch and roll, with regard to local navigation systems.The latter are local east, north, up (ENU) coordinate systems, each with the origin in the current device position described through a GPS measurement in form of latitude, longitude and altitude (Figure 3).This implies tiny differences between the orientations of these coordinate systems and thus also in the measured attitude for successive timestamps.Our INS-camera calibration will be performed in an area of only a few hundred square meters, which allows to neglect these differences.Furthermore, we perform the INScamera calibration in an ENU coordinate system with the origin at a ground control point in the middle of the observed area.This prevents the need for earth curvature corrections which occur in mapping frames like the universal transverse mercator (UTM) coordinate system.To obtain the transformation from the position information of the INS given in latitude, longitude and altitude to ENU coordinates the earth-centered, earth-fixed coordinate system (Figure 3) is used as an intermediate step.

North2
Figure 3: Both, the navigation systems of the INS and the world reference frame are based on east, north, up (ENU) coordinate systems (green).They are local Cartesian coordinate systems with the origin tangential to the earth ellipsoid.Further we use the global earth-centered, earth fixed (ECEF) coordinate system (blue), to convert between ENU-coordinates and GPS measurements taken in latitude (ϕ) and longitude (λ) as polar coordinates (orange).

DEFINITIONS
We define the world reference frame W as ENU coordinate system with the origin being approximately in the middle of our calibration area.The rigid body motion gC i W describes the camera pose at the middle exposure time ti, i = 1, 2, ..., n with regard to the world frame W. Likewise specifies gI i W the corresponding configuration of the INS as rigid body motion.
In general, a rigid body motion g ∈ SE(3) describes how the points of a rigid object change over time.Instead of considering the continuous path of the movement, we bring into focus the mapping between the initial and the final configuration of the rigid body motion.This movement can be described by a rotation matrix R ∈ SO(3) and a translation vector t ∈ R 3 .Conse-quently the rigid body displacement G of a 3D point p ∈ R 3 can be performed by The representation of the rotational part in form of the overdetermined rotation matrix R is not suitable for the optimization performed in this work.A minimal representation is required.Being aware of the singularities which can occur by using Euler angles, we represent a rigid body motion by twist coordinates ξ = (v, ω) ∈ R 6 .Thereby, v ∈ R 3 describes the translational and the skew-symmetric matrix ω ∈ so(3) the rotational part of the full motion.The rotation angle in radians is encoded as ||ω||2.An element ξ ∈ se(3) can be written in its homogeneous representation as Given ξ ∈ se(3), we get a rigid body motion by the matrix exponential, which is defined as the always converging power series: This leads to an alternative formulation of Equation ( 1) using twist coordinates to determine the transformation of a 3D point p ∈ R 3 according to the rigid body motion g as follows: For more details on the used representation of rigid body motions we refer to (Ma et al., 2003).
Further we define the set of intrinsic camera calibration parameters k = {fx, fy, ox, oy, k1, k2}, whereby (fx, fy) describe the focal length, (ox, oy) the principal point and (k1, k2) the radial distortion of the camera.The projection π performs the mapping from a transformed 3D point G(g, p) = (x, y, z) to pixel coordinates as with the radial distortion factor d being defined by

PROBLEM FORMULATION
In order to describe the rigid body motion gC i W of the camera using the measured rigid body motion gI i W of the INS (Figure 4), the devices have to be rigidly mounted.This induces that the offsets between them are static and especially comprises that the rigid body motions describing the movements from the INS to the camera at various exposure times are equal ∀k, l ∈ {1, 2, ..., n} : This results in the problem considered in this work, the estimation of the rigid body motion gCI out of synchronized data from measurement flights.As a first step a SFM approach, which delivers an initial sparse 3D structure of the observed area and the corresponding pixel observations, has to be performed (Hartley and Zisserman, 2004).The refinement of this is usually done in a BA process, where the initial estimation of the 3D points and camera poses are optimized.Given a number of n images associated with the camera poses Ci and m 3D points pj with corresponding pixel observations xij.The classical BA approach minimizes the reprojection error of the 3D points as follows: The estimations for the intrinsic camera parameters k tend to differ slightly from a laboratory calibration, due to the climate and environmental conditions.Thus it is an advantage to optimize k by using images from a measurement flight in the classical BA procedure.Nevertheless, errors can be introduced to the intrinsic parameters, which are compensated by the independent camera positions.Therefore the joint calibration of the intrinsic camera parameters in conjunction with the mounting offsets gCI promises more accurate results.The latter are introduced by adding a term which measures how well the parameters of the camera poses gC i W in composition with the mounting offsets gCI satisfy the INS measurements.This is realized by comparing a synthetic measurement generated out of the actual camera pose and mounting offsets with the measured INS pose by using the inverse and the composition of rigid body motions.Adding this constraint in the form of an additional sum to Equation ( 10), we specify our objective function: (11)

GRAPH OPTIMIZATION
The non linear least square problem defined in Equation ( 11) can be optimized by using g 2 o, the general graph optimization frame-work (Kuemmerle and Grisetti, 2011).The problem has to be embedded in a graph by introducing the variables to optimize as nodes and the observations between them as edges.In the following we will present the restatement of our objective function into the graph-based formulation.
The calibration parameters represented as the intrinsic camera parameters k and the mounting offsets gCI are added as nodes to the graph.Furthermore we add each 3D point pj and each camera pose gC i W as a node to the graph.The connection between these nodes is given by inserting our observations as edges into the graph.A pixel measurement connects three different nodes, namely: a camera, a 3D-point and the intrinsic calibration parameters.This constraint can be realized with a hyperedge, which is able to connect an arbitrary number of nodes.The edge of a INS measurement connects the corresponding rigid body motion of the camera with the mounting offsets.A visualization of the graph is given in Figure 5.
The objective function of the stated problem can be illustrated by a hyper-graph.The measurements (boxes) are presented as links between the nodes concerning each multiple sets of variables (circles).For improved overview multiple state variables and measurements of the same type are visualized in a stacked view unrelated to their number of occurrence.
Further, we have to define error functions, which measure how well our measurements are described by the state variables they are connecting.Our first constraint measures the error occurring by the reprojection of a 3D point into the image, in the same form as in Equation ( 10).The error function for this constraint can be expressed as: The resulting error vector has dimension two and is 0 if the pixel observation is perfectly described by the state variables.Our second error function states how well the INS measurements can be described by the composition of the camera poses gC i W and the mounting offsets gCI as follows: Using the twist representation for the rigid body motions, we receive a 6-dimensional error vector, which is 0 if the parameters perfectly satisfy the measurement.
Without limiting the generality, we refer to the whole state vector [k, gC i W, pj, gCI] as y and formulate our optimization problem as follows: The g 2 o framework uses the Levenberg-Marquardt (LM) algorithm to compute a numerical solution of Equation ( 14) and therefore needs a good initial guess y of the state vector.Iteratively, the first order Taylor expansion around the current guess y is used to approximate Equation ( 14) and optimize the local increments ∆y by solving the resulting sparse linear system.The center for the next iteration is obtained by adding the optimized increments to the initial guess.This is done by using the motion composition for the state variables represented by rigid body motions and a simple addition for the 3D points and intrinsic camera parameters.For a detailed description of the LM algorithm we refer the reader to (Lourakis and Argyros, 2009).

EVALUATION
In this section, we present results from numerical studies to validate the proposed approach.The main purpose of this is to analyze the influence of various flight configuration and settings on the calibration parameters by comparing the estimations with the known ground truth.
In the simulations we set the mounting parameters consisting of the lever-arm components xL, yL and zL as well as the angle misalignments in form of the Euler angles yaw = ψB, pitch = θB and roll = φB (Figure 2) according to the ground truth defined in Table 1.The initialization with θB = 180 • is a result of the definition of the INS frame as ENU coordinate system, while the downward directed camera is modeled with the z-coordinate pointing in viewing direction.The intrinsic camera parameters stated in Table 1 describe the modeled camera with an image size of 3296 by 2472 pixels using a wide angle lens.The simulated flight courses consist of INS poses sampled along straight lines according to the camera exposure times, determined by the velocity of the UAS and the cameras frames per second.For a more realistic flight path, the ideal poses are modified by a small random factor and the resulting poses are considered as ground truth.One GCP is defined in the middle of the observed area.A fixed number of additional 3D points are sampled in the observed area.The pixel observations are determined by projecting the 3D points into the images according to Equation (6) using both, the generated camera poses and the ground truth of the intrinsic camera parameters defined in Table 1.A realistic number of observations is produced by defining a probability of detection of 50% instead of using all projections of the 3D-points which are in the field of view of a camera.Concerning noise, we simulate the image observations with an accuracy of 0.5 pixels.Further we add white Gaussian noise to the INS poses using a standard deviation of σL = 0.02 for the position components and σB = 0.01 for the Euler angles.
The principle of the flight configuration stated as optimal in (Kersting et al., 2011)  distance of 20 m (Figure 6a).We sampled our INS poses along these lines twice, once in each direction.This results in an image side overlap of 50% for poses on the adjacent lines.The same maneuver was simulated at a flying height of 30 m resulting in a side overlap of about 66%.Two different altitudes in combination with a GCP were used to decouple the high correlation between the lever-arm and the focal length as well as the principal point.The definition of 20 m strip length in conjunction with a speed of 36 km/h and 5 images per second leads to a total number of 80 images with more than 93% image overlap in flying direction.This course was used in Simulation 1, 2 and 3 by sampling 1000, 3000 and 6000 points.The results in Table 2 show that increasing the point number leads to more accurate results for the calibration parameters as assumed.Remarkable is that the angle misalignment is nearly the same for Simulation 2 and 3, which shows that increasing this value further will have little influence on these parameters.The error in the lever arm component zL is for all three Simulations about four times bigger than in the other directions, which is also reflected in the estimation of the correlated focal length and principal point.The first three simulations show that the accuracy of the lever-arm optimization is worse than usual measurements out of constructions drawing or terrestrial measurement.Due to this observation, we fix the lever-arm components in the following experiments.
For Simulation 4, the same flight and parameter configuration as in Simulation 2 was used.The optimization leads to better results in all estimated parameters (Table 3).The fixed lever-arm also removes the requirement of two flight altitudes.We investigated the influence of the flight course by performing simulations of the patterns visualized in (Figure 6), whereby for each simulation a total number of 3000 points was sampled in the observed  area.By performing the quadratic course in Simulation 5 and the star pattern in Simulation 6 only at an altitude of 20 m, we get the same number of cameras as in all other simulations.In comparison with Simulation 4, the errors of Simulation 5 are slightly smaller for the intrinsic camera parameters, but higher for the roll angle.The star pattern performed in Simulation 6 leads to a bigger error in the yaw angle (Table 3).Overall the influence of the investigated flight courses on the accuracies of the estimated calibration parameters is small.Another observation was made by performing the flight course depicted in Figure 6a at altitudes of 200 m and 300 m in Simulation 7, which is ten times higher than in Simulation 4. As expected, the errors in the angle misalignment are smaller due to there bigger influence on the object points at higher altitudes (Table 3).2.2965e −5 2.8315e −5 3.7536e −5 2.5814e −5 k2 2.5092e −5 1.8952e −5 2.3544e −5 1.0906e −5

RMSE
Table 3: This table shows the RMSE of the calibration parameters achieved by performing 100 Monte Carlo trials for the optimization with an fixed lever-arm for various flight maneuvers.

CONCLUSION AND FUTURE WORK
In this paper we presented a graph-based approach for the INScamera calibration.The high correlation between the lever-arm of the mounting offsets and the intrinsic camera parameters should be decoupled by performing the flight maneuvers at two different altitudes.Even though, the produced results showed that the accuracy of the lever-arm optimization is worse than usual measurements out of construction drawings or terrestrial measurements.
Our simulations confirmed the recommendation to perform only the optimization of the intrinsic camera parameters and the misalignment angles between the devices in the system calibration (Kresse et al., 2006).Moreover, this eliminates the constraints of using two different altitudes.In conjunction with the small influence of the flight patterns a recalibration during each operation seems possible.Regarding the usage of small UAS, which in general are used at low altitude, we conclude that the calibration of the angle misalignment should be performed as high as possible to achieve the most accurate results.
Future work will investigate the proposed procedure in more detail.Based on the gathered insight, we will perform real flight experiments and evaluate the approach further.

Figure 1 :
Figure 1: The calibration of the static coordinate system offsets between a camera and an INS enables the direct georeferencing of images collected with small UAS.observations and good initial values for the state variables.The numerical solution of the problem can be computed with an implementation of the Levenberg-Marquardt algorithm.

Figure 2 :
Figure 2: The rigid mounting of a camera and a INS induces static offsets between the underlying coordinate systems denoted by C and I.More precisely position offsets xL, yL and zL as well as angle misalignments, here visualized in form of Euler angles yaw = ψB, pitch = θB and roll = φB, arise.These have to be determined in an INS-camera calibration.

Figure 4 :
Figure 4: Both, the poses of the INS frame I and of the camera frame C can be described by rigid body motions with regard to the world frame W. The estimation of the static rigid body motion gCI between the devices (dotted arrow), enables by composition with the INS measurements gI i W the description of the camera poses gC i W.

Figure 6 :
Figure 6: The three flight configurations investigated in this study are visualized in a top view.Each line is sampled in both directions, resulting in an image overlap of approximately 100%.

Table 1 :
was used to define our first simulated flight course.At a flying height of 20 m we defined two lines with a This table states the initialization values for the optimization process and the ground truth values used in our experiments.

Table 2 :
This table shows the RMSE of the calibration parameters achieved by performing 100 Monte Carlo trials for the first flight course with an increasing number of 3D-points.