POSE VERSUS STATE: ARE SENSOR POSITION AND ATTITUDE SUFFICIENT FOR MODERN PHOTOGRAMMETRY AND REMOTE SENSING?

We investigate the advantages of using what we call sensor state parameters or sensor state to describe the geometrical relationship between a sensor and the 3D space or the 4D time-space that extend the traditional image pose or orientation (position and attitude) to the image state. In our concept, at some point in time t, the sensor state is a 12-dimensional vector composed of four 3-dimensional subvectors p(t), v(t), γ(t) and ω(t). Roughly speaking, p(t) is the sensor’s position, v(t) its linear velocity, γ(t) its attitude and ω(t) its angular velocity. It is clear that the state concept extends the pose or orientation ones and attempts to describe both a sensor’s statics (p(t),γ(t)) and dynamics (v(t),ω(t)). It is also clear that if p(t), γ(t) are known for all t within some time interval of interest, then v(t) and ω(t) can be derived from the former. We present three methods to compute the state parameters, two for the continuous case and one for the discrete case. The first two methods rely on the availability of inertial measurements and their derived time-Position-Velocity-Attitude (tPVA) trajectories. The first method extends the INS mechanization equations and the second method derives the IMU angular velocities from INS mechanization equations’ output data. The third method derives lineal and angular velocities from relative orientation parameters. We illustrate how sensor states can be applied to solve practical problems. For this purpose we have selected three cases: multi-sensor synchronization calibration, correction of image motion blur (translational and rotational) and accurate orientation of images acquired with focal-plane shutters.


INTRODUCTION AND MOTIVATION
For many years, sensor position and attitude parameters -orientation parameters-have served well the needs of photogrammetry, remote sensing and their sister and customer disciplines; i.e., the traditional 3D reconstruction tasks with traditional sensors can be performed with them.However, in recent times, the complexity and variety of photogrammetric and remote sensing tasks as well as the variety of sensing devices has blossomed.The cost of the sensing devices in use -including but not limited to cameras-ranges from a few hundred to a million euros (C).Sensing techniques have diversified from film cameras to digital ones (including multispectral and hyperspectral frequency bands), to LiDAR and radar.In addition to geometric 3D scene reconstruction (a genuine photogrammetric task) and to terrain classification (a genuine remote sensing tasks), images are restored, integrated with other images, geometrically and radiometrically calibrated, time aligned to a common time reference frame, and processed in many other ways.For this purpose, it is many times convenient to know the sensor's velocity or the ratio of change of its attitude matrix.
In this paper we explore the convenience, advantages and disadvantages of using what we call "sensor state parameters" -"camera state parameters" when appropriate-to describe the geometrical relationship between a sensor and the 3D space or the 4D time-space.In our concept, at some point in time t, the sensor state is a 12-dimensional vector composed of four 3-dimensional subvectors p(t), v(t), γ(t) and ω(t).Roughly speaking, p(t) is the sensor's position, v(t) its linear velocity, γ(t) its attitude and ω(t) its angular velocity.It is clear that the "state" concept extends the "pose" or "orientation" ones and attempts to describe both a sensor's statics (p(t),γ(t)) and dynamics (v(t),ω(t)).It is also clear that if p(t), γ(t) are known for all t within some time interval of interest, then v(t) and ω(t) can be derived from the former.However, this is not always the case, as it is inconvenient to keep the camera trajectory {p(t), γ(t)} t and make it available to potential users.
First, we propose a modification of the INS mechanization equations to include the angular velocity as an unknown state.Second, we describe a post-processing method where measured angular rates are corrected with the angular rate sensors estimated systematic errors.Last, we derive sensor linear and angular velocities from relative orientation parameters.
Three theoretical cases will be presented: multi-sensor synchronization calibration, correction of image motion blur (translational and rotational) and accurate orientation of images acquired with focal-plane shutters.
In the multi-sensor synchronization calibration, i.e., in the estimation of time delays between the instrumental time coordinate reference frames (CRFs) of the various sensors of a system, we will build upon our own results where we used the knowledge of p(t), v(t), γ(t) and ω(t) for the estimation of the δt j i between the CRFs of instruments i and j.We claim, that using the 12 state parameters makes the use of the aerial control models in our extended aerial control models straightforward and, therefore, the knowledge of the state parameters and their covariance matrices becomes a big advantage.
In image translational and rotational motion deblurring, we build upon the technique presented by N. Joshi in 2010.N. Joshi used, for the first time, inertial measurement units for dense, per-pixel spatially-varying image deblurring.In our proposal their method can be further simplified because we propose to use INS/GNSS integrated results and not only INS results, thus avoiding their smart, though complex, INS drift correction.
In the orientation of images acquired with cameras whose shutter is of the focal-plane type, we propose a new orientation model that extends the classical colinearity equations and that takes into account that, although all image rows are exposed during a time interval of the same length, not all rows are exposed with the same central time.For this purpose, the twelve image state parameters are used to derive more accurate orientation parameters for each image row.
The core of the paper is organized in four sections, 2 to 5: introduction of the sensor state parameters (section 2; with frame, notation and naming conventions in section 2.1 and with the definition of sensor states in 2.2); mathematical structure of pose and state parameters (section 3); their computation (section 4); and some of their applications (section 5; to image deblurring in section 5.1, to multi-sensor system synchronization in section 5.2 and to orientation and calibration of low-cost and amateur cameras in section 5.3).

Reference frames, notation and naming conventions
We begin by describing the coordinate systems and reference frames -i.e., coordinate reference frames (CRF) or just framesof our models.
For the object frame we have selected the local [terrestrial] geodetic cartesian frame parameterized by the east (e), north (n) and up (u) coordinates.This type of frame will be denoted by l. c will be camera or imaging instrument frame and b the inertial measurement unit (IMU) frame.When mounted on a vehicle, we will assume that both c and b are defined following the (forward-leftup) convention.
In the equations we will use superscripts to indicate the frame of vectors like in p l or v l ; superscripts and subscripts to indicate the "to" and "from" frames, respectively, of rotation matrices and their angular parametrization like in R(γ) l c or γ l c ; superscripts and double subscripts like in Ω(ω) c lc or γ c lc to indicate the angular velocity matrix or vector of the frame c with respect to the frame l measured in the frame c.Time dependency, also for frames, will be written as usually like in p l (t), γ c lc (t) or x c(t) .
As usual, the superscript T will indicate transpose of a vector or matrix.If clear from the context, in order to simplify notation and facilitate lecture, we will avoid its use and write (e, n, u) instead of the formally correct (e, n, u) T .
Last, if a rotation or angular velocity matrix is parametrized by angles γ l c and ω c lc respectively, we will write R(γ) l c and Ω(ω) c lc and neither R(γ l c ) nor Ω(ω c lc ).Time dependency of these matrices will be written R(γ) l c (t) and Ω(ω) c lc (t).

The concept of sensor state
We define s l c the state of a sensing instrument or, simply, the sensor state or sensor state parameters as the duodecuple -the quadruple of triples- where p l = (e, n, u) l , v l = (ve, vn, vu) l , γ l c = (ω, ϕ, κ) l c and ω c lc = (ωx, ωy, ωz) c lc .p l is the position vector of the origin of the instrumental CRF in the l frame and v l its velocity vector.γ l c is a parametrization of R(γ) l c , the rotation matrix from c to l -the sensor attitude matrix-and ω c lc is the vector of instantaneous angular velocities of the c frame with respect to the l frame, referred to the c frame.ω c lc can be also seen as a parameterization of the angular velocity skew-symmetric matrix Ω(ω) c lc -the sensor angular velocity matrix- We will write W (n) to denote the set of all real skew-symmetric n × n matrices.Following the notation conventions of section 2.1 in the sequel we will write R(γ) l c and Ω(ω) c lc for the rotation and angular velocity matrices respectively.Last, when it is clear from the context we will write the sensor state defined in 1 as Clearly, the sensor state parameters (p, v, γ, ω) generalize and include the sensor pose or exterior orientation parameters (p, γ).If they are time dependent differentiable functions they are related as follows (2) We borrowed the term "state" from mechanics and navigation.

MATHEMATICAL STRUCTURE OF POSE AND STATE PARAMETERS
The sensor's pose or exterior orientation parameters p l and γ l cstrictly speaking p l c and γ l c -describe the translation and rotation of the sensing instrument 3D reference frame c with respect to a world or object 3D reference frame l.In other words, (p l , γ l c ) represent a rigid body transformation in 3 from x c in x l In the mathematical abstraction, the set of affine maps ϕ of 3 , ϕ(x) = Rx + p, where R ∈ SO(3) and p ∈ 3 with the binary operator • defined as is the group of rigid motions SE(3), the Special Euclidean group of 3 .We recall that SO(3) is the Special Orthogonal group of 3 -i.e., the group of rotation matrices in 3 -and that SE(3) is a semidirect product of SO(3) and 3 , SE(3) = SO(3) 3 (Gallier, 2011).
Let us now assume that the sensor moves; i.e., that the reference frame c is a function of time c(t) and that the reference frame l remains static.Then, it holds Let us also assume now that x c(t) does not move with respect to the c(t) instrumental reference frame; i.e., ẋc(t) = 0.In this case ẋc(t) = 0. Thus, the dynamics (x l (t), ẋl (t)) in l of a point x c(t) , static with respect to the instrumental reference frame c(t), can be derived from p(t), v(t), γ(t) and ω(t).Therefore, p(t), v(t), γ(t) and ω(t) characterize the dynamics of the instrument.

COMPUTATION OF SENSOR STATES
In this section we present two approaches to estimate the sensor state parameters; one based on the measurements and/or results of INS/GNSS integration and one based on the relative orientation and angular velocities between consecutive images of image sequences.

States from inertial and GNSS measurements
Sensors states can be computed in a number of ways.Since the beginning of the nineteen nineties integrated INS/GNSS systems (Schwarz et al., 1993) have been used to directly estimate or to indirectly contribute to the estimation of the orientation parameters of imaging instruments.In the latter case the estimation method is known as Integrated Sensor Orientation (ISO) and in the former as Direct Georeferencing (DG) or Direct Sensor Orientation (DiSO).When neither INS/GNSS nor GNSS aerial control observations are available (or of sufficient precision and accuracy) sensor orientation has to resort to the old Aerial Triangulation (AT) or Indirect Sensor Orientation (InSO).
INS/GNSS systems provide high-frequency -typically, from 50 to 1000 Hz-time-Position-Velocity (tPVA) trajectory solutions that, as indicated, have been used for DiSO or ISO orientation in P&RS.As already pointed in (Blázquez, 2008), the velocity information of INS/GNSS-derived trajectories has not been fully exploited.The tPVA trajectories can be easily transformed into imaging instruments orientation parameters and velocity parameters by propagating them through lever arms (spatial shifts) and rotation matrices (angular shifts, also known as boresight angles) into a part of the state parameters s l c , namely the nine components (p l , v l , γ l c ).However, for the purposes of this article, the instrument angular velocities ω c lc are still missing.
The tPVA INS/GNSS-derived trajectories result from the [numerical] integration of the inertial [stochastic differential] equations of motion also known as inertial mechanization equations (Jekeli, 2001, Titterton andWeston, 2004).These stochastic differential equations (SDE) generate the corresponding stochastic dynamical system (SDS) that, together with external static [update] measurements can be processed with a Kalman filtering and smoothing strategy.GNSS carrier phase measurements or their derived positions are the usual update measurements.In other words, INS/GNSS integration as usual with an augmented state vector with the inertial measurement unit (IMU) error states and, possibly, other error modeling states.If we take into account that where b is the coordinate reference frame (CRF) of the inertial measurement unit, our problem would be solved if we were able to estimate ω b lb in or from the Kalman filter and smoother.
In order to do so, we can use the IMU angular rate raw measurements and correct them according to the IMU angular rate calibration model and states.Or we can introduce a new [3D vector] state in the KFS for ω b lb .There are advantages and disadvantages to each approach.In the latter approach, the extension of the KFS state vector with ω b lb requires a dynamical model for ω b lb be it through the trivial random walk model ωb lb = 0 + w with a large variance for w and then use the IMU angular rate measurements as update measurements.This would introduce update steps at the frequency of the IMU measurements which is computationally expensive and not recommended for the numerical integration algorithms.However the solution trajectory is conceptually "clean" and the state covariance matrices are directly available from the trajectory solution.The former approach, although less rigorous in principle, requires no modification of the INS/GNSS software which, for the P&RS community is more attractive and, in general, has less impact in the existing installed software base and production lines.

States from orientation parameters
In the previous section we have described how to determine the sensor state parameters from INS/GNSS systems.In general we favor this approach: INS/GNSS integration delivers an [almost] drift-free and [sufficient] high frequency trajectory from where sensor states can be interpolated.Of course, "almost" and "sufficient" are context dependent attributes.However, there may be situations where dense trajectory information is not available and, still, where sensor state parameters are required.In the case of image sequences like video images or photogrammetric blocks, a sensor's state parameters can easily be derived from its own and neighboring exterior orientation parameters if the latter are time tagged.Let us assume then the sequence {o l c (ti)}i of exterior orientation parameters v l (ti) can be derived from p l (ti) by standard numerical differentiation.ω l c (t k ) can be derived from the series {γ l c (ti)}i by, for instance, computing the relative attitude matrix and noting that the matrix logarithm of a rotation matrix is a skew symmetric matrix (Higham, 2008) which defines a recursive relation that can be exploited in the context of an extended bundle adjustment from starting and ending non-rotating instruments.

Image deblurring
Motion blurring occurs when, during the time exposure of an image, either the image moves, or imaged objects move or both.Motion blur becomes apparent for combinations of relatively long exposure times and/or rapid motions.The problem is well known in photogrammetry and traditionally solved with [mechanical] forward motion compensation (FMC) techniques under the assumption of dominant translational and negligible rotational motion effects.FMC systems add complexity -therefore costto the image acquisition systems and, sometimes, like in UAS photogrammetry, they also add non carriable critical additional weight.(The topic of motion blur in UAS photogrammetry has been recently addressed in (Sieberth et al., 2013).)Image deblurring is an area of growing interest.
An observed blurred image B can be thought as an ideal, original, unblurred image L -the latent image-that has been degraded by motion.It is customary to model this image degradation as where v is positive additive noise and k is a convolution kernel that translates the camera/object 3D motion into 2D image "motion" that, in imaging terms, can be interpreted as a point spread function (PSF).k is also known as the blur function.Image deblurring is the process of recovering L from known B, known v and, known or unknown, k.Image deblurring is a deconvolution problem.If the blur function k is known, L can be recovered with a non-blind deconvolution method.However, for images with just orientation parameters (p l (t), γ l c (t)), k is unknown and must be recovered together with L with a blind-deconvolution method.Blind deconvolution is an ill-posed problem, as the observed blurred image B contributes insufficient measurements.If the camera trajectory {p l (t), v l (t), γ l c (t), ω c lc (t)}t and the imaged object are known, k can be unambiguously determined.According to this, in P&RS, depending on the phase of the map production line, k can be determined.Moreover, considering the short exposure times and long distances camera-object, k can be approximated with just the knowledge of the image state parameters at the central exposure time t, p l (t), v l (t), γ l c (t), ω c lc (t) , or of the state parameters and the average height above ground.
For known k, the Lucy-Richardson iterative deconvolution method, also known as the Richardson-Lucy algorithm (Lucy, 1974), is usually applied.An example of a blind deconvolution algorithm, for unknown or poorly determined k can be found in (Lam and Goodman, 2000).Blind deconvolution is more complex and inefficient that non-blind deconvolution and therefore, considerable research has been devoted to estimate or measure, in one way or another, camera/image motion parameters.In (Lelégard et al., 2012) a method without the knowledge of the motion parameters is presented although the use of inertial measurements is suggested as an alternative.In (Shah and Schickler, 2012) rotational motion is extracted from single images by means of transparency cues of blurred objects in the image.In (Joshi et al., 2010) translation and rotational motion is derived from inertial measurements and, from those, an aided blind deconvolution algorithm is proposed.
Since exposure times in aerial photography are short, the use of the image state parameters shall be sufficient to characterize the dynamics of the image during the exposure time interval.
The (Joshi et al., 2010) approach can be simplified if, instead of a free INS trajectory solution, a driftless INS/GNSS or any other driftless trajectory is used; i.e., the blur function kernel k can easily be computed.

Sensor space-time calibration
In (Blázquez, 2008) and (Blázquez and Colomina, 2012) we presented a concept and the corresponding mathematical models for four dimensional space-time calibration of multi-sensor P&RS data acquisition systems as an extension of the integrated sensor orientation method.The key idea behind the concept was that INS/GNSS-derived linear and angular velocity parameters can be used to link synchronization errors to spatial errors.In this way, spatial control information -aerial or terrestrial-and an appropriate block geometry allow for simultaneous sensor calibration, synchronization error correction (time calibration) and even GNSS-related shift parameter estimation.The mathematical models are not reproduced here and the reader is referred to the mentioned references for details.

Orientation of low-cost and amateur cameras
It is often the case, that mass-market small-format cameras are used for mapping professional applications not only because of their low cost but also because of their low weight, relative high resolution -currently, up to 25 Mpx,-excellent optics and, in general, high performance-to-price ratio.
where e = E/v, ∆t(l) = (l − n/2)/v, E is the width (height) of the shutter sliding window, n is the row size (height) of the image sensor and v is the moving speed of the window.We will say that the central exposure time of line l or, simply, the exposure time t(l) of line l is t(l) = t + ∆t(l).Figure 1 depicts a simplified focal-plane shutter concept and relates it to the parameters E, n and v.For instance, for a a Sony NEX-7 camera with exposure time intervals of 1/4000 s, v ≈ 35 m/s.
In the above formulas we have used a (row, column) coordinate system whose origin is in the (up, left) image corner, where rows (lines) and columns are numbered 0, 1, . . .top-to-down and leftto-right respectively, and where the shutter window moves topto-down.We used the traditional photogrammetric (x, y) PPScentered coordinate system, then where ∆t(y) = −y/v, and where e, E, n, v have to be given in the appropriate physical units.Analogously, we will say that the central exposure time, or exposure time, t(y) of line y is t(y) = t + ∆t(y).
For the sake of clarity we will now neglect image distortions and atmospheric refraction.If the image state is p l , v l , γ l c , ω l lc and a ground point x l is projected into the image point (x, y), then we can write the classical colinearity condition (10) The above equations 9 and 10 are rigorous models for focalplane shutter cameras that take into account the non-simultaneous imaging effect of their shutter mechanism.Similar to the geometry of pushbroom cameras, if p l , v l , γ l c , ω l lc are known, imageto-ground -in general, image-to-object-transformations are straightforward.On the contrary, object-to-image transformations may involve iterative -simple and fast, though-algorithms.

CONCLUSIONS
We have explored the extension of camera exterior orientation or pose parameters (position and attitude) to what we call camera or image state parameters (position, velocity, attitude and attitude rates).The state parameters capture the dynamics of the camera.
We have shown the advantages of the state parameters over the orientation parameters with three examples: image deblurring, spatio-temporal calibration of imaging instruments and orientation of low-cost amateur cameras with focal-plane shutters.In the three cases, the state parameters, as we have formulated them, appear in a natural and independent way when trying to cope with some problems caused and/or related to the sensing instrument motion.
We plan to further investigate the mathematical structure of the state parameters in the 3 × 3 × SO(3) × W (3) space and the estimation of angular velocity states in the INS/GNSS Kalman filter and smoother.We also plan to experimentally asses the practical usefulness of the proposed methods for image deblurring and orientation of focal-plane shutter camera images.
The exposure time e of many, if not most, of these cameras is controlled by a focal-plane shutter located in front of the image sensor near the camera focal plane.Focal-plane shutters move a window aperture to uncover the image sensor in such a way that every pixel is exposed to light during a time interval of length e.Thus, if t is the image central exposure time, not all pixels or row of pixels (line) are exposed during the time interval [t − e/2, t + e/2].Instead, the actual exposure time interval T (l) of a line l is