AUTOMATIC TEXTURE MAPPING WITH AN OMNIDIRECTIONAL CAMERA MOUNTED ON A VEHICLE TOWARDS LARGE SCALE 3D CITY MODELS

Today, high-resolution panoramic images of competitive quality are widely used for rendering in several commercial systems. However, potential applications that need accurate orientation information, such as mapping, augmented reality and modelling, are still poorly studied. Urban models can be obtained quickly from aerial images or LiDAR, but with limited quality or efficiency owing to low-resolution textures and a manual texture mapping workflow. Building on a general camera model that covers various kinds of omnidirectional and other single-perspective cameras, we combine an Extended Kalman Filter (EKF) with the traditional Structure from Motion (SfM) method, without any prior information, so that image sequences can be oriented even when they contain unconnected or weakly connected frames. The orientation results are then used to map textures from the panoramas onto existing building models obtained from aerial photogrammetry. This substantially improves the quality of the models and the efficiency of the modelling procedure.


INTRODUCTION
The emergence of panoramas dates back to the panoramic paintings of Robert Barker, even before the invention of the camera. A short history of panoramas can be found in Thomas's work [1]. Over the past centuries, with the invention of smaller and more flexible cameras, various panoramic cameras based on line-scanning or stitching techniques have been built. Large numbers of panoramas distributed all over the world have been collected with panoramic sensors mounted on vehicles. Although panoramic images have been widely used for rendering in image-based immersive systems such as QuickTime VR [2], Google Street View (Vincent, 2007), and Bing Streetside and Streetslide [3], their applications have been limited by the complexity of the imaging model and the difficulty of accurate orientation, compared with ordinary digital images, which have played a significant role in scene reconstruction. Panorama-based reconstruction and modelling have also been studied in several works. On the one hand, panorama-based methods are inferior to conventional methods such as aerial photogrammetry and LiDAR for constructing large-scale 3D city models; on the other hand, high-resolution panoramas enable the extraction of high-quality textures for rendering, which is still unattainable with conventional methods. Panoramas can therefore be combined with aerial images and LiDAR to achieve better results.
The geometric models of various panoramic cameras have been widely studied. A single-perspective camera can simply be modelled as a pinhole camera with projective geometry (Hartley, 2004), whereas, because of the large distortion, the geometry of fisheye-lens cameras has to be treated separately (Brauer, 2001; Schwalbe, 2005; Ying, 2006). Baker (1998) formulated the imaging model of catadioptric cameras with different types of mirrors, and many studies (Kang, 2000; Micusik, 2003; Mei, 2004; Scaramuzza, 2009) have addressed the calibration and epipolar geometry of such omnidirectional images. The geometry of polycentric cameras (Tang, 2001; Huang, 2001; Shum, 2004) and of stitching-based multi-camera rigs (Bakstein, 2004; Szeliski, 2006) has also been studied. At the same time, numerous general camera models (Seitz, 2001; Yu, 2004; Sturm, 2005; Ponce, 2009) have been introduced in recent decades. Geyer (2001) proposed a unified theory for central panoramic systems, based on a projective mapping from the sphere to a plane with the projection centre on the perpendicular to the plane.
SfM techniques have been widely used in both the photogrammetry and computer vision communities, including early factorization-based methods, the commonly used bundle adjustment (BA) based methods and filter-based methods. BA-based and filter-based methods emphasise accuracy and speed, respectively. Before Snavely (2008) introduced skeletal-graph-based methods, most BA-based methods could only handle image sets of limited size. Key-frame-based methods (Klein, 2007; Klein, 2008) only consider a subset of the images, while local optimization techniques, which propagate information forwards and backwards within a window of frames through filters (Andrew, 2003; Hauke, 2010) or local bundle adjustment (Nister, 2005; Pollefeys, 2008), make full use of the continuity of sequential images. However, the local techniques usually suffer from drift of the camera trajectory, and both kinds of method lose accuracy, especially when double-backs or loops exist. In panorama sequences, revisiting existing features is very common, so a full bundle adjustment is necessary for accurate pose estimation. Nevertheless, because of the large distortion of panoramic images and occlusions caused by pedestrian bridges and similar structures, unconnected frames and numerous mismatches in some frames are unavoidable, and they defeat a pure bundle adjustment approach.
Accurately oriented panoramas can be used for reconstruction (Luhmann, 2004; Micusik, 2004; Hwang, 2006; Micusik, 2009). Debevec (1996) introduced the image-based modelling (IBM) technique, which models and textures buildings simultaneously through view-dependent texture mapping. Debevec's pioneering work, Façade, inspired the commercial development of IBM software such as Canoma, Google SketchUp, ImageModeler, PhotoModeler and VideoTrace. Foreseeing the potential of panoramas, Autodesk Inc. even extended its IBM software with panorama-based modules. However, usually only the street-side vertical facades of buildings can be captured, because the vehicles can only reach the streets. Fortunately, complete building models can be obtained automatically or semi-automatically from aerial images or LiDAR, so fusing panoramic images with other data sources tends to be a better choice. Panoramas can be fused with point clouds for rendering and species classification (Scheibe, 2004; Salemi, 2005; Haala, 2004; Schwalbe, 2005). Haala (2005) proposed a texture extraction and mapping method for panoramas, which, however, requires user interaction.
Since the most commonly used panoramic cameras are stitching-based cameras such as the Ladybug and Eyesis, which can be treated as single-perspective panoramic cameras, we adopt a bundle-based projection model for calibrated cameras. The model is simple and linear, which makes it algebraically compatible with the projective geometry of ordinary single-perspective cameras. We also adopt the Extended Kalman Filter (EKF), which is widely used in simultaneous localization and mapping (SLAM), in the SfM procedure, so that the reconstruction can "jump" out of the snare of unconnected frames through smoothing.

Imaging model
In general, any image with a large field of view in one or more directions can be regarded as a panoramic image. Various kinds of panoramic sensors were invented during the last century. By imaging system, most panoramic sensors can be classified into fisheye-lens cameras, catadioptric cameras, polycentric cameras, line-scanning cameras and stitching-based cameras. Fisheye-lens cameras are the simplest devices, but they have a limited field of view and low image quality. Catadioptric cameras suffer from the same problems. Polycentric cameras are cheap and easy to construct, but their projection geometry is complicated because of multiple perspective centres. Line-scanning cameras avoid the problems stated above, but their use is limited to static scenes because of their low imaging rate. Owing to their compact size, flexibility, high image quality and high imaging rate, stitching-based cameras have become the most commonly used panoramic sensors, and they have been mounted on unmanned vehicles and various mobile mapping systems.
For standard single-perspective cameras, all scene points X on the same line of sight project to a single image point x, which can be formulated with projective geometry as

$$\lambda\,\mathbf{x}_i = \mathbf{P}\,\mathbf{x}_w, \quad (\lambda \neq 0) \tag{1}$$

where $\lambda$ is a nonzero scalar, and $\mathbf{x}_w$ and $\mathbf{x}_i$ are the homogeneous coordinates of the 3D object point and the 2D image point. The 3×4 matrix P is the projection matrix, formed from the camera matrix and the camera pose. This representation projects the whole line through the perspective centre to a single point on the image plane, which causes no problem for ordinary digital images. For a panoramic image with an omnidirectional field of view, however, any line through the perspective centre has two images, corresponding to the two object points the line reaches on either side of the centre. The line of sight should therefore be regarded as two rays rather than one line:

$$\lambda\,\mathbf{x}_c = \mathbf{P}\,\mathbf{x}_w, \quad (\lambda > 0) \tag{2}$$

where $\mathbf{x}_c$ is the Cartesian ray in the camera coordinate system. For panoramic images, the relationship between image points and rays cannot be represented by a linear function, as the camera matrix does for a standard perspective camera. Instead, the imaging model can be regarded as a pose recovery step followed by an imaging step in Cartesian coordinates:

$$\lambda\,\mathbf{x}_c = \mathbf{R}(\mathbf{X} - \mathbf{c}), \quad \mathbf{x} = g(\mathbf{x}_c), \quad (\lambda > 0) \tag{3}$$

where g is the imaging function, and R and c are the rotation matrix and the perspective centre of the camera. X, $\mathbf{x}_c$ and x are all Cartesian coordinates; unless otherwise specified, the rest of the paper follows this notation.
With a calibrated panoramic sensor, the ray corresponding to any image point can be recovered with $g^{-1}$. The general bundle-based imaging model then reduces to the linear relation in (3). Since any ray in 3D space has only 2 degrees of freedom, we parameterize rays by longitude and latitude, so that the imaging function is

$$g(\mathbf{x}_c) = (\theta, \varphi)^{\mathsf T} \tag{4}$$

where $\theta$ and $\varphi$ are the longitude and latitude of the ray $\mathbf{x}_c$. The imaging model can then be written as

$$\mathbf{x}_i^j = f(\mathbf{C}_i, \mathbf{X}_j) = g\left(\mathbf{R}_i(\mathbf{X}_j - \mathbf{c}_i)\right) \tag{5}$$

where $\mathbf{C}_i$ denotes the parameters of camera i, $\mathbf{X}_j$ the object-space Cartesian coordinates of feature j, and $\mathbf{x}_i^j$ the image of feature j on the parameterized panoramic image i.
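A minimal NumPy sketch of the two-ray argument and of a longitude/latitude imaging function g with its inverse follows. The axis convention (z as the polar axis, longitude measured from +x in the x-y plane) is an assumption, since the text does not fix one; the function names are illustrative only.

```python
import numpy as np

# Assumed axis convention (not fixed by the text): z is the polar axis,
# longitude is measured in the x-y plane from +x, latitude from the x-y plane.

def g(xc):
    """Imaging function: Cartesian ray (any positive length) -> (longitude, latitude) in radians."""
    x, y, z = np.asarray(xc, dtype=float)
    theta = np.arctan2(y, x)               # longitude in [-pi, pi]
    phi = np.arctan2(z, np.hypot(x, y))    # latitude in [-pi/2, pi/2]
    return np.array([theta, phi])

def g_inv(x):
    """Inverse imaging function: (longitude, latitude) -> unit ray."""
    theta, phi = x
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def pinhole(v):
    """Perspective division of a standard camera: the sign of lambda is lost."""
    return v[:2] / v[2]

ray = np.array([1.0, 2.0, 0.5])
print(pinhole(ray), pinhole(-ray))   # the same image point for both directions
print(g(ray), g(-ray))               # longitudes differ by pi, latitudes change sign
```

Because `g` keeps the sign of the ray, a point and its mirror image through the perspective centre map to different panorama coordinates, which is exactly the information the pinhole division discards.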
The camera parameters comprise the perspective centre and the orientation of the camera, each with 3 degrees of freedom. For numerical stability, during bundle adjustment we parameterize the orientation with the vector part $\mathbf{p} = (q_1, q_2, q_3)^{\mathsf T}$ of a unit quaternion $\mathbf{q} = (q_0, q_1, q_2, q_3)^{\mathsf T}$ instead of the traditional Euler angles. Thus

$$\mathbf{C}_i = (c_x, c_y, c_z, q_1, q_2, q_3)^{\mathsf T} \tag{6}$$
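The following sketch assembles f(C_i, X_j) from the six camera parameters. It assumes that q0 is recovered as the non-negative root sqrt(1 - |p|^2), that the quaternion's rotation matrix plays the role of R directly (world to camera), and that z is the polar axis for longitude/latitude; all three are assumptions, and the function names are hypothetical.

```python
import numpy as np

def quat_from_vec(p):
    """q(p): unit quaternion (q0, q1, q2, q3) from its vector part, choosing q0 >= 0."""
    p = np.asarray(p, dtype=float)
    s = 1.0 - p @ p
    if s < 0:
        raise ValueError("the vector part of a unit quaternion has norm <= 1")
    return np.concatenate([[np.sqrt(s)], p])

def rot_from_quat(q):
    """Rotation matrix of the unit quaternion q = (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    return np.array([
        [1 - 2 * (q2 * q2 + q3 * q3), 2 * (q1 * q2 - q0 * q3), 2 * (q1 * q3 + q0 * q2)],
        [2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3), 2 * (q2 * q3 - q0 * q1)],
        [2 * (q1 * q3 - q0 * q2), 2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2)],
    ])

def project(C, X):
    """f(C_i, X_j) with C = (cx, cy, cz, q1, q2, q3); returns (longitude, latitude)."""
    C = np.asarray(C, dtype=float)
    R = rot_from_quat(quat_from_vec(C[3:]))
    xc = R @ (np.asarray(X, dtype=float) - C[:3])   # ray in the camera frame
    return np.array([np.arctan2(xc[1], xc[0]),
                     np.arctan2(xc[2], np.hypot(xc[0], xc[1]))])

# Camera at the origin rotated 90 degrees about z: q = (cos 45deg, 0, 0, sin 45deg).
C = [0, 0, 0, 0, 0, np.sin(np.pi / 4)]
print(project(C, [1, 0, 0]))   # approximately [pi/2, 0]
```

Restricting q0 to be non-negative loses no rotations, since q and -q describe the same rotation, but the parameterization becomes ill-conditioned near 180-degree rotations, where q0 approaches 0 and its derivative with respect to p grows without bound.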

Epipolar geometry
According to epipolar geometry, given two corresponding image points with homogeneous coordinates $\mathbf{x}_i$ and $\mathbf{x}'_i$ captured by two perspective cameras, we have

$$\mathbf{x}'^{\mathsf T}_i\, \mathbf{F}\, \mathbf{x}_i = 0 \tag{7}$$

where F is the rank-2 3×3 fundamental matrix with 7 degrees of freedom. Once the cameras are calibrated, the rays $\mathbf{x}_c$ and $\mathbf{x}'_c$ can be recovered from the calibrated camera matrices, and the 2 degrees of freedom related to the camera matrices are eliminated from the fundamental matrix:

$$\mathbf{x}'^{\mathsf T}_c\, \mathbf{E}\, \mathbf{x}_c = 0 \tag{8}$$

where E is the rank-2 3×3 essential matrix with 5 degrees of freedom. For corresponding rays in two panoramas, the essential matrix formulation applies directly, and E can be estimated with the linear 8-point algorithm or the more stable 5-point algorithm.
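To illustrate why the essential matrix carries over to panoramas unchanged, the sketch below builds E = [t]_x R for a hypothetical relative pose (X2 = R X1 + t, an assumed convention) and checks the epipolar constraint on unit rays pointing in all directions, including behind the camera.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential(R, t):
    """E = [t]_x R for the relative pose X2 = R @ X1 + t between two camera frames."""
    return skew(t) @ R

def unit(v):
    """Normalize each row to a unit ray."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Hypothetical relative pose: 0.3 rad about z and a mostly sideways baseline.
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, 0.0])
E = essential(R, t)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(10, 3)) * 5      # points in every direction, including behind the camera
x1, x2 = unit(X1), unit(X1 @ R.T + t)  # unit rays seen from camera 1 and camera 2
res = np.einsum("ij,jk,ik->i", x2, E, x1)   # x2^T E x1 for each correspondence
print(np.abs(res).max())                     # close to machine precision
print(np.linalg.svd(E, compute_uv=False))    # two equal singular values and one zero
```

Negating either ray leaves the constraint satisfied, so x'^T E x = 0 alone cannot distinguish a ray from its opposite; the positivity condition on the rays has to be checked separately, for example when decomposing E into R and t.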

SFM WITH EKF
Automatic feature detection and matching are the most important components in automating many computer vision applications, including SfM. Although many feature detectors and descriptors (Tuytelaars, 2007) have been invented, feature detection remains a challenging problem. Because of the large illumination differences in open scenes, descriptors that use photometric information directly can hardly achieve acceptable results. Moreover, apart from illumination variation, perspective distortion caused by viewpoint change, and occlusion, the anisotropic distortion caused by parameterizing the spherical panorama is the most troublesome problem for matching and tracking: in the equirectangular projection, the distortion varies sharply from the equator to the poles. The Scale Invariant Feature Transform (SIFT) detector (Lowe, 2004), which is invariant to scale and rotation and partially invariant to illumination and affine transformation, has been reported to outperform other detectors (Mikolajczyk, 2005). The KLT tracker (Tomasi, 1991) is one of the most commonly used methods for tracking features in image sequences. In our research, we compared the performance of the KLT tracker and SIFT matching on several panoramic image sequences: the KLT tracker could not provide sufficient matches because of street trees, moving crowds and vehicles, whereas SIFT matching consistently performed better. To remove bad correspondences, we adopt a hypothesize-and-test scheme based on epipolar geometry, using RANSAC and Nister's 5-point algorithm (Nister, 2005) on each image pair with sufficient coarse correspondences.
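The hypothesize-and-test step can be sketched as below. For brevity this sketch swaps in the linear 8-point algorithm as the minimal solver instead of Nister's 5-point algorithm used in the text, and it scores each correspondence by the sine of the angle between the second ray and its epipolar plane; the threshold, iteration count, synthetic data and function names are hypothetical choices.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear estimate of E from >= 8 unit-ray pairs, with rank 2 enforced."""
    A = np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)  # each row encodes x2^T E x1 = 0
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)                # right null vector of A
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt             # closest essential matrix

def epipolar_error(E, x1, x2):
    """Sine of the angle between each ray x2 and its epipolar plane (normal E @ x1)."""
    n = x1 @ E.T
    return np.abs(np.sum(x2 * n, axis=1)) / np.linalg.norm(n, axis=1)

def ransac_essential(x1, x2, thresh=1e-3, iters=500, seed=0):
    """Fit E to random 8-ray samples and keep the largest consensus set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(x1), 8, replace=False)
        inliers = epipolar_error(eight_point(x1[sample], x2[sample]), x1, x2) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best              # refit on all inliers

def unit(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(1)
a = 0.2   # hypothetical relative pose: 0.2 rad about y, baseline mostly along x
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.0, 0.1])
X1 = rng.normal(size=(100, 3)) * 10
x1, x2 = unit(X1), unit(X1 @ R.T + t)
x2[:30] = unit(rng.normal(size=(30, 3)))      # 30 simulated mismatches
E, inliers = ransac_essential(x1, x2)
print(inliers[:30].sum(), inliers[30:].sum())
```

At a given outlier ratio, an 8-ray sample is less likely to be outlier-free than a 5-ray sample, so the 8-point variant needs more iterations; this is one practical reason to prefer the 5-point solver.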
The SfM pipeline begins with the choice of an initial image pair. For SfM with unordered image collections, the initial pair needs to be chosen carefully, because its reconstruction may get stuck in a local minimum or yield only a limited number of accurately triangulated points. Because most objects in panoramic sequences are trees, roads, vehicles and buildings, we prefer an initial pair rich in building textures, which are much easier for feature detection and matching. Once the initial pair is chosen, Nister's 5-point algorithm followed by non-linear optimization is applied to estimate the relative pose of the image pair. The initial scene is then triangulated from the correspondences using this initial pose. After that, bundle adjustment, which minimizes the following function, is applied to the existing cameras and the triangulated points:

$$\min_{\{\mathbf{C}_i\},\,\{\mathbf{X}_j\}} \sum_{i}\sum_{j} d\!\left(\mathbf{x}_i^j,\, f(\mathbf{C}_i, \mathbf{X}_j)\right)^2 \tag{9}$$

where $d(\mathbf{x}_1, \mathbf{x}_2) = d(\theta_1, \varphi_1, \theta_2, \varphi_2)$ measures the difference between two image points $\mathbf{x}_1 = (\theta_1, \varphi_1)^{\mathsf T}$ and $\mathbf{x}_2 = (\theta_2, \varphi_2)^{\mathsf T}$ on the parameterized panorama, and the sum runs over all observed image points $\mathbf{x}_i^j$. Since automatic texture mapping with panorama sequences does not require real-time performance, we estimate the pose of every frame incrementally and run a full bundle adjustment after each coarse initialization with the direct linear transformation (DLT) method. However, loss of tracking caused by trees covering the street, or even by a single pedestrian bridge, complicates the situation. What if a frame has insufficient correspondences, or is not connected at all? Filter-based SfM, which needs fewer correspondences (Civera, 2009) and has long been used in SLAM, offers a solution.
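Since the text does not spell out the distance d, the sketch below assumes the simplest choice: the latitude difference plus the longitude difference wrapped to (-pi, pi], so that a point observed just across the ±pi seam of the panorama does not produce a residual of nearly 2*pi. The stacked residual vector is what a generic nonlinear least-squares solver (for example SciPy's `scipy.optimize.least_squares`) would minimize. The quaternion and axis conventions inside `project` are assumptions, and all names are hypothetical.

```python
import numpy as np

def wrap(a):
    """Wrap angle differences to (-pi, pi] so residuals do not jump by 2*pi at the seam."""
    return np.pi - np.mod(np.pi - a, 2 * np.pi)

def project(C, X):
    """f(C, X) for C = (cx, cy, cz, q1, q2, q3): rotate, then convert to (longitude, latitude)."""
    C = np.asarray(C, dtype=float)
    q1, q2, q3 = C[3:]
    q0 = np.sqrt(1.0 - (q1 * q1 + q2 * q2 + q3 * q3))   # assumed q0 >= 0
    R = np.array([
        [1 - 2 * (q2 * q2 + q3 * q3), 2 * (q1 * q2 - q0 * q3), 2 * (q1 * q3 + q0 * q2)],
        [2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3), 2 * (q2 * q3 - q0 * q1)],
        [2 * (q1 * q3 - q0 * q2), 2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2)],
    ])
    xc = R @ (np.asarray(X, dtype=float) - C[:3])
    return np.array([np.arctan2(xc[1], xc[0]), np.arctan2(xc[2], np.hypot(xc[0], xc[1]))])

def residuals(cams, points, observations):
    """Stack d(x_i^j, f(C_i, X_j)) for observations given as (i, j, theta, phi)."""
    r = []
    for i, j, theta, phi in observations:
        theta_p, phi_p = project(cams[i], points[j])
        r += [wrap(theta - theta_p), phi - phi_p]
    return np.array(r)

def ba_cost(cams, points, observations):
    """Sum of squared residuals: the quantity bundle adjustment minimizes."""
    r = residuals(cams, points, observations)
    return float(r @ r)

# One camera at the origin and one point just behind it, next to the +/-pi seam.
cams = [np.zeros(6)]
points = [np.array([-1.0, 0.001, 0.0])]
observations = [(0, 0, -np.pi + 0.001, 0.0)]   # observed just across the seam
print(residuals(cams, points, observations))   # small longitude residual, not about 2*pi
```

A more faithful angular distance would also scale the longitude residual by the cosine of the latitude, since meridians converge towards the poles of the equirectangular image.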
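We incorporate an EKF into our SfM process. Because bundle adjustment is run after every initialization, a simply constructed EKF suffices. To predict the camera pose, we employ the instantaneous constant-velocity assumption proposed by Davison (2007). The camera motion is described by a linear velocity $\mathbf{v} = (v_x, v_y, v_z)^{\mathsf T}$ and an angular velocity $\mathbf{w} = (w_x, w_y, w_z)^{\mathsf T}$, parameterized as the vector part of a unit quaternion. The state vector X of the filter is made up of the pose and motion parameters of the camera:

$$\mathbf{X} = \left(\mathbf{C}^{\mathsf T}, \mathbf{v}^{\mathsf T}, \mathbf{w}^{\mathsf T}\right)^{\mathsf T} \tag{10}$$

By simply choosing the camera parameters after bundle adjustment as the measurements, the observation Z of the filtering system is the state vector itself. The transition model and the observation model are

$$\mathbf{X}_i = f(\mathbf{X}_{i-1}) + \mathbf{d}_x, \qquad f(\mathbf{X}_{i-1}) = \begin{pmatrix} \mathbf{c}_{i-1} + \mathbf{v}_{i-1} \\ \operatorname{vec}\!\left(q(\mathbf{p}_{i-1}) \odot q(\mathbf{w}_{i-1})\right) \\ \mathbf{v}_{i-1} \\ \mathbf{w}_{i-1} \end{pmatrix} \tag{11}$$

$$\mathbf{Z}_i = h(\mathbf{X}_i) + \mathbf{d}_z = \mathbf{X}_i + \mathbf{d}_z \tag{12}$$

where q is the function mapping a vector part to a unit quaternion, vec(·) extracts the vector part of a quaternion, ⊙ is quaternion multiplication, and $\mathbf{d}_x$ and $\mathbf{d}_z$ are the process and observation noises, both assumed to be zero-mean multivariate Gaussian with covariances $\mathbf{Q}_i$ and $\mathbf{R}_i$ obtained from the bundle adjustment results. $\mathbf{F}_i$ and $\mathbf{H}_i$ are the Jacobians of f and h with respect to the updated state vector $\mathbf{X}_{i-1}$ and the predicted state vector $\mathbf{X}_{i,i-1}$, respectively.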
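The update is performed with the standard EKF equations:

$$\begin{aligned}
\mathbf{X}_{i,i-1} &= f(\mathbf{X}_{i-1}) \\
\boldsymbol{\Sigma}_{i,i-1} &= \mathbf{F}_i \boldsymbol{\Sigma}_{i-1} \mathbf{F}_i^{\mathsf T} + \mathbf{Q}_i \\
\mathbf{K}_i &= \boldsymbol{\Sigma}_{i,i-1} \mathbf{H}_i^{\mathsf T} \left(\mathbf{H}_i \boldsymbol{\Sigma}_{i,i-1} \mathbf{H}_i^{\mathsf T} + \mathbf{R}_i\right)^{-1} \\
\mathbf{X}_i &= \mathbf{X}_{i,i-1} + \mathbf{K}_i \left(\mathbf{Z}_i - h(\mathbf{X}_{i,i-1})\right) \\
\boldsymbol{\Sigma}_i &= \left(\mathbf{I} - \mathbf{K}_i \mathbf{H}_i\right) \boldsymbol{\Sigma}_{i,i-1}
\end{aligned} \tag{13}$$

where $\boldsymbol{\Sigma}$ denotes the state covariance and $\mathbf{K}_i$ the Kalman gain. The SfM module and the filter run largely in parallel and only exchange data: the SfM module provides the bundle adjustment results to the filter as measurements (see figure 1). The filter does not intervene in the SfM process unless an unconnected or weakly connected frame occurs. When this happens, the filter predicts an initial pose for the camera from the information accumulated over the sequence. Although coarse, this initial estimate is sufficient to initialize bundle adjustment.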
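A compact sketch of the filter is given below. It assumes per-frame velocities (no explicit time step), right multiplication q(p) ⊙ q(w) for the orientation update, the identity observation model h(X) = X described in the text, and forward-difference Jacobians instead of analytic ones; the motion, noise levels and function names are hypothetical. The last step shows how, for an unconnected frame, the prediction alone supplies the initial pose for bundle adjustment.

```python
import numpy as np

def quat(p):
    """q(p): unit quaternion (q0, q1, q2, q3) from its vector part, taking q0 >= 0."""
    p = np.asarray(p, dtype=float)
    return np.concatenate([[np.sqrt(max(0.0, 1.0 - p @ p))], p])

def qmul(a, b):
    """Quaternion (Hamilton) product a (.) b."""
    a0, av, b0, bv = a[0], a[1:], b[0], b[1:]
    return np.concatenate([[a0 * b0 - av @ bv], a0 * bv + b0 * av + np.cross(av, bv)])

def f(X):
    """Constant-velocity transition for X = (c, p, v, w); velocities are per frame."""
    c, p, v, w = X[:3], X[3:6], X[6:9], X[9:12]
    q = qmul(quat(p), quat(w))
    if q[0] < 0:                 # q and -q are the same rotation; keep q0 >= 0
        q = -q
    return np.concatenate([c + v, q[1:], v, w])

def jacobian(fun, X, eps=1e-6):
    """Forward-difference Jacobian, used here instead of analytic derivatives for brevity."""
    fx = fun(X)
    J = np.empty((fx.size, X.size))
    for k in range(X.size):
        dX = np.zeros_like(X)
        dX[k] = eps
        J[:, k] = (fun(X + dX) - fx) / eps
    return J

def predict(X, P, Q):
    """Prediction step: propagate the state and its covariance P."""
    F = jacobian(f, X)
    return f(X), F @ P @ F.T + Q

def update(Xp, Pp, Z, Rn):
    """EKF update with h(X) = X, so H = I: the measurement Z is the BA result."""
    H = np.eye(len(Xp))
    K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + Rn)
    return Xp + K @ (Z - Xp), (np.eye(len(Xp)) - K @ H) @ Pp

def truth(k):
    """Hypothetical motion: 1 unit along x and 0.1 rad about z per frame."""
    return np.concatenate([[k, 0.0, 0.0], [0.0, 0.0, np.sin(0.05 * k)],
                           [1.0, 0.0, 0.0], [0.0, 0.0, np.sin(0.05)]])

X, P = truth(0), np.eye(12) * 1e-2
Q, Rn = np.eye(12) * 1e-4, np.eye(12) * 1e-3
for k in range(1, 4):                 # connected frames: predict, then fuse the BA result
    Xp, Pp = predict(X, P, Q)
    X, P = update(Xp, Pp, truth(k), Rn)
X4, P4 = predict(X, P, Q)             # frame 4 is unconnected: prediction only
print(X4[:3], X4[3:6])                # initial pose handed to bundle adjustment
```

With a motion model that matches the true motion, the prediction for frame 4 lands on the true pose (centre (4, 0, 0), rotation of 0.4 rad about z); on real sequences it is only a coarse starting point that bundle adjustment then refines.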

AUTOMATIC TEXTURE MAPPING
3D building models used to be constructed manually; nowadays the process can be automated or semi-automated with airborne imagery, ground imagery, airborne LiDAR and ground LiDAR (Musialski, 2012). For large-scale 3D city modelling, the state-of-the-art automatic urban reconstruction method based on aerial images (Zebedin, 2008) is considered the best solution. Although buildings can be reconstructed and textured properly in this way, facade details are lost because of the limited viewpoints of airborne images.
Most texture mapping methods adopt the view-dependent approach introduced by Debevec (1996). We follow this approach; because panoramic images cover all directions, we only need to consider the viewpoint, without the orientation constraints of standard digital images. For simplicity, we choose the nearest panorama as the texture source for each building facade. To obtain the image patches for texture mapping, we project the points of the facade onto the equirectangular panoramic image according to the imaging model, and resample the heavily distorted image patch (see figure 2) onto a plane parallel to the facade. We tested our method with a Point Grey Ladybug2 camera mounted on a vehicle. The panoramic image sequences were captured along Weijing Street in Tianjin, where the buildings along the street had been reconstructed accurately through aerial photogrammetry (see figure 3). The poses of the panoramic image sequences were estimated accurately with the combined method proposed above, followed by a registration between the triangulated points and the 3D digital models. Finally, we applied the automatic texture mapping algorithm to the data sets (see figure 4 and figure 5).

CONCLUSION
The reconstruction of 3D building models from aerial images or point clouds can be automated or semi-automated with sufficient accuracy. For immersive virtual reality applications, however, the models should also have rich textures. Nowadays, high-resolution panoramic image sequences with rich texture information can easily be obtained with state-of-the-art multi-camera rigs. In this paper, we aimed at automatic texture mapping of existing 3D models with panoramic image sequences, and we addressed its key problem, the accurate pose estimation of the image sequences.
Because of moving objects and occlusions in the image sequence, even consecutive frames may be unconnected or weakly connected, without sufficient correspondences. We combined an EKF with the traditional SfM procedure. Through the intervention of the filter, which provides camera pose predictions, unconnected frames are "skipped" in the initialization process of SfM. A drawback of the method is that it cannot handle sequences with long segments of consecutive unconnected frames. Finally, we applied the accurately oriented panoramas to automatic texture mapping with the view-dependent method.

Figure 1. Pipeline of the SfM algorithm with an EKF