PHOTOGRAMMETRIC RECONSTRUCTION WITH BAYESIAN INFORMATION

Nowadays photogrammetry and laser scanning methods are the most wide spread surveying techniques. Laser scanning methods usually allow to obtain more accurate results with respect to photogrammetry, but their use have some issues, e.g. related to the high cost of the instrumentation and the typical need of high qualified personnel to acquire experimental data on the field. Differently, photogrammetric reconstruction can be achieved by means of low cost devices and by persons without specific training. Furthermore, the recent diffusion of smart devices (e.g. smartphones) embedded with imaging and positioning sensors (i.e. standard camera, GNSS receiver, inertial measurement unit) is opening the possibility of integrating more information in the photogrammetric reconstruction procedure, in order to increase its computational efficiency, its robustness and accuracy. In accordance with the above observations, this paper examines and validates new possibilities for the integration of information provided by the inertial measurement unit (IMU) into the photogrammetric reconstruction procedure, and, to be more specific, into the procedure for solving the feature matching and the bundle adjustment problems.


INTRODUCTION
Most of the Mobile Mapping Systems (MMSs) developed in the last twenty years consider the use of remote sensing/imaging devices mounted on quite expensive terrestrial or airborne vehicles.
However, thanks to the diffusion of a large quantity of devices embedded with several small sensors (typically based on MEMS), the development of small and relatively cheap mobile mapping systems have been taken in consideration in recent years.For instance, smartphones have good potential in this sense thanks to the concentration of several sensors in easily portable and widely diffused devices.However, data acquisition by means of smartphones is limited by the restricted mobility of the person carrying the device (movements are typically approximately constrained to a 2D plane/surface).Differently, drones/unmanned aerial vehicles have the advantage of allowing to acquire measurements in a much larger area in the same time lapse, while moving on (almost) all the free space 3D positions, if needed.
The availability of different types of surveying devices allow the development of a wider range of mobile mapping solutions (Chiang et al., 2003, Toth and Grejner-Brzezinska, 1997, Toth, 2001, Pirotti et al., 2014, Kraus and Pfeifer, 1998, Remondino et al., 2011), and, consequently, the availability of more spatial data that can enable the development of even more applications, e.g.Geographic information systems (GIS), location based services, object recognition (El-Sheimy and Schwarz, 1998, Guarnieri et al., 2015, Piragnolo et al., 2015, Pfeifer et al., 2004, Masiero et al., 2015b, Facco et al., 2009, Facco et al., 2010).Thanks to the small size, the cheap cost and the easy usability of cameras, independently of the specific considered device in most of the cases a standard camera is used as imaging sensor, and photogrammetry is used in order to obtain reconstructions of the observed scenes (Prosdocimi et al., 2015).
Photogrammetry is one of the most used techniques for 3D reconstruction, monitoring and surveying.It is widely used in several applications and in different working conditions.The accuracy of photogrammetry reconstruction methods changes depending on the working conditions (e.g. the number of acquired images, lighting conditions, baselines between images), and it is strictly related to the success of the solution of the Structure from Motion (SfM) problem (Hartley andZisserman, 2003, Ma et al., 2003).
Despite its diffused use and the ever growing improvements to the reconstruction technique, photogrammetry still does not reach the same level of reliability of laser scanning surveying techniques (which can be considered the current state of the art for 3D reconstructions, (Remondino et al., 2005)) and the same level of applications: more specifically, significant issues may occur in photogrammetric reconstructions when in presence of lighting problems or when the object of interest is not sufficiently textured.However, photogrammetry has several advantages with respect to laser scanning techniques: in particular, it relies on the use of much cheaper devices, surveying is usually faster and it can be performed by less trained personnel with respect to terrestrial laser scanning.
The recent development of positioning systems based on the integration of information provided by different sensors allows to reduce the issues related to the use of GNSS as positioning approach (Lukianto and Sternberg, 2011, Masiero et al., 2014a, Saeedi et al., 2014, Toth et al., 2015): thanks to the use of inertial navigation systems, radio signals (WiFi and Bluetooth communications, UWB sensors) and (if available) a priori geometric constraints on the environment, it is possible to obtain reliable and quite accurate estimates of the position and attitude of the mobile mapping device, where the level of estimation accuracy depends on the specific considered system and the working conditions (e.g. the environment characteristics).This paper considers the use of position and orientation information (provided by the above mentioned navigation systems) in order to improve the computational efficiency and the robustness of the photogrammetric reconstruction algorithm.Three changes with respect to the standard photogrammetric reconstruction algorithm are considered: • First, exploiting the device position and attitude information, an approximate essential matrix is computed.As shown in section 3., this allows to reduce the number of incorrect feature matches by imposing (approximate) epipolar constraints.
• Then, more accurate epipolar constraints are obtained by using the RANdom SAmple Consensus (RANSAC (Fischler and Bolles, 1981)) algorithm: the efficiency and robustness of the latter are improved thanks to the use of a two-step learning procedure, where a reduced complexity RANSAC is executed first, while (similarly to the locally optimized RANSAC (Chum et al., 2003)) the full RANSAC runs only on the inliers of the estimated reduced complexity model (4.).Differently from (Chum et al., 2003) the reduced complexity step is enabled by the use of information provided by the navigation system.
• Finally, in section 5. a modified bundle adjustment optimization problem is considered in order to improve the reconstruction results in the panoramic-like image acquisition procedure recently considered in (Piermattei et al., 2015).
It is worth to notice that in all the paper the imaging camera is assumed to be calibrated (i.e. the values of the interior calibration parameters are assumed to be known) (Heikkila and Silven, 1997, Habib and Morgan, 2003, Remondino and Fraser, 2006, Karel and Pfeifer, 2009, Balletti et al., 2014).

REVIEW OF THE 3D RECONSTRUCTION PROCEDURE
Photogrammetry aims at reconstructing the 3D structure of a scene by means of a certain number of photos (i.e.used as remote sensing measurements) taken by different points of view.Each of such photos can be approximately considered a prospective projection (on the corresponding image plane) of what visible from that point of view.Then, the rationale of photogrammetric reconstruction is as follows: given the positions of the projections (on at least two image planes) of a 3D point q, then an estimate q of the 3D coordinates of q can be obtained by means of triangulation (Masiero and Cenedese, 2012).When the camera positions during image acquisitions are not known, then both camera positions and 3D points of the scene have to be reconstructed from the acquired images.
A more detailed description of the reconstruction procedure can be as follows: • Feature extraction.A set of feature points is extracted in each image and their corresponding descriptors are computed (Lowe, 1999, Bay et al., 2008, Leutenegger et al., 2011, Alahi et al., 2012).Such points can be considered as the most characteristic of the images.
• Feature matching based on the appearance.Features are initially matched taking into account of their appearance similarity, based on the use of the previously computed descriptor.k-d tree search algorithms are typically used in order to make this matching step fast and quite reliable (e.g. the best bin first method (Beis and Lowe, 1997)).
• Feature matching based on the estimated geometry.Depending on the specific considered case, a certain number of the matches computed at the previous step can be wrong (due to several causes, e.g. the influence of sensor noise, noise due to the atmospheric turbulence (Beghi et al., 2008), the poor texture in the images (Piermattei et al., 2015, Masiero andChiuso, 2006), repetitive patterns (Morel andYu, 2009, Morel andYu, 2011)).In this step wrongly matched features are detected by taking into account of the geometry of the scene (Hartley and Zisserman, 2003).Indeed, the projections mj 1 and mj 2 (expressed in homogeneous coordinates) of a 3D point into two different image planes (j1 and j2) have to obey to the epipolar constraint, that is usually expressed by means of the fundamental matrix Fj 1 j 2 : Since also the fundamental matrix is usually estimated from the feature matches (by means of the RANSAC (Fischler and Bolles, 1981)) and of the eight-point algorithms (Hartley, 1997, Longuet-Higgins, 1981)), then the epipolar constraint is usually just an approximation of the real relation between the two projections.
• Structure from Motion.Once a set of feature matches have been computed, such matching points are used in order to estimate the 3D positions of the points (and the positions and orientations of the cameras, if necessary).This is the SfM problem, and a solution is usually computed by solving the bundle adjustment (Triggs et al., 1999) by means of an iterative optimization algorithm (Agarwal et al., 2010, Byröd andAström, 2010).Alternatively, a solution can be obtained by means of the Tomasi and Kanades factorization (Tomasi and Kanade, 1992, Brand, 2002, Masiero et al., 2014b).
• Dense matching.Finally, starting from the set of 3D points computed at the previous step, a dense point cloud is obtained, typically by iterative (local) pixel matching taking into account of the surface regularity (Furukawa andPonce, 2010, Furukawa et al., 2010).
From the above summary, it is clear that the core of the reconstruction procedure is given by the correct matching of corresponding points in different images.As depicted above, this is usually done by using only the information provided by the camera.However, the diffusion of smart devices (embedded with several sensors) makes realistic of use the information provided by other sensors (e.g. the inertial navigation system (Masiero et al., 2014c, Masiero et al., 2015a)) in order to improve the overall procedure, in terms of both speed and reliability, if possible.Accordingly with this observation, this paper investigates in the next sections the integration of the information provided by the navigation system in certain of the steps of the feature matching procedure and in the bundle adjustment problem.
Nowadays, several feature extractors and descriptors have been considered in the literature.Among this large variety of choices, in this paper the scale-invariant feature transform (SIFT, (Lowe, 1999)), which can be considered as the state of the art in the last decade, has been used.Nevertheless, the proposed approach can be easily adapted in order to work with all the other possible alternatives.

APPROXIMATE EPIPOLAR CONSTRAINT
Despite the epipolar constraint (1) is expressed by using the fundamental matrix Fj 1 j 2 , if the values of the camera interior parameters are known, then a similar constraint can be expressed by means of the essential matrix: where Ej 1 j 2 stands for the essential matrix corresponding to the j1-th and j2-th views, and m j 1 , m j 2 are the normalized homogeneous coordinates of the considered feature point in the two image planes.Even if the the camera has not been calibrated a rough estimate of the interior camera parameters can typically be obtained from the operative system of the device (Fusiello and Irsara, 2011).When lens distortion is not negligible, then the normalized coordinates are assumed to have been computed in order to compensate distortion as well (Heikkila andSilven, 1997, Claus andFitzgibbon, 2005).
For simplicity of exposition, let the coordinate system be positioned in the optical center of the camera during the acquisition from the j2-th point of view, and let it have the same orientation of the camera during such acquisition.Furthermore, let tj 1 j 2 and Rj 1 j 2 be translation and rotation from the coordinate system of j2 to that of j1, respectively, Then, the essential matrix Ej 1 j 2 can be expressed as follows: where [tj 1 j 2 ]× is the skew-symmetric matrix corresponding to the cross-product operator of tj 1 j 2 i.e. [tj 1 j 2 ]×x = tj 1 j 2 × x, for all the possible values of the vector x.
In this paper the device considered for image acquisition is assumed to be provided of a navigation system running during the survey.In particular, the device is assumed to be provided of embedded sensors in order to be able to estimate both the position and orientation changes (nowadays smartphones are typically provided of GNSS receiver, and an inertial measurement unit (IMU), e.g.3-axis accelerometer, gyroscope and magnetometer).Hence, in this paper estimates ( tj 1 j 2 , Rj 1 j 2 ) of the translation vector and of the rotation matrix between two camera views (j1, j2) are assumed to be provided by the navigation system.It is worth to notice that, being the IMU measurements independent of any other external device the estimates tj 1 j 2 and Rj 1 j 2 are available also in challenging environments for the GNSS positioning method.
Exploiting the information provided by the navigation system an estimate Êj 1 j 2 of the essential matrix can be obtained by simply substituting the values of the estimates in (3).Then, (2) (with the estimated value of the essential matrix Êj 1 j 2 ) can be used in order to discard wrong matches.Since Êj 1 j 2 has been obtained by noisy measurements of the translation vector and of the rotation matrix, a not so stringent threshold has to be used in on order to discriminate wrong matches, i.e.only a subset of the wrong matches can be detected in this way.

ESTIMATION OF THE ESSENTIAL MATRIX
Equation (2), with the essential matrix estimated from the matched features, can be used in order to obtain a more robust wrong feature matching detector.In this case the estimation of the essential matrix can be done by using the RANSAC and (for instance) the eight point algorithm (Hartley, 1997).
More specifically, such estimation procedure can be summarized as follows: let n be the number of couples of feature matching candidates.Then the algorithm repeats for N iterations the following procedure: • randomly sample 8 couples of feature matching candidates among the n possible ones.
• compute an essential matrix estimate from the 8 sampled couples by using the eight point algorithm.
• compute the fitting error (usually expressed taken into account of the inliers of the current model) of the estimated essential matrix on the n possible couples.
• if the fitting error of the current estimate of the essential matrix is better than the previous best fitting error, than the current estimate is set as the best estimate of the essential matrix and the current set of inliers is stored.
After N iterations a more robust estimate of the essential matrix is usually obtained by recomputing it on all the set of inliers of the best obtained estimate.
The number of iterations N is usually determined taking into account of the probability β of a couple of matching features to be wrong (Chum et al., 2003).To be more specific, N is computed by assuming that whenever a set of 8 correct couples is drawn, then a good estimate of the essential matrix is obtained (e.g. the inliers of such estimate are (most of all) the correct couples of matching features).Unfortunately, this is typically an optimistic assumption, and hence the RANSAC algorithm has to be iterated for a larger number of iterations with respect to the expected one in order to obtain a reliable estimation.
The rationale of this section is that of exploiting the estimate of the change of orientation provided by the navigation system i order to make the estimation of the essential matrix faster and possibly more robust.The RANSAC-like procedure considered here is a two-step algorithm.
• First, the orientation information provided by the navigation system is used in order to compute Rj 1 j 2 , and the RANSAC algorithm is used to estimate only tj 1 j 2 .Since tj 1 j 2 can be linearly estimated by using only 3 couples of candidate matching features, this part of the procedure is much faster with respect to that previously considering the eight point algorithm.
• Similarly to the locally optimized RANSAC (Chum et al., 2003), only when a new best model has been obtained, then a RANSAC with the eight point algorithm is used to estimate the complete essential matrix by using only the inliers of the estimate of the previous step.Since this second step is performed only a small number of times, and the cardinality of the inliers used in this step is smaller with respect to n, then the overall computational complexity of the algorithm is usually much smaller than that of the original RANSAC.

BUNDLE ADJUSTMENT REVISED
A panoramic-like image acquisition procedure has been recently considered in (Piermattei et al., 2015): in such procedure several images are acquired in each of the considered acquisition positions by simply changing the orientation of the camera.To be more specific, in each considered position a sequence of images is typically acquired in to have a panoramic-like view of the scene, as shown for instance in Fig. 1.
On the one hand, as shown in (Piermattei et al., 2015) the panoramiclike image acquisition has the advantage of ensuring a better coverage of the scene.However, on the other hand, since several images acquired from approximately the same position, this kind of acquisition procedure provide some information that can be exploited in the solution of the SfM problem.
The standard bundle adjustment optimization problem used to estimate the parameters of interest is: where qi is the estimated 3D position of the i-th feature point, ( tj, Rj) are the estimated position and orientation matrix of the camera during the j-th image acquisition, mij are the coordinates of the i-th feature on the image plane during the j-th image acquisition, and mij are the corresponding reprojected coordinates accordingly with the current values of the estimation parameters {( tj, Rj), ∀j, qi, i = 1, . . ., n}.In order to simplify the notation, conventionally the sum in ( 4) is assumed to be limited to all (and only) the values of i and j for which a corresponding measurement mij is available.
In the panoramic-like image acquisition it is possible to group the image indexes accordingly to their acquisition positions: if the j-th and j -th images have been acquired in (approximately) the same position then tj ≈ t j , hence tj = t j + ∆tj, where ∆tj is defined as the difference between tj and t j .Let σ∆t be the uncertainty on the value of ∆tj (i.e. on the (small) translation between tj and t j ), and let σpix be the feature measurement error, then, accordingly to the prior provided by the navigation system, (4) can be reformulated as follows: where g(j) is the index of the first image acquired from (approximately) the position tj, e.g.g(j) = j if the j-th image is the first (or the only) image acquired from tj.
As shown in section 6., the prior information about camera positions introduced in the revised bundle adjustment optimization problem (5) allows to reduce the estimation errors on the reconstruction parameters.

RESULTS AND DISCUSSION
The results shown in this section have been obtained by acquiring measurements (images and data provided by the IMU) with an LG Google Nexus 5 (Fig. 2), which is provided with 3-axis accelerometer, 3-axis gyroscope, 3-axis magnetometer, barometer, GNSS and WiFi receiver, and 8 Mpixel camera.Hence, the navigation system, similar to that in (Masiero et al., 2014a), has been implemented in the Android environment (Android 4.4 KitKat).Nevertheless, similar results can be obtained by considering different devices and different operative systems.Furthermore, similar results are also expected if using different navigation systems (e.g.other pedestrian navigation systems (Ruiz et al., 2012, Widyawan et al., 2012, Foxlin, 2005)).
As previously assumed the camera embedded in the device has been calibrated by using the OpenCV camera calibration toolbox (Intel Research, 2000), and hence feature point coordinates have been expressed as normalized homogeneous coordinates.
First, the approximate epipolar constraint approach, presented in section 3., for detecting wrong feature matches is investigate in a case study (Villa XXV Aprile, Mirano, Italy).As shown in Fig. 3(a) and (b), 5 wrong matches have been correctly detected.
It is worth to notice that this method can hardly detect all the wrong feature matches: this is due to the use of a rough approximate of the essential matrix Êj 1 j 2 , that in this approach has been directly computed by noisy measurements of the translation vector and of the rotation matrix.
Since the approximate essential matrix Êj 1 j 2 computed as described in section 3. is usually a quite rough approximation of the real one, then a more accurate estimation should be used for obtaining a more reliable description of the epipolar geometry of the scene, and, consequently, a more trustworthy wrong feature matching detection.To this purpose, the procedure described in section 4. has been applied to the same case study previously considered.
(a) (b) Figure 3. Wrong feature matches detected by using an approximate epipolar constraint.
Fig. 4 compares the computational complexity of the standard RANSAC with the linear (eight point) algorithm for the estimation of the essential matrix with that of the approach proposed in section 4..It is worth to notice that computational times depend on the specific implementation of the algorithms, on the considered programming language and on the machine used to execute them.The results reported in the figure have been obtained by running the algorithms in Matlab R , on an Intel R Core TM i7-4790 (3.60 GHz).
The values of β in Fig. 4 have been chosen in order to allow a correct estimation of the essential matrix (smaller values of β allows to significantly reduce the computational burden, however the value of the estimated essential matrix is frequently far from the correct one).
Finally, the bundle adjustment formulation for panoramic-like image acquisitions is tested in a synthetic example in order to validate just the bundle adjustment procedure, without having to deal with other external factors.In the simulation considered here 100 feature points randomly distributed are assumed to be visible in the observed scene.Image measurements are assumed to be taken from 3 different positions randomly drawn at a mean distance of 15 m, approximately, from the feature points.Then, to simulate the panoramic-like image acquisition method, in each position 3 images have been acquired changing the camera orientation.In order to make the simulation more statistically robust, 100 Monte Carlo simulations have been considered.Fig. 5 shows the comparison of the root mean square error (RMSE) of the standard bundle adjustment (blue dashed line) with that proposed in  In our future works the modified bundle adjustment method proposed in section 5. will be stressfully validated on a large variety of cases, including experimental data of natural environments and human buildings/infrastructures.
Furthermore, the proposed two-step RANSAC procedure will be compared with other recent Bayesian developments of the random sample consensus algorithm (Kang, 2015).

CONCLUSIONS
This paper proposed certain changes with respect to the standard photogrammetric reconstruction procedure in order to take properly advantage of the information provided by the navigation system, that can be included in most of the currently used smart mobile devices.
Three changes have been considered: the use of an approximate epipolar constraint (completely derived by means of the navigation system information), a computationally efficient method to estimate the essential matrix and a slight change of the bundle adjustment optimization function in order to exploit a priori information on camera positions when dealing with panoramic-like image acquisitions.
The proposed methods have been validated on experimental data and on a synthetic case (in the modified bundle adjustment case) showing a significant performance improvement with respect to the classical methods in terms of either error reduction or computational time.
Finally, it is worth to notice that the use of the information provided by the navigation system as a prior for the estimation procedure forces the solution to be relatively close to the prior, ensuring an increase of robustness in the estimation.

Figure 1 .
Figure 1.Panoramic-like image acquisition of the fac ¸ade of Villa XXV Aprile, Mirano, Italy.

Figure 4 .
Figure 4. Comparison of the computational burden required by the standard RANSAC for the linear algorithm for the estimation of the essential matrix (blue dashed line) with that of the method proposed in section 4. (red solid line).

Figure 5 .
Figure 5.Comparison of the RMSE of the standard bundle adjustment (blue dashed line) with that proposed in section 5. (red solid line).The reported RMS errors have been computed in 100 independent simulations.