AUTOMATED AND ACCURATE ORIENTATION OF COMPLEX IMAGE SEQUENCES

The paper illustrates an automated methodology capable of finding tie points in different categories of images for a subsequent orientation and camera pose estimation procedure. The algorithmic implementation is encapsulated in a software package called ATiPE. The entire procedure combines several algorithms from both Computer Vision (CV) and Photogrammetry in order to obtain accurate results in an automated way. Although numerous efficient solutions exist for images taken with the traditional aerial block geometry, the complexity and diversity of image network geometry in close-range applications make the automatic identification of tie points a very complicated task. The reported examples were made available for the 3D-ARCH 2011 conference and include images featuring different characteristics in terms of resolution, network geometry, calibration information and external constraints (ground control points, known distances). Some further examples are also shown that demonstrate the capability of the orientation procedure to cope with a large variety of block configurations.


INTRODUCTION
The capability of obtaining accurate measurements from images is the primary goal of photogrammetry. Despite the dramatic changes and improvements in sensor and computer technology, the underlying mathematical models for block orientation, which are based on perspective projection, remain essentially unchanged (Fraser, 2005). Many 3D modeling procedures have been developed in the field of close-range photogrammetry. Here the possible presence of convergent images, large geometric and radiometric changes between the images and the unavailability of sensors for direct orientation make automation a more complex issue than in aerial photogrammetry. Indeed, reducing human work while preserving the final accuracy, completeness and level of detail is a fundamental requirement to extend the use of photogrammetry to a larger number of users. The standard image-based 3D modeling process consists of four main stages: (1) camera calibration; (2) image orientation; (3) 3D point cloud extraction and (4) surface reconstruction and texturing.
Step (1) is today a well-established task, which can be performed automatically to adapt a large variety of cameras for photogrammetric applications (Remondino and Fraser, 2006; Barazzetti et al., 2011). On the other hand, an automated and reliable procedure is still lacking for the image orientation step (2) where, until now, solutions able to automatically compute the orientation of markerless sets of images have been limited to the scientific community (Roncella et al., 2005; Läbe and Förstner, 2006; Remondino and Ressl, 2006; Barazzetti et al., 2010a). The complexity and diversity of image block geometry in close-range applications make the identification of tie points more complex than in aerial photogrammetry. Thus markerless image orientation is still an open research topic in the photogrammetric community, where both precision and reliability of the final solution play an essential role. Until the end of 2010, all close-range software packages implementing automatic procedures for inner/outer orientation and 3D reconstruction were based on targets (Ganci and Handley, 1998; Cronk et al., 2006; Jazayeri et al., 2010). This approach is very useful for laboratory and industrial applications, but in many practical outdoor situations targets cannot be applied to the object. The release of PhotoModeler 2010 (Eos Systems Inc., Canada) has opened the era of commercial solutions capable of orienting terrestrial pinhole images. In the Computer Vision (CV) community, most approaches try to solve for interior and exterior orientation parameters at the same time, leading to the well-known Structure from Motion (SfM) methods. However, these procedures (Nister, 2004; Vergauwen and Van Gool, 2006) generally offer an accuracy that is insufficient for photogrammetric surveys. Recently the SfM concept has made great improvements, with the capability to automatically orient huge numbers of images, although the achievable 3D reconstructions are useful mainly for visualization, object-based navigation, annotation transfer or image browsing purposes (Snavely et al., 2008; Agarwal et al., 2009; Farenzena et al., 2009; Strecha et al., 2010).
The point cloud extraction step (3) can be addressed today by several approaches and procedures, manual, semi-automated or fully automated according to the scene and project requirements. Some of the automated procedures have already been implemented in commercial solutions (e.g. CLORAMA, PhotoModeler Scanner, etc.). These procedures can be classified in two main groups. If the final result of the photogrammetric survey is a vector model, manual (interactive) measurements are still the best approach to obtain accurate and sharp edges. This is the typical case when the object to be modeled can be completely defined by geometric primitives. The second case is related to free-form objects, where the automatic reconstruction of the surface can be performed using dense matching techniques, which deliver a dense point cloud similar to the results of range-based sensors. This step can nowadays be automatically performed with satisfactory and accurate results (Hirschmueller, 2008; Remondino et al., 2008; Vu et al., 2009; Furukawa and Ponce, 2010).
The last stage (4) concerns the creation of structured 3D data from the unstructured dense or sparse point cloud obtained at stage (3) for texturing, visualization or other possible applications. The description of this task is beyond the scope of this paper, and the reader is referred to the specific literature (see e.g. Remondino, 2003; Guidi et al., 2010). A complete pipeline for accurate and automated image-based 3D modeling is shown in Barazzetti et al. (2010b).
The goal of this paper is to review ATiPE (Automatic Tie Point Extraction - Barazzetti et al., 2010a; Barazzetti, 2011), a procedure developed for the automatic orientation of close-range image blocks. After a short description of the implemented methodology, a set of results achieved using several complex image sequences is illustrated. Some data used in this paper are available for a special session on "automated image orientation" at the ISPRS 3D-ARCH 2011 workshop in Trento, Italy. They include low and high resolution images, calibrated and uncalibrated cameras, ordered sequences and sparse blocks, and datasets with external constraints like ground control points (GCPs) and known distances for accuracy analyses.

Overview
The implementation (ATiPE) consists of a methodology capable of determining a set of correspondences between different categories of data (Barazzetti et al., 2010a; Barazzetti, 2011). The general formulation of the problem makes the whole processing possible with a high degree of automation. It is worth noting that different data acquired with diverse sensors contain corresponding 2D features linked by a relationship that can often be mathematically defined as a linear transformation. This concept can be applied for outlier rejection in datasets that contain a certain number of incorrect correspondences derived with feature-based matching (FBM) algorithms. Therefore, due to this common formalization of the matching problem, the proposed solution can be adapted to work with many different categories of images. In particular, ATiPE has been applied for the alignment and registration of:
− pinhole images (Barazzetti et al., 2010a) acquired with standard CCD/CMOS terrestrial cameras (see Section 2.2) to form large and complex image sequences or blocks;
− spherical images (Barazzetti et al., 2010c), which are matched and oriented with a strategy based on a spherical unwarping. The partitioning of the sphere into zones, which are independently matched and then combined, transforms the data into local pinhole images. Then the estimation of the camera poses can be carried out through a photogrammetric bundle adjustment in spherical coordinates (Fangi, 2006);
− range data: laser scanning point clouds can be registered using the images produced from 3D points and their intensity values (Alba et al., 2010). Although this procedure aligns a set of scans without any initial manual approximation, some limits are present in the case of highly convergent scans. As things stand now, it is difficult to foresee a massive use of such a method in complex practical projects. Further developments are necessary to improve the repeatability of FBM operators;
− multispectral images (Remondino et al., 2011) acquired from the same viewpoint with a dedicated digital camera mounting different interference filters. Such images can be considered connected by a projective transformation (homography) and can therefore be automatically aligned by extracting homologous points. Indeed the different filters produce some misalignment of the images, which need to be perfectly overlapped for further diagnostic analyses and restoration works.
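The idea of rejecting mismatches through a transformation that links corresponding 2D features can be illustrated with a minimal sketch. It assumes the simplest case mentioned above, a homography written as a 3×3 matrix acting on homogeneous coordinates, and flags a match as an outlier when its transfer error exceeds a tolerance. The function names, the 2-pixel tolerance and the sample data are illustrative assumptions and are not taken from the ATiPE implementation.

```python
def apply_homography(H, point):
    """Map a 2D point with a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def flag_outliers(H, matches, tol_px=2.0):
    """Return one boolean per match: True when the transfer error exceeds tol_px."""
    flags = []
    for p, q in matches:
        u, v = apply_homography(H, p)
        err = ((u - q[0]) ** 2 + (v - q[1]) ** 2) ** 0.5
        flags.append(err > tol_px)
    return flags


# Hypothetical transformation: a pure shift of (+10, -5) px between two images.
H_shift = [[1.0, 0.0, 10.0],
           [0.0, 1.0, -5.0],
           [0.0, 0.0, 1.0]]

matches = [((100.0, 200.0), (110.0, 195.0)),   # consistent with H_shift
           ((50.0, 60.0), (60.5, 55.2)),       # consistent within the tolerance
           ((300.0, 40.0), (120.0, 400.0))]    # gross mismatch

outlier_flags = flag_outliers(H_shift, matches)
```

In practice the transformation is not known beforehand and has to be estimated robustly from the contaminated matches themselves; the sketch only shows the consistency check applied once a candidate transformation is available.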

Automatic orientation of pinhole images
ATiPE allows the extraction of image correspondences starting from a block of images and, optionally, camera calibration parameters. According to the structure of the block, different strategies can be applied.
An unorganized block of n images is made up of (n² - n)/2 combinations of stereo-pairs, which are initially analyzed independently for the pairwise identification of the image correspondences and then progressively combined and concatenated. The procedure for automated tie point extraction works with image pairs. For each pair, homologous points are sought using the SIFT (Lowe, 2004) or SURF (Bay et al., 2008) feature operators, using a quadratic or a kd-tree search for the comparison of the descriptors. Outliers are then rejected with a robust estimation of the relative orientation based on the fundamental matrix (Hartley and Zisserman, 2004), computed from 7 correspondences. If the calibration parameters are known, the essential matrix (Longuet-Higgins, 1981) is used. These operations are repeated for all image pair combinations in order to complete the pairwise matching phase.
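The pairwise matching phase can be sketched in a few lines. The example below enumerates the stereo-pairs of an unorganized block and compares descriptors with a quadratic (brute-force) search followed by the ratio test on the two nearest neighbours. It is only an illustration under stated assumptions: descriptors are plain tuples of numbers rather than real SIFT or SURF vectors, the function names are hypothetical, and the robust fundamental matrix step is not included.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Quadratic search: for each descriptor in desc_a find the two nearest
    descriptors in desc_b and keep the match only if the nearest one is
    clearly better than the second nearest."""
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted((squared_distance(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue
        (best, j), (second, _) = ranked[0], ranked[1]
        # Distances are squared, so the ratio is squared as well.
        if best < (ratio ** 2) * second:
            matches.append((i, j))
    return matches


def image_pairs(n):
    """All stereo-pair combinations of an unorganized block of n images."""
    return list(combinations(range(n), 2))


# Toy descriptors (assumed values, two dimensions only for readability).
desc_a = [(0.0, 0.0), (10.0, 10.0)]
desc_b = [(0.1, 0.0), (5.0, 5.0), (10.0, 10.2)]
toy_matches = ratio_test_matches(desc_a, desc_b)

# An ambiguous descriptor, equally close to two candidates, is discarded.
ambiguous = ratio_test_matches([(1.0, 1.0)], [(1.0, 0.0), (1.0, 2.0)])
```

The quadratic search costs one descriptor comparison per candidate pair of features, which is why a kd-tree search is the usual alternative for images with many features.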
If the images form an ordered sequence (Figures 1 and 2), the number of image combinations to be processed is reduced to n-2, with a consequent computational time improvement. Then all pairs are split into triplets for a subsequent outlier rejection stage. In the case of unordered sequences or sparse blocks, the analysis of triplets is avoided. The data are then organized into tracks and the comparison of the numerical values of all image points gives the set of image correspondences for the entire block. This completes the basic processing and allows the user to run a bundle adjustment and derive the orientation parameters. Indeed the automatically extracted pixel coordinates of the homologous image points can be imported into most commercial and research photogrammetric packages for image orientation and sparse geometry reconstruction. The mathematical model used for network orientation is the photogrammetric bundle adjustment based on the non-linear collinearity equations and Least Squares (Gauss-Markov). Good initial values are needed for the linearization using a Taylor series expansion. Rather than trying to obtain initial values for all unknown parameters, an incremental approach is used: starting from the relative orientation of an initial image pair, a combination of resections, intersections and bundles leads to the final solution. This procedure can be seen as a progressive stabilization of the image block, since the number of 3D rays per point increases. The adjustment can be solved using internal or external constraints, achieving an accuracy better than 1:100,000 if a good image network, precise image points and calibration parameters are available (Mikhail et al., 2001). The basic tie point extraction pipeline described in this paragraph can be improved by using some refining techniques described in the following sub-paragraphs. These are able to speed up the processing and to refine the quality of the image coordinates in terms of precision. These steps are optional and the user has to select them.
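The organization of pairwise matches into tracks can be sketched with a union-find structure. This is an assumption made for illustration: the text states that ATiPE compares the numerical values of the image points, whereas the sketch below identifies each observation by a hypothetical (image index, feature index) pair. Tracks that would contain two different features of the same image are discarded as inconsistent.

```python
def build_tracks(pairwise_matches):
    """Merge pairwise matches into multi-image tracks.

    pairwise_matches: list of ((img_a, feat_a), (img_b, feat_b)) tuples.
    Returns a sorted list of tracks, each a sorted list of observations.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for obs in list(parent):
        groups.setdefault(find(obs), []).append(obs)

    tracks = []
    for members in groups.values():
        images = [img for img, _ in members]
        # Keep a track only if it visits each image at most once.
        if len(images) == len(set(images)):
            tracks.append(sorted(members))
    return sorted(tracks)


# Toy input (assumed values): one consistent 3-image track and one
# inconsistent chain linking two different features of image 0.
pairwise = [((0, 5), (1, 7)), ((1, 7), (2, 3)),
            ((0, 9), (1, 2)), ((1, 2), (0, 4))]
tracks = build_tracks(pairwise)
```

The length of each track is the multiplicity of the tie point, i.e. the number of images in which it was matched.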
Figure 1: Examples of ordered sequences and sparse blocks oriented with a photogrammetric bundle adjustment using a set of tie points automatically extracted and matched via feature-based operators.

2.2.1 Tie-point decimation.
Feature-based operators like SIFT or SURF can provide a large number of image points even in the case of deformations like scale variations, radiometric changes, convergent viewing angles and wide baselines. This is even more pronounced in the case of well-textured scenes or very high-resolution images. However, too many tie points (observations) in the bundle adjustment can produce serious computational problems. Therefore, after the matching of all image-pair combinations, the number of extracted tie points can be reduced according to their multiplicity (i.e. the number of images in which the same point is visible). A regular grid is projected onto each image and for each cell only the point with the highest multiplicity is stored. Obviously, the same point must be kept for the other images. The user has to set the size of the cell manually, according to the geometric resolution of the original images (for a 12 Mpx image a good choice is 200×150 px). The use of the decimation strategy improves the result not only in terms of the geometric distribution of the correspondences, but also in terms of CPU time.
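The grid-based decimation for a single image can be sketched as follows. The data layout (tuples of x, y, multiplicity and a point identifier) and the function name are assumptions made for the example; the 200×150 px default follows the value suggested above for a 12 Mpx image.

```python
def decimate(points, cell_w=200, cell_h=150):
    """Keep, for each grid cell, the point with the highest multiplicity.

    points: list of (x, y, multiplicity, point_id) for one image.
    Returns the sorted identifiers of the points that are kept.
    When two points in a cell have the same multiplicity, the first one wins.
    """
    best = {}
    for x, y, mult, pid in points:
        cell = (int(x // cell_w), int(y // cell_h))
        if cell not in best or mult > best[cell][0]:
            best[cell] = (mult, pid)
    return sorted(pid for _, pid in best.values())


# Toy tie points (assumed values) falling in three different cells.
image_points = [(10, 10, 2, "a"), (50, 100, 5, "b"),
                (250, 20, 3, "c"), (390, 140, 3, "d"),
                (10, 160, 2, "e")]
kept = decimate(image_points)
```

For a whole block, the identifiers kept in each image would then be merged, so that a point selected in one image is retained in all the other images where it was matched.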

Visibility map.
For blocks containing several tens of unordered photos, the processing time can significantly increase. Among all possible image pair combinations in a photogrammetric block, only a limited number of pairs share homologous points, although this number is not known a priori. The remaining pairs can therefore be removed from the processing. The method used to discard these useless pairs is a visibility map, which is estimated at the beginning of the procedure. The visibility map contains the connections between all image pairs sharing tie points and can be estimated as follows:
− visibility map from images: if high-resolution images are employed, a preliminary processing with compressed images (e.g. less than 2 Mpx) is rapidly performed. This provides the image combinations of the whole block. Then, the same matching procedure is repeated with the original image resolution but taking into account the produced map;
− visibility from GPS/INS data: these values, combined with an approximate DSM of the scene, allow the estimation of the overlap between the images. The method is faster than the previous one but it can be applied only to images with a configuration similar to an aerial block. In some cases, the DSM can be approximated by a plane.
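The image-based variant of the visibility map can be sketched as a filter over all pair combinations. In this minimal example the low-resolution matcher is replaced by a caller-supplied function returning the number of matches of a pair, and the threshold of 20 matches is an arbitrary assumption; neither is taken from the ATiPE implementation.

```python
from itertools import combinations


def visibility_map(n_images, count_matches, min_matches=20):
    """Return the set of image pairs (i, j), i < j, that share tie points.

    count_matches(i, j) is expected to run a fast matching on the
    reduced-resolution images and return the number of matches found.
    """
    return {(i, j) for i, j in combinations(range(n_images), 2)
            if count_matches(i, j) >= min_matches}


def toy_counts(i, j):
    """Hypothetical match counts: only images at most two steps apart overlap."""
    gap = j - i
    return 100 if gap == 1 else 30 if gap == 2 else 0


connected_pairs = visibility_map(5, toy_counts)
```

Full-resolution matching is then run only on `connected_pairs`, here 7 of the 10 possible combinations; the saving grows quickly with the number of images in the block.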

LSM refinement.
As shown in Remondino (2006), image coordinates of homologous points extracted using feature-based matching (FBM) methods can be refined with area-based matching (ABM) approaches, in particular with Least Squares matching (LSM - Grün, 1985). This ensures a sub-pixel accuracy of the locations and therefore a higher accuracy of the bundle solution.

Corner detection.
It is also noteworthy that FBM (with or without LSM) normally provides worse results than a traditional manual orientation with interactive measurements. This is mainly due to the tie point redundancy and distribution. In fact, with manual measurements the same point can be easily identified in several convergent images, while a feature-based operator is less repeatable. Moreover, the larger the number of images in which the same point appears, the better the precision of the global adjustment.
For this reason, the FAST interest operator (Rosten and Drummond, 2006) is included in the pipeline to ensure a large number of corners with a higher repeatability and a better distribution in the images. The corners are automatically extracted in the images, but the user has to fix the operator threshold by visually checking at least one image in order to verify the point distribution. Using the exterior orientation parameters computed with a preceding FBM approach, homologous rays are compared in order to find corresponding points in the object space. It is also possible to specify the minimum number of images in which a point must appear in order to be used during the orientation step (a good choice is 4). Then, image point locations are improved via LSM by fixing the position of a feature (template) and searching for the remaining points (slaves).

Image enhancement.
In some cases the radiometric content of the images is not sufficient to achieve a good number of well distributed image correspondences extracted with the FAST operator. To overcome this drawback a pre-processing procedure can be applied in order to stretch the radiometric information of the images by locally forcing the grey value mean and contrast (dynamic range) to fit certain target values (Wallis, 1976).
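The forcing of local mean and contrast can be sketched on a single window of grey values. The example uses one commonly quoted form of the Wallis transform, with a brightness forcing constant b and a contrast expansion constant c; the exact variant, the window handling and the target values used in ATiPE are not stated in the text, so all parameter names and defaults below are assumptions.

```python
from statistics import pstdev


def wallis_window(values, target_mean=127.0, target_std=50.0, b=1.0, c=1.0):
    """Force the mean and standard deviation of one window towards targets.

    g' = (g - m) * c*s_t / (c*s + (1 - c)*s_t) + b*m_t + (1 - b)*m
    where m, s are the window mean and standard deviation and
    m_t, s_t the target values. No clipping to [0, 255] is applied.
    """
    m = sum(values) / len(values)
    s = pstdev(values)
    denom = c * s + (1.0 - c) * target_std
    gain = c * target_std / denom if denom != 0.0 else 1.0  # flat window guard
    offset = b * target_mean + (1.0 - b) * m
    return [(g - m) * gain + offset for g in values]


# A low-contrast window (assumed grey values) stretched to the targets.
window = [100.0, 102.0, 104.0, 106.0]
stretched = wallis_window(window)
```

With b = c = 1 the window is forced exactly to the target mean and standard deviation; smaller values blend the targets with the original local statistics, which limits noise amplification in flat areas.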

Orientation of long and ordered image sequences
Some results for long and ordered image sequences are shown in Figure 2. In all cases, the SIFT and SURF operators were alternatively used for the initial FBM, while the refining procedures were not employed. All datasets feature several repetitive elements, sometimes with a uniform texture and moving objects. In these cases a restrictive threshold during the comparison of the descriptors with the ratio test removes many good image correspondences. This means that the threshold should be adapted to the texture of the images, with typical values of about 0.7-0.8. Obviously, several incorrect correspondences still remain in the dataset and they should be removed with the analysis of the epipolar geometry. However, the robust estimation of the fundamental matrix does not allow for the complete removal of mismatches lying on the epipolar lines, which are almost parallel and almost horizontal in the case of "quasi linear motion." Thus the epipolar lines will be aligned with architectural objects like doors and windows. As the fundamental matrix cannot detect these outliers, all remaining incorrect correspondences must be removed during the iterations of the least squares (LS) bundle adjustment. Therefore a robust photogrammetric bundle formulation plays an essential role, because multiple data for the same 3D points can be combined. The redundancy of the LS system can be the solution to overcome the drawbacks given by the use of the fundamental matrix only. A valid alternative is the estimation of the trifocal tensor (Hartley and Zisserman, 2004). This encapsulates the geometry of an uncalibrated image triplet and is the core for the analysis of image sequences with an overlap between three consecutive images.
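The reason why the fundamental matrix cannot detect mismatches lying on the epipolar lines can be shown with a small numerical sketch. It assumes an idealised sideways motion (pure translation along the image x axis with identical cameras), for which the fundamental matrix takes the simple form used below and the epipolar constraint reduces to equal row coordinates. The matrix, the function name and the sample points are illustrative assumptions.

```python
def epipolar_residual(F, p, q):
    """Algebraic residual q^T F p for a point p in image 1 and q in image 2."""
    x = (p[0], p[1], 1.0)
    xq = (q[0], q[1], 1.0)
    Fx = [sum(F[r][k] * x[k] for k in range(3)) for r in range(3)]
    return sum(xq[r] * Fx[r] for r in range(3))


# Idealised sideways motion: epipolar lines are horizontal (y' = y).
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]

p = (100.0, 50.0)                 # corner of a window in the first image
correct = (80.0, 50.0)            # same corner in the second image
repeated = (300.0, 50.0)          # corner of another window on the same row
off_row = (80.0, 90.0)            # mismatch away from the epipolar line

residuals = [epipolar_residual(F_SIDEWAYS, p, q)
             for q in (correct, repeated, off_row)]
```

The mismatch on the repeated window has a zero residual exactly like the correct match, so only the third candidate is rejected; removing the second requires additional views, as provided by the bundle adjustment or the trifocal tensor.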

Orientation of unordered image sequences
In the case of unordered image sequences, the search for image correspondences must process all the image combinations. Some examples are shown in Figure 3, with the estimated camera poses and statistical analyses after bundle adjustment.
For some datasets, a self-calibrating bundle adjustment was necessary due to the unknown interior parameters (just an approximate value of the focal length was available from the EXIF image header). Although a solution was achieved, the network design (especially without rolled images) is not adequate for an accurate camera calibration procedure. Therefore it is always better to pre-calibrate the camera with the most adequate network and then acquire the images for scene reconstruction using the same camera settings. Further information about this subject is reported in Remondino and Fraser (2006) and Barazzetti et al. (2011).

Orientation of irregular blocks
In photogrammetric applications, images are acquired taking into consideration the quality of the final product. Irregularities in the block geometry might be due to a lack of control in data acquisition, as in the example "Duomo Spire" reported in Figure 4. However, images normally feature a regular distribution in space. On the other hand, in recent years the diffusion of the so-called photo-tourism applications (Snavely et al., 2008) gave rise to a new typology of sparse blocks with very irregular baselines and image scales. These images are not very useful for real photogrammetric surveys, where particular attention must be paid during the image acquisition phase. An example of this kind of block is given by the dataset "Piazza Dante" (Figure 5). Despite the unconventional block geometry, this example was correctly oriented with the proposed procedure, although several problems were found during the bundle adjustment phase. The extraction of tie points was performed automatically, but the computation of the bundle adjustment required some user interaction. This can be considered as a manual sequence of resections and intersections, where the order of the images strongly influenced the final result. With this in mind, the procedure cannot be considered as fully automated. In general, it seems rather difficult to complete the automated orientation phase if images have an irregular distribution, very short baselines and roll variations without changing the 3D location of the perspective center. This is a fundamental difference between perspective and projective bundle adjustment approaches, where approaches like Photosynth are instead able to handle these situations.

ACCURACY AND PERFORMANCE ANALYSIS
The metric accuracy of the final orientation results is of fundamental importance for any photogrammetric project. It is common to use external information acquired with different sensors (e.g. theodolites, GNSS receivers, etc.) and add it to the bundle adjustment in order to remove the so-called "datum ambiguity." In some cases this external information can be compared to photogrammetric measurements to determine the accuracy of the project.
Figure 6 shows several targets distributed on the MySon G1 temple (Vietnam), which were measured with a theodolite. The survey was also carried out photogrammetrically using 18 images acquired with a calibrated Nikon D80 equipped with an 18 mm lens. The image coordinates of the targets were manually measured with the LSM method and their 3D coordinates were computed in the photogrammetric project only as intersections of homologous rays. The exterior orientation parameters previously computed were fixed. For the datum problem, 5 targets were used as GCPs (marked with a circle in Figure 6). These observations were also considered as fixed 3D points, leading to a worsening of the project sigma-naught (0.83 px) with respect to the free-net solution (0.7 px). All the remaining targets (16) on the façade were assumed as independent check points, and were matched in at least 4 images, although they were often visible in more images. The comparison between photogrammetric and geodetic coordinates is shown in Table 1, where it can be seen that the standard deviation of the differences is lower than 6 mm. In addition, the absolute values of the maximum and minimum discrepancies are less than 1.6 cm, indicating a relative accuracy of about 1:2500. This result is sufficient for this kind of survey, considering the size of the object (15 m wide) and the average image scale (about 1:600).
Table 1: Accuracy results for the Myson sequence.
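The relative accuracy quoted for the check points can be reproduced with a short calculation. It assumes that the ratio is taken between the standard deviation of the differences and the width of the object, which the text does not state explicitly; since both 6 mm and 1.6 cm are reported as upper bounds, the ratios below are bounds as well. The symbols σ, D and |Δ|max are introduced here only for the example.

```latex
\frac{\sigma}{D} \approx \frac{0.006\ \mathrm{m}}{15\ \mathrm{m}} = \frac{1}{2500},
\qquad
\frac{|\Delta|_{\max}}{D} \approx \frac{0.016\ \mathrm{m}}{15\ \mathrm{m}} \approx \frac{1}{940}
```

Under this reading the figure of 1:2500 corresponds to the standard deviation, while the largest discrepancy corresponds to roughly 1:940 of the object width.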

CONCLUSION
In this paper the ATiPE procedure for the automatic tie point extraction and orientation of terrestrial image blocks has been presented. In particular, its application to different kinds of block configurations depicting architectural objects and complex scenes has been discussed. The examples reported show that ATiPE can work in an automated way with images featuring different characteristics.
The consistency and distribution of the matched tie points is often sufficient to achieve a precision useful for real photogrammetric surveys. A typical block created for photo-tourism applications was also correctly oriented, although this example (Piazza Dante) caused some problems in the computation of the bundle adjustment, which required some manual decisions to include all images in the orientation procedure. This problem was also found for the other datasets, where the bundle adjustment implementations available today cannot automatically complete the orientation phase. So if the block has a complex and irregular geometry, some operations have to be done by a human operator. On the other hand, this problem is not due to the failure of the tie point extraction phase, but it depends on the absence of a regular block structure, which complicates the computation of the approximate values during the bundle adjustment. This demonstrates that, although images can be relatively oriented among themselves, an automatic, rigorous and precise result can be obtained only if the standard rules of photogrammetry are followed. With the availability of automatic procedures for image orientation, an improvement of the orientation techniques is expected in addition to a wider use of photogrammetry for 3D modeling projects. In addition, powerful algorithms and software packages capable of reconstructing the 3D surface of objects are becoming more popular. However, the design of the photogrammetric block still plays a key role in terms of the quality of the achievable results. Attention has to be focused on this topic, i.e. the definition of simple and basic rules for block design in close-range applications, at least for the most common practical situations.

Figure 2: Different ordered image sequences with the estimated camera poses and statistical analyses.

Figure 5: The camera poses for the "Piazza Dante" dataset.

Figure 6: The Myson temple sequence. The targets used for accuracy analysis, with those employed as GCPs highlighted, and the recovered camera poses of the sequence.
Figure 3: Examples of unordered image sequences.
Figure 4: Example of unordered image blocks.