ACCURACY OF TYPICAL PHOTOGRAMMETRIC NETWORKS IN CULTURAL HERITAGE 3D MODELING PROJECTS

The easy generation of 3D geometries (point clouds or polygonal models) with fully automated image-based methods raises nontrivial problems on how to check the quality of the achieved results a posteriori. Clear statements and procedures on how to plan the camera network, execute the survey and use automatic tools to meet predefined requirements are still an open issue. Although such issues were discussed and solved some years ago, the importance of camera network geometry is today often underestimated or neglected in the cultural heritage field. In this paper, different camera network geometries, with normal and convergent images, are analyzed and the accuracy of the produced results is compared to ground truth measurements.


INTRODUCTION
Network design has always been an attractive research topic in the field of non-topographic photogrammetry (Fraser, 1984). The design aims at predicting and guaranteeing the overall quality of the photogrammetric measurements, where quality is a global parameter that includes precision, reliability and economy (Fraser, 1987). Network design is a fundamental step in industrial applications, where very high accuracy is mandatory, but its importance is increasingly recognized also in the cultural heritage field. The work presented in this paper starts from the consideration that every day a large number of people are attracted by the potential of image-based 3D reconstruction and modelling tools. In particular, in the cultural heritage field, more and more people are nowadays producing 3D models, but some very critical issues remain open: 1. what is the quality of the derived 3D models? 2. are fully automated methods able to derive accurate 3D results without any shape deformation? 3. is Structure from Motion (SfM) suitable for every network configuration? These questions are not relevant if a 3D model is produced for web, AR or visualization needs, but they are fundamental for documentation, conservation, preservation and replica purposes. There are already many examples on the web and in the literature, especially from the computer vision community, that show the reconstruction of large squares and structures obtained with consumer-grade digital cameras and a basic camera network geometry, mainly consisting of overlapping photographs taken with "normal" optical axes (Snavely et al., 2008; Frahm et al., 2010). Of course, the focus of these applications is not the accuracy and reliability of the method but the success of a fully automated reconstruction and its computational time. The ease with which nice-looking 3D models can be obtained has, in the last few years, created a real risk that non-experts undertake important metrological projects without any photogrammetric knowledge or experience. The misconception that such 3D models are also accurate is probably due to the general confidence that the popularity of image-based computer applications is also a guarantee of reliability and repeatability (Remondino et al., 2012). Nevertheless, there is no doubt that such new automatic tools, if properly used, offer possibilities never available before, and thanks to them there is nowadays much more interest in the image-based approach than in the range-based approach. Therefore, the photogrammetric community should point out the critical aspects of the most common photogrammetric networks and update the general rules given in the past (e.g. the CIPA 3x3 rules, Waldhaeusl and Ogleby, 1994). Indeed, there are many differences with respect to the past procedures of 3D restitution (Remondino et al., 2013): cameras are almost always non-metric, stereoplotting is no longer required, a higher number of images does not affect the cost of the project (analogue photographs were expensive in the past), orientation and dense image matching are almost completely automated, etc.
Furthermore, self-calibration has become almost a standard approach in many projects (e.g. SfM), due to the use of compact cameras with interior parameters that are very unstable over time, and it is well understood that a calibrated camera leads to better 3D results. Therefore, previous studies on camera networks (Fraser, 1984; Kraus, 1997; Mikhail et al., 2001) are today more than ever crucial to avoid unsuccessful results and to guide non-experts in proper project planning, as not all imaging networks are suitable for self-calibration. This paper is thus inspired by these considerations and by the fact that wrong photogrammetric planning can lead to global deformations (bending effects) in the 3D results. The magnitude of the deformation is a function of different parameters and, although generally neglected in many applications, it can be ten or more times larger than the planned ground sample distance (GSD) (Nocerino et al., 2013). Without any 3D control measurements (i.e. with only a similarity transformation applied at the end of the bundle adjustment), these deformations cannot be detected, and a bundle adjustment solved as a free network (Granshaw, 1980), although it minimizes the mean variance of the object point coordinates, does not lead to correct results. The bending effect can appear especially in the case of open sequences, when elongated flat objects are imaged as a sequence of photographs in a single strip (e.g. a wall, a long building façade, etc.), or in the case of scenes with large planimetric dimensions but small depth. A similar problem was investigated by Cohen et al. (2012), where symmetries are exploited to constrain the bundle adjustment and avoid deformations in object space, although simply strengthening the imaging network would have been a much easier and more practical solution. On the basis of this evidence, and driven by many interdisciplinary projects, an in-depth analysis of typical camera network configurations was performed. Unlike other studies based on simulations or small indoor scenarios (Gruen & Beyer, 2001; Kraus, 1997; Fraser, 1984), the analysis hereafter focuses on real and large scenarios surveyed with consumer-grade digital cameras that require in situ self-calibration. The paper intends to find and define simple rules to warn non-experts when facing critical situations.

A PROCEDURE TO EVALUATE THE ACCURACY OF PHOTOGRAMMETRIC CAMERA NETWORKS
The aim of the present work is manifold: (i) to quantify the accuracy of object space coordinates delivered by typical network configurations for archaeological and architectural applications; (ii) to propose a methodology to check the obtained results; (iii) to provide simple and clear recommendations to guide the image acquisition phase in order to reduce systematic residual errors. Therefore, the interest of the article is not to investigate the finest attainable accuracy and geometric detail with a certain camera configuration, but: (a) to highlight if (and when) a global deformation (i.e. strip deformation) of the photogrammetric model can occur; (b) to quantify the magnitude of the warp using suitable ground truth; (c) to correlate the deformation with the camera network geometry. In recent years, within research projects, summer schools and workshop activities, the authors collected redundant 3D measurements of interesting case studies surveyed with different technologies. Two case studies are considered hereafter: 1. a 65 m long, 3.5 m high ancient wall (Fig. 1) of a Greek-Roman settlement in Paestum (Italy), with a length-to-height ratio of about 18.5:1; 2. a Roman theatre, ca. 65 × 55 × 8 m (Fig. 2), in Ventimiglia (Italy). In both cases, the camera network comprises normal / vertical images (green pyramids in Fig. 1a and Fig. 2a) and convergent / oblique aerial images (red pyramids in Fig. 1a and 2a).
To evaluate the accuracy of the networks and possible deformations in object space, ground truth measurements obtained with other techniques are used. After the photogrammetric processing, metrics and statistics are derived for both datasets to highlight possible deformations of the photogrammetric results in terms of: (a) Root Mean Square Error (RMSE) of the computed object coordinates with respect to some Check Points (CPs); (b) image observation residuals and standard deviations (STDVs) of the computed object coordinates and camera parameters; (c) color-coded error maps of the Euclidean distances between the laser scanner data and the photogrammetric sparse point cloud (only for the Paestum wall). For the orientation and triangulation of the images, Agisoft PhotoScan was employed for the automatic extraction of homologous points. These tie points were then automatically filtered in order to regularize their distribution in object space while preserving connectivity and high multiplicity (Nocerino et al., 2013). A self-calibrating bundle adjustment was then run to obtain the main statistical values and the correlations between interior orientation and additional parameters, information not available in PhotoScan. Each network was adjusted both as a free network and with GCPs in the bundle adjustment. In the free network solutions, the datum was defined by computing a similarity transformation with a scale factor. The achieved results were compared with the available ground truth data.
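The datum definition of the free network solutions, followed by the RMSE check on the targets, can be sketched as below. This is a minimal NumPy sketch, not the authors' processing chain: it assumes the target correspondences are already known and uses the closed-form SVD-based least-squares similarity solution (the method usually attributed to Umeyama) to estimate the scale s, rotation R and translation t. The function names are hypothetical.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares 7-parameter similarity so that dst ~ s * R @ src + t.

    src, dst: (n, 3) arrays of corresponding points (n >= 3, not collinear).
    Closed-form SVD solution; reflections are suppressed.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_src, dst - mu_dst
    cov = b.T @ a / len(src)                  # 3x3 cross-covariance of centred points
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # force a proper rotation (det R = +1)
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / ((a ** 2).sum() / len(src))
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


def apply_similarity(scale, R, t, points):
    """Transform (n, 3) points with x' = s * R @ x + t."""
    return scale * np.asarray(points, dtype=float) @ R.T + t


def rmse_per_axis(computed, reference):
    """RMSE of the coordinate differences, separately for X, Y and Z."""
    diff = np.asarray(computed, dtype=float) - np.asarray(reference, dtype=float)
    return np.sqrt((diff ** 2).mean(axis=0))
```

With hypothetical arrays `free_xyz` (free network target coordinates) and `topo_xyz` (total station coordinates), the check would be `s, R, t = fit_similarity(free_xyz, topo_xyz)` followed by `rmse_per_axis(apply_similarity(s, R, t, free_xyz), topo_xyz)`. Because a similarity transformation cannot absorb a bending of the block, any such deformation remains in these residuals.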

THE PAESTUM WALL
The wall dataset comprises: (a) 15 reference object points measured by topographic surveying with a Topcon GPT-7001i total station; (b) a 3D point cloud obtained with a Leica HDS7000 phase-shift terrestrial laser scanner; (c) 58 images acquired with a 14.2 Megapixel Nikon D3100 (DX-format sensor, 5.26 µm pixel size) fitted with a Nikkor 35 mm prime lens. The survey was planned and executed to meet the requirements of a 1:50 drawing scale. Assuming a plotting accuracy of 0.2 mm, the GSD and the measurement accuracy should not exceed 10 mm (0.2 mm × 50).
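The requirement arithmetic above (plotting accuracy times drawing-scale denominator) and the GSD check reported for the wall acquisition can be sketched as follows. This is a minimal sketch assuming a normal (non-tilted) image, so that GSD = pixel size × D / c; the function names are hypothetical, and the 10 m camera-to-object distance is the one given in the description of the photogrammetric survey.

```python
def drawing_tolerance_mm(scale_denominator, plotting_accuracy_mm=0.2):
    """Object-space tolerance for a 1:N drawing: plotting accuracy times N."""
    return plotting_accuracy_mm * scale_denominator


def photo_scale_number(distance_m, focal_length_mm):
    """Photo-scale denominator m = D / c for a normal (non-tilted) image."""
    return distance_m * 1000.0 / focal_length_mm


def gsd_mm(pixel_size_um, focal_length_mm, distance_m):
    """Ground sample distance: pixel size multiplied by the photo-scale number."""
    return pixel_size_um / 1000.0 * photo_scale_number(distance_m, focal_length_mm)


if __name__ == "__main__":
    print(drawing_tolerance_mm(50))                 # 10.0 (Paestum wall, 1:50)
    print(drawing_tolerance_mm(20))                 # 4.0 (Ventimiglia theatre, 1:20)
    print(round(photo_scale_number(10, 35), 1))     # 285.7 (10 m with the 35 mm lens)
    print(round(gsd_mm(5.26, 35, 10), 2))           # 1.5 (GSD in mm, below 2 mm)
```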

The topographic network
Several targets, distributed over the whole wall (65 m long) on a regular grid (Fig. 1b), were measured with a surveying network consisting of 3 stations. After the adjustment of the observations, a global 3D accuracy of about 6-8 mm was obtained from a free network solution. The X-axis runs along the wall, the Z-axis is along the vertical direction and the Y-axis is perpendicular to the wall, completing a right-handed reference frame.

The terrestrial laser scanning survey
A Leica HDS7000 range-based sensor (range noise less than 1 mm at a measurement distance of 25 m) was used to acquire a single scan of the wall with a resolution of 3.1 mm @ 10 m, providing an average sampling step of about 10 mm over the entire wall. According to Wunderlich et al. (2013), the overall accuracy of the instrument is better than 3 mm. The range data were roto-translated into the topographic reference system with a rigid-body transformation, and a mesh model was generated as reference for the photogrammetric outcomes.

The photogrammetric survey
The photogrammetric acquisition (fixed focus distance) was carried out trying to obtain the best images despite the limited space in front of the wall, and to optimize the photographic field of view. A 35 mm prime lens was used in order to avoid the further uncertainties that can arise when employing a zoom lens. An average distance of about 10 m (≈1:285 photo scale) was maintained from the object, yielding a mean GSD of less than 2 mm. The images were acquired with a mean overlap between two consecutive normal views of about 75-80%, to guarantee the automated identification of homologous features and a minimum of 3-4 intersecting rays for improved reliability. The mean baseline between adjacent images was about 1.5 m (max 5 m), implying a B/D ratio of 0.1 (max 0.5). Moreover, convergent images were also acquired, yielding convergence angles of 50° and a B/D ratio of 0.8. The photogrammetric processing was done considering two imaging configurations: (i) a network with only normal images (green pyramids in Fig. 1a) and (ii) a network with normal and convergent images (red pyramids in Fig. 1a). Figure 3 reports the distributions of the computed 3D points obtained from the two imaging configurations according to intersection angles and multiplicity (number of intersecting optical rays). For the free network solutions (versions A and C), all the measured targets (Section 3.1) were used to compute a similarity transformation that scales, rotates and translates the free network solution into the reference frame. In the constrained solutions (versions B and D), 5 GCPs (points 3, 6, 16, 21 and 23 in Fig. 1b) were used in the bundle adjustment and the others were used as CPs. Table 1 reports the RMSEs of the image observations and CPs as well as the STDVs of the interior parameters (from the covariance matrix). Version A shows the largest discrepancies in object space (Y is the depth axis). The inclusion of 5 GCPs improves the accuracy of the normal network (B) in object space, even if the STDV of the focal length is still quite high. The free network solution for the convergent geometry (C) delivers the best solution in terms of RMSEs in object space, with significantly lower STDVs of the interior orientation parameters. The inclusion of 5 GCPs in the bundle (D) does not improve the final accuracy, probably because of the limited accuracy of the topographic points. Figure 4 shows the uncertainty ellipsoids (1σ) of the computed 3D points (version A). An equivalent visualization of point uncertainty is proposed in Figure 5, where each 3D point is colored according to its STDV computed in the bundle adjustment. In version A (Fig. 5a), the distribution of the STDVs shows a warp of the photogrammetric model that reaches its maximum at the extremities of the wall (ca. 30 mm). The inclusion of 5 GCPs (version B, Fig. 5b) makes the distribution of the STDVs more homogeneous. In agreement with theory, the free network provides the lowest STDVs of object points for the convergent configuration (version C, Fig. 5c), while in the constrained solution the STDVs of the points increase uniformly (version D, Fig. 5d), according to the accuracy of the employed GCPs.
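The two quantities mapped in Figure 3 can be computed per tie point from the projection centres of the images that observe it. The sketch below is an illustration under stated assumptions: it takes multiplicity as the number of observing images and the intersection angle as the largest angle between any two of the point's rays, which may differ from the exact definition used for Figure 3. Coordinates follow the wall frame (Y is the depth axis), and the camera positions in the example are hypothetical.

```python
import itertools

import numpy as np


def ray_angle_deg(point, centre_a, centre_b):
    """Angle (degrees) at `point` between the rays to two projection centres."""
    p = np.asarray(point, dtype=float)
    va = np.asarray(centre_a, dtype=float) - p
    vb = np.asarray(centre_b, dtype=float) - p
    cos_angle = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def intersection_stats(point, centres):
    """Return (multiplicity, largest pairwise intersection angle in degrees)."""
    multiplicity = len(centres)
    if multiplicity < 2:
        return multiplicity, 0.0  # a single ray cannot be intersected
    largest = max(ray_angle_deg(point, a, b) for a, b in itertools.combinations(centres, 2))
    return multiplicity, largest


if __name__ == "__main__":
    # Hypothetical example: two adjacent normal images 1.5 m apart, 10 m from a wall point.
    normal_pair = [(-0.75, -10.0, 0.0), (0.75, -10.0, 0.0)]
    print(intersection_stats((0.0, 0.0, 0.0), normal_pair))  # multiplicity 2, about 8.6 degrees
```

Two adjacent normal images at this baseline and distance therefore intersect at well under 10°. In a normal-only strip, larger angles arise only between images far apart along the strip, whereas convergent images increase the angle directly.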

Comparisons between laser scanner and photogrammetry
The photogrammetric sparse point clouds resulting from the 4 bundle adjustment versions were compared with the laser scanning data in terms of Euclidean distances. Figure 6 reports the differences: version A (Fig. 6a) shows deformations of the photogrammetric model larger than 30 mm (absolute value) at the extremities and in the central part of the wall, corresponding to a maximum overall deviation of more than 60 mm. In version B (Fig. 6b), the deformations are reduced to values comparable with the accuracy of the GCPs. When convergent images are also used (Fig. 6c-d), the constrained solution (Fig. 6d) further reduces the deformations of the photogrammetric outcomes, showing more homogeneous deviations than the free network (Fig. 6c). A good agreement can therefore be observed between the magnitude and distribution of the points' STDVs (Fig. 5) and the deviations between laser and photogrammetric 3D points (Fig. 6). Only in version C do the statistics from the bundle adjustment (Fig. 5c) seem to underestimate the real deformation highlighted by the comparison with the laser scanning data.
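The cloud-to-cloud comparison can be sketched as a nearest-neighbour search from each photogrammetric point to the laser data. This brute-force NumPy version is only meant for small examples; for real scan sizes a k-d tree (e.g. `scipy.spatial.cKDTree`) is the usual choice. Reducing the comparison to point-to-nearest-point distances is an assumption. Since a mesh model was also generated from the scan, point-to-surface distances are an alternative. The signed deviation along the Y (depth) axis is a hypothetical addition, meant to mimic signed color maps.

```python
import numpy as np


def nearest_neighbour(query, reference, chunk=1024):
    """For each query point, the distance to and index of the closest reference point."""
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dist = np.empty(len(query))
    index = np.empty(len(query), dtype=int)
    for start in range(0, len(query), chunk):
        q = query[start:start + chunk]
        # squared distances between this chunk of query points and every reference point
        d2 = ((q[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        index[start:start + chunk] = nearest
        dist[start:start + chunk] = np.sqrt(d2[np.arange(len(q)), nearest])
    return dist, index


def signed_depth_deviation(photo_xyz, laser_xyz):
    """Y (depth) difference to the nearest laser point; the sign tells in front of / behind."""
    _, idx = nearest_neighbour(photo_xyz, laser_xyz)
    return np.asarray(photo_xyz, dtype=float)[:, 1] - np.asarray(laser_xyz, dtype=float)[idx, 1]
```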

Normal imaging network
The

THE VENTIMIGLIA THEATRE
The theatre was surveyed within a restoration project funded by the Italian Ministry for Cultural Heritage (Nocerino et al., 2013). The main requirement for the deliverables was a plan of the entire theatre at 1:20 drawing scale, which was the leading parameter for planning the survey. Considering a plotting accuracy of 0.2 mm, the images had to be acquired with a GSD smaller than 4 mm and an accuracy better than 4 mm had to be guaranteed. The dataset comprises: (a) 23 targeted points measured by topographic surveying with a Topcon GPT-7001i total station; (b) 58 images acquired with a model helicopter (RPAS) carrying a 24 Megapixel Nikon D3X (full-frame sensor, 5.95 µm pixel size) fitted with a Nikkor 50 mm lens.

The topographic network
The targets were evenly distributed within the theatre's area (Fig. 2b), and each of them was measured from at least two survey stations, with a prism pole centered on the target and held by a tripod. After the adjustment of the observations, a global 3D accuracy of about 4 mm was obtained, with the Z-axis along the vertical direction.

The photogrammetric survey
Both vertical and oblique images (with a mean angle from the vertical of about 45°) were acquired over the theatre with a GSD of about 3 mm. Ten vertical and fifteen oblique aerial strips were flown, with the flight height for the oblique images reduced with respect to the vertical flight in order to compensate for the image scale variation caused by the tilted camera. The datum (scale and coordinate reference system) was defined using the reference points measured with the total station. A camera calibration was carried out after the flight. The analyses were performed on four bundle block versions. For the free network solutions (versions A and C), all the measured targets were used as CPs. In the constrained solutions (versions B and D), 5 GCPs (red points 2, 6, 9, 20 and 24 in Fig. 2b) were used in the bundle adjustment and the others were used as CPs. Version A is the least controlled solution, with the highest RMSEs for the Z coordinates and the highest STDVs for the focal length. Version B shows that the inclusion of 5 well-distributed GCPs can improve the accuracy of the bundle outcomes. The results of versions C and D show no statistically significant differences, suggesting that the block configuration is sufficiently strengthened by the oblique images even without GCPs.
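The flight-planning reasoning can be sketched with two relations: the object distance that yields a target GSD with a given camera (D = GSD × c / pixel size), and a reduced height for tilted views. The paper does not state the exact rule used for the oblique strips. This sketch assumes flat terrain and keeps the slant range along the optical axis equal to the vertical-flight distance (H_obl = H_vert × cos θ). This equalizes the GSD at the image centre only across the tilt direction. Along the tilt direction, the pixel footprint on flat ground is still stretched by about 1/cos θ, and the GSD grows towards the far edge of the frame. Function names are hypothetical and the printed numbers are illustrative, not the flown values.

```python
import math


def distance_for_gsd_m(target_gsd_mm, pixel_size_um, focal_length_mm):
    """Camera-to-object distance (m) giving `target_gsd_mm` for a normal/nadir image."""
    pixel_size_mm = pixel_size_um / 1000.0
    return target_gsd_mm / pixel_size_mm * focal_length_mm / 1000.0


def oblique_height_m(vertical_height_m, tilt_from_vertical_deg):
    """Flying height keeping the slant range along the optical axis equal to the
    vertical-flight height (flat terrain assumed)."""
    return vertical_height_m * math.cos(math.radians(tilt_from_vertical_deg))


if __name__ == "__main__":
    # Illustrative: D3X pixel size and 50 mm lens from the text, 3 mm target GSD.
    h_vertical = distance_for_gsd_m(3.0, 5.95, 50.0)
    h_oblique = oblique_height_m(h_vertical, 45.0)
    print(f"vertical: {h_vertical:.1f} m, oblique (45 deg): {h_oblique:.1f} m")
    # vertical: 25.2 m, oblique (45 deg): 17.8 m
```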
Figure 7 shows the 1σ uncertainty ellipsoids of the 3D points, rendered as a 2D color map, for the different bundle versions. The distribution of the STDVs suggests a warp of the photogrammetric model exceeding 20 mm (1σ) in the central and external parts (Fig. 7a). The inclusion of 5 GCPs (version B, Fig. 7b) reduces the STDVs and makes their distribution more homogeneous.
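The link between the point covariances from the bundle adjustment and the ellipsoids and color maps can be sketched as below. This minimal sketch assumes that each point's 3×3 covariance block (already scaled by the a posteriori variance factor) is available. The 1σ ellipsoid semi-axes are the square roots of its eigenvalues. The scalar used here for color-coding, sqrt(σX² + σY² + σZ²), is one common choice, not necessarily the one used for Figures 5 and 7. Function names are hypothetical.

```python
import numpy as np


def error_ellipsoid(cov, k=1.0):
    """Semi-axes (largest first) and axis directions (columns) of the k-sigma ellipsoid."""
    cov = np.asarray(cov, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # symmetric matrix, ascending order
    order = np.argsort(eigenvalues)[::-1]
    semi_axes = k * np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return semi_axes, eigenvectors[:, order]


def point_sigma(cov):
    """Scalar 3D standard deviation sqrt(sigma_X^2 + sigma_Y^2 + sigma_Z^2) for color maps."""
    return float(np.sqrt(np.trace(np.asarray(cov, dtype=float))))
```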
As in the wall experiment, the use of vertical and oblique imagery (version C, Fig. 7c) provides the lowest STDVs, in agreement with theory, while in the constrained solution the STDVs of the points increase uniformly (version D, Fig. 7d).
As no dense ground truth was available, further comparisons were made between polygonal models generated by interpolating the computed 3D tie points of the different bundle versions (Fig. 8). Euclidean distances were calculated between the models, using version C as reference. This analysis clearly shows the large deformations of the purely vertical block configuration, confirming the high RMSE in Table 4. It is worth noting that the deformation of version A can lead to errors of up to 50 mm when measuring height differences between the center and the border areas of the theatre (Fig. 8b). In version B, the deformations are reduced thanks to the inclusion of some GCPs in the bundle adjustment, which compensate for the unmodelled systematic errors of version A (Fig. 8c). The comparison between C and D shows no significant benefit from using GCPs when the network includes a mixture of vertical and oblique images (Fig. 8a). Increasing the number of GCPs did not significantly improve the results of the bundle adjustment.

CONCLUSIONS
The paper reported some experiences gained during the 3D surveying and modeling of archaeological scenarios. The studies highlight the importance of a proper camera network geometry and show that the inclusion of convergent images considerably strengthens the network geometry and helps to avoid global deformations of the image-based 3D results. From the bundle adjustment statistics, in terms of STDVs of the computed 3D points, it is possible to predict the bending of the 3D model with good approximation, and these statistics can be used as a measure of the network's global accuracy. From the geometric comparisons, although the available ground truth data were not very accurate, we demonstrated that bending and deformation effects can be substantial without a good network geometry. This is the case of open sequences (or even parallel strips) featuring only vertical / nadir images. Given the availability of fully automated methods for image orientation, correct planning of the camera network is today more than ever crucial to avoid unsuccessful results and to guide non-experts towards accurate results. A proper project plan is mandatory in particular if a self-calibration procedure is run, as not all imaging networks are suitable for calibrating digital cameras.

Figure 1: Camera network of the Paestum wall (a) with normal / vertical (green) and convergent / oblique (red) images. Orthophoto of the wall with the GCP (red points) and CP (blue points) distribution (b).

Figure 3: Distribution of the computed 3D points colored according to their intersection angles: a) network with only normal images (mean = 24°, max = 39°), b) network with normal and convergent images (mean = 45°, max = 89°). Distribution of the computed 3D points colored according to their multiplicity in the images: c) network with only normal images (mean = 3, max = 5), d) network with normal and convergent images (mean = 5, max = 9).

Figure 4: 3D points' uncertainty displayed with the error ellipsoids for the normal geometry free network (version A).

Figure 5: Color map of 3D point uncertainties for the free network solutions (a, c) and constrained solutions (b, d).

Figure 6: Euclidean distances between laser and photogrammetric data for the free network solutions (a, c) and constrained solutions (b, d).

Figure 7: 3D point uncertainty as a color map: free network with only vertical images (a), constrained adjustment (5 GCPs) with only vertical images (b), free network with vertical and oblique images (c) and constrained adjustment (5 GCPs) with vertical and oblique images (d).
Table 2 reports the RMSEs of the image observations and CPs as well as the STDVs of the interior parameters (from the covariance matrix).