ORIENTATION OF OBLIQUE AIRBORNE IMAGE SETS - EXPERIENCES FROM THE ISPRS/EUROSDR BENCHMARK ON MULTI-PLATFORM PHOTOGRAMMETRY

During the last decade the use of airborne multi-camera systems has increased significantly. Developments in digital camera technology allow several mid- or small-format cameras to be mounted efficiently on one platform and thus enable image capture under different angles. These oblique images are interesting for a number of applications since lateral parts of elevated objects, like buildings or trees, are visible. However, occlusion or illumination differences might challenge image processing. From an image orientation point of view, multi-camera systems bring the advantage of a better ray intersection geometry compared to nadir-only image blocks. On the other hand, varying scale, occlusion and atmospheric influences, which are difficult to model, pose problems for the image matching and bundle adjustment tasks. In order to understand current limitations of image orientation approaches and the influence of different parameters such as image overlap or GCP distribution, a commonly available dataset was released. The originally captured data comprises a state-of-the-art image block with very high overlap, but in the first stage of the so-called ISPRS/EUROSDR benchmark on multi-platform photogrammetry only a reduced set of images was released. In this paper some first results obtained with this dataset are presented. They refer to different aspects like tie point matching across the viewing directions, the influence of the oblique images on the bundle adjustment, and the role of image overlap and GCP distribution. As far as the tie point matching is concerned, we observed that matching of overlapping images pointing to the same cardinal direction, or between nadir and oblique views in general, is quite successful. Due to the quite different perspective between images of different viewing directions, standard tie point matching, for instance based on interest points, does not work well. 
How to address occlusion and ambiguities due to different views onto objects is clearly a non-solved research problem so far. In our experiments we also confirm that the obtainable height accuracy is better when all images are used in bundle block adjustment. This was also shown in other research before and is confirmed here. Not surprisingly, the large overlap of 80/80% provides much better object space accuracy – random errors seem to be about 2-3fold smaller compared to the 60/60% overlap. A comparison of different software approaches shows that newly emerged commercial packages, initially intended to work with small frame image blocks, do perform very well.


INTRODUCTION
During the last decade airborne multi-camera systems which observe the scene under slanted views from all four cardinal directions have become mature (Remondino and Gerke, 2015). As opposed to nadir-looking airborne images, oblique images enable observation of lateral parts of elevated objects such as buildings or trees. Those images are used for instance for visualisation, identification of objects, 3D mapping and automatic building detection or verification (Frommholz et al., 2015, Haala et al., 2015, Remondino et al., 2016). A survey amongst users and vendors of airborne oblique camera systems, software developers and researchers was initiated by EuroSDR (Gerke and Remondino, 2014). It revealed that for geometric applications the standard photogrammetric products still have deficits, especially regarding bundle block adjustment automation and accuracy (Rupnik et al., 2015). In particular, tie point matching across different viewing directions is a problem that has so far not been solved sufficiently (Hartmann et al., 2016). In terms of ray intersection geometry such oblique cameras offer a better configuration compared to nadir views only. The relevant question is thus whether this advantage can be exploited in large blocks. In order to enable researchers and companies to compare methods and results, ISPRS and EuroSDR released a benchmark on multi-platform photogrammetry (Nex et al., 2015). In the framework of this project, a block of airborne oblique images, captured by the IGI Pentacam system, is provided to the community with an image orientation task assigned to it. In this paper we present first orientation results obtained with different software packages, both commercial products and research prototypes. In particular we aim to shed some light on the following aspects: tie point matching across the viewing directions, the influence of the oblique images on the bundle adjustment, and the role of image overlap and GCP distribution.
Jacobsen and Gerke (2016) analysed the 5 cameras in detail to better understand whether the provided pre-flight calibration of the camera internals should be used or whether an in-flight self-calibration should be done. It was found that systematic errors are evident, in particular in the oblique-looking cameras, and should be modelled in order to exploit the geometric accuracy potential. In addition, radial distortions seem to be larger than documented in the calibration report. Therefore, in this paper we restrict ourselves to experiments where (at least some of) the interior image parameters are estimated in the bundle block adjustment.
In the context of the benchmark, no precise GNSS/IMU information was disclosed. Therefore, such data was not used for the experiments documented in this paper either. Many of the mentioned influencing parameters have been investigated in detail by Rupnik et al. (2015). They found through simulation and real data cases that multi-view image blocks perform better in terms of obtainable object space accuracy. Concerning the influence of self-calibration a more complex observation was made: whether or not accurate GNSS information is used a priori has a significant impact on the final accuracy. The impact of GCP distribution patterns on the final result differed between case study areas.
In this paper we would like to address similar questions using the benchmark dataset. Since the data is available to every interested researcher, all aspects can be assessed individually and compared. In Rupnik et al. (2015) it was also stated that an overlap of 80% (forward) and 60% (sidelap) constitutes a good compromise between obtainable accuracy and economic considerations. In order to make the impact of overlap more obvious we concentrate on the two configurations 80/80% and 60/60%.

DATASET
Data acquisition took place in the context of the ISPRS scientific initiatives 2013/14 and 2014/15 and was also supported by EuroSDR. In May 2014 two Pentacam image blocks were flown in Dortmund city center and Zeche Zollern, close to Dortmund. While the former dataset is being used for the image orientation benchmark part, the latter one has been released for a dense image matching benchmark in close cooperation with EuroSDR (Haala, 2016). For a detailed description see Nex et al. (2015). Some key data for the city center block are: 1260 images (i.e. 252 stations) have been captured over the center of Dortmund in an 80/80% overlap configuration; the total coverage is approximately 3.9 x 2.8 km. The nadir camera (Hasselblad) in the IGI Pentacam housing has a 50 mm lens and a resolution of 50 MP at 16 bit RGB. In contrast, the four Hasselblad cameras mounted for the oblique views have an 80 mm lens, but the same image resolution. The average ground sampling distance (GSD) is 10 cm for both nadir and oblique views, and varies from 8 to 12 cm for the latter.
For the first phase of the benchmark only a subset of this data is currently being released, corresponding to a 60/60% overlap. Also, exterior orientation elements retrieved from the onboard GNSS/INS are not released in full quality, but rounded to the full meter or degree, respectively. The motivation for applying those restrictions was that in a first phase imperfect data should challenge the bundle block adjustment (BBA), so that remaining deficits are easier to identify. An RTK-GNSS survey was performed in order to capture 33 well-distributed 3D ground points in total. Out of those, 10 were selected and disclosed to the benchmark participants to be used as ground control points. The remaining 3D points are used as check points for accuracy assessment. For this paper, however, some experiments are conducted using different GCP distribution patterns, and partly also using all images (80/80% overlap).

DESCRIPTION OF EXPERIMENTS
The description of the experiments is subdivided into two parts. In the first part, section 4, all images and GCPs were used and partly compared to the reduced set. Here we focus on how successful tie point matching is, compare the effect on accuracy if only the nadir images are used as opposed to the full set of images, and finally analyse the influence of image overlap and GCP distribution. For those tests mainly two software packages have been used: Pix4Dmapper by Pix4D (www.pix4d.com) and the BLUH system by the Leibniz University Hannover. Although only a limited number of systems were used, we believe that the general observations might be extrapolated, especially since only different settings and the influence of variations in input data are compared.
In the second part (section 5) only the image subset released to participants and respective GCPs were used in order to compare the performance of different BBA approaches.

Tie point matching
Tie point matching across viewing directions is more challenging than in nadir-only image blocks. Points on lateral parts of objects are indeed well visible, but for matching those points possible occlusion and symmetry need to be considered. For example see Fig. 1: the base of a steel tower is well visible, including its lateral parts. The same point is indicated by the green cross. Most key point descriptors only take into account the local grey value distribution and do not consider occlusions. A matching algorithm based on feature vectors derived purely from the grey value distribution will most likely find wrong matching mates, hence successful matching across the viewing directions depends on reliable outlier filtering. Table 1 indicates the number of matches across viewing directions obtained using Pix4D, in this case in the 80/80% overlap scenario. The numbers are computed by ranking all matches of a certain image combination. The cells show the 75% percentile. For instance Front/Back=767 means that the 75% rank of all matches between the front camera image and the back camera image (i.e. in the same strip, or across strips, in both flying directions) is 767 matches. Within the same camera (i.e. the main diagonal in the table) we observe most matches, but also in the left-right and front-back combinations: due to the high overlap of strips the respective cameras actually observe the scene from the same cardinal direction. Matching between images enclosing a large direction difference is not observed. Nadir to oblique views get matched less often than pairs pointing to the same direction. This observation is reasonable because the aforementioned occlusion problems are less evident when the scene is observed from the same direction.
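The following minimal sketch illustrates one way such a matching matrix could be derived. It assumes that "ranking all matches of a certain image combination" means collecting the match count of every matched image pair per camera combination and taking the 75% percentile of those counts; the function name, the camera labels and the input layout are hypothetical and are not taken from Pix4D.

```python
from collections import defaultdict

import numpy as np

# Hypothetical camera labels for the five Pentacam viewing directions.
CAMERAS = ["nadir", "front", "back", "left", "right"]


def percentile_matrix(pair_matches, q=75.0):
    """Build a symmetric camera-by-camera matrix of match-count percentiles.

    pair_matches: iterable of (camera_a, camera_b, n_matches), one entry per
    matched image pair. Combinations without any pair stay NaN.
    """
    buckets = defaultdict(list)
    for cam_a, cam_b, n_matches in pair_matches:
        # Front/Back and Back/Front are the same combination.
        buckets[tuple(sorted((cam_a, cam_b)))].append(n_matches)

    index = {cam: i for i, cam in enumerate(CAMERAS)}
    matrix = np.full((len(CAMERAS), len(CAMERAS)), np.nan)
    for (cam_a, cam_b), counts in buckets.items():
        value = np.percentile(counts, q)
        matrix[index[cam_a], index[cam_b]] = value
        matrix[index[cam_b], index[cam_a]] = value
    return matrix
```

Camera combinations for which no image pair was matched remain NaN, which corresponds to the empty cells for views enclosing a large direction difference.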

Nadir-only vs. Pentacam
In terms of BBA accuracy the use of oblique views should be beneficial: the ray geometry not only helps during image self-calibration but also stabilizes the block. This was already shown in earlier research (Rupnik et al., 2015). On the other hand the large variations in scale pose challenges to the BBA.
In this experiment we employed the full overlap of 80/80% and a good, i.e. quite regular, GCP distribution in order to support the BBA as much as possible. In Fig. 2 the RMSE at GCPs and CPs are shown, separated for all three components. For the full Pentacam set all residuals are below one GSD, which is 10 cm on average. In the nadir-only case especially the Z-component accuracy decreases by more than 50% for the check points; for GCPs it is 3 times worse. This observation is confirmed by the simulation and one of the real cases presented in Rupnik et al. (2015).
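For reference, the per-component RMSE values discussed here can be computed from point coordinates as in the minimal sketch below. The function name and the array layout (one row per point, columns X, Y, Z in metres) are assumptions for illustration; the combined planimetric value as the root of the squared X and Y RMSE is also an assumption, since the exact combination is not stated in the text.

```python
import numpy as np


def rmse_per_component(estimated, reference):
    """RMSE of object point differences, separately for X, Y and Z.

    estimated, reference: array-likes of shape (n_points, 3) in the same
    object coordinate system and unit.
    """
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean(diff ** 2, axis=0))
    return {
        "X": float(rmse[0]),
        "Y": float(rmse[1]),
        "Z": float(rmse[2]),
        # Assumed definition of the planimetric (XY) value.
        "XY": float(np.hypot(rmse[0], rmse[1])),
    }
```

Dividing the resulting values by the average GSD (10 cm in this block) expresses them in GSD units, which is how the results are interpreted in the text.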

Influence of overlap
A further analysis was done regarding the influence of the image overlap on the final adjustment quality. Again, we focus only on object point residuals, and in particular on the Z RMSE, in order to make the charts easier to read. The chart in Figure 3 shows different scenarios. Besides the overlap it shows variations in GCP distribution (see section 4.4). In order to compare just the overlap, only the first, third and fifth group are of interest. They refer to the 80/80% overlap with good GCP distribution, using the Pix4D software, the 60/60% overlap with the same settings, and the 60/60% overlap with good distribution, but using BLUH. With the 80/80% overlap Pix4D was able to produce a check point RMSE for the Z-component at around the average GSD level, which is a very good result, as also reported in Rupnik et al. (2015). With the same software and the same GCP distribution, but reducing the available images to 60/60% overlap, the RMSE of the Z-component at check points increases by 60 percent to 16 cm; at the same time the respective value at GCPs rises to almost 7 cm. Obviously, remaining errors from self-calibration can no longer be compensated sufficiently by ground control. Keep in mind that the number of images is 4 times smaller when the overlap is reduced from 80/80% to 60/60%.
Last but not least we tested the same configuration with the BLUH system. In order to better compare the BBA performance, the tie points from Pix4D were imported into BLUH. The fifth group in Fig. 3 shows the result. While the residuals at GCPs are larger than with Pix4D (9.5 cm vs. 6.8 cm), the check point residuals are almost half the size (8.4 cm vs. 16 cm). However, due to the different handling of GCP residuals in BLUH, the GCP residuals in BLUH are of a similar size as the discrepancies at the check points, so only the residuals at check points can be used to compare different programs.
The better results at check points for BLUH confirm the conclusions drawn by Jacobsen and Gerke (2016) regarding the need for additional parameters in self-calibration in order to compensate for systematic camera errors. Those parameters are implemented in BLUH, but not in Pix4D, and when a reasonable number of GCPs is available, the parameters can be estimated significantly. Another important hint is that due to the thinning of strips from 80% to 60% side-lap all strips are flown in the same direction, and this adds another uncertainty, especially in the estimation of the principal point (Jacobsen and Gerke, 2016).
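The factor of four in the number of images can be checked with a short calculation. The sketch below assumes an idealised rectangular block in which the number of exposures per unit area is inversely proportional to the product of the base lengths along and across track, ignoring block edge effects; the function name is hypothetical.

```python
def relative_image_count(forward_overlap, side_overlap):
    """Relative number of images per unit area for given overlap fractions.

    Assumes an idealised regular block: the base along track scales with
    (1 - forward_overlap), the strip spacing with (1 - side_overlap).
    """
    if not (0.0 <= forward_overlap < 1.0 and 0.0 <= side_overlap < 1.0):
        raise ValueError("overlaps must be fractions in [0, 1)")
    return 1.0 / ((1.0 - forward_overlap) * (1.0 - side_overlap))


# Ratio between the 80/80% and the 60/60% configuration.
ratio = relative_image_count(0.8, 0.8) / relative_image_count(0.6, 0.6)
```

Under these assumptions the ratio evaluates to about 4. Doubling the base in both directions amounts to keeping every second image and every second strip, which would be consistent with the remark that all remaining strips are flown in the same direction.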

Influence of GCP distribution
The same chart in Figure 3 shows the experiments we use to analyze the impact of different GCP distributions. The benchmark pattern (BD) is characterized by ground control at the East and West outer edges of the area only, i.e. in the center of the block there is no support. The good distribution (GD) resembles a typical pattern which gives geometric support in the entire area. Refer to Figure 4: the purple dots indicate the GCP positions in the two configurations.
For the interpretation of the chart in Fig. 3 it is handy to compare the 1st with the 2nd, the 3rd with the 4th, and the last two bar-groups, because in the respective cases only the GCP pattern changes. In addition, the last two groups refer to experiments with BLUH while the other results were obtained with Pix4D.
For the full image overlap (80/80%), first and second bar-group, it actually does not seem to matter significantly which GCPs are available. As in the previous subsection, it seems that the good image overlap helps to compensate for deficits in image calibration. Following this line of argumentation we can also interpret the next two bar-groups: 60/60% overlap, good distribution vs. benchmark distribution, use of Pix4D. When the good GCP distribution is used, results are worse compared to the 80/80% case with the same GCP availability, but still acceptable; some block deformation is visible. With the same image overlap, but the reduced number of GCPs, the RMSE in Z is 25 cm, which is 9 cm more compared to the previous case and 50% worse compared to the full image overlap. Also with BLUH (last two bar-groups) the Z error is much larger for the unfavorable GCP distribution. As explained above, the residuals at GCPs are much higher for BLUH, while those at check points are comparable between BLUH and Pix4D. It seems that the additional parameters for self-calibration only lead to superior results when a good GCP distribution is provided.
In order to show the effect of different GCP distributions, the residuals of the last two configurations (BLUH, 60/60%, good vs. benchmark distribution) are shown in the plots in Figure 4. Even with the good GCP distribution we see large Y-components in the residuals. As reported in Jacobsen and Gerke (2016), the fact that in this 60/60% configuration all strips are flown in the same direction causes some remaining systematic shifts.
Interestingly with Pix4D the X and Y residuals are much smaller for the benchmark GCP configuration.One possible explanation might be the different handling of points from oblique views within the BBA.

EXPERIMENTS USING THE BENCHMARK RELEASE DATASET ONLY
In this last part of experiments we just use the dataset released for the benchmark (60/60%, only sides of the block covered with GCPs).So far we have received 5 submissions from participants, obtained by research or commercial systems.

BLUH:
The BBA handled by BLUH is based on the image coordinates exported by Pix4D; nevertheless, several observations were eliminated by automatic error detection. The self-calibration is explained in detail in Jacobsen and Gerke (2016). Without GPS coordinates of the projection centers the focal length and principal point location cannot be determined significantly due to high correlation with the EO parameters.
On the other hand, the significant parameters of the general 12 additional parameters of BLUH (Jacobsen, 2007) and the special parameters for the image corners (Jacobsen et al., 2010) are required and lead to an improvement of the results.

OrientAL (Karel et al., 2013):
In order to better cope with the large perspective distortions between views from different cameras on the platform, OrientAL extracted and described image feature points found in virtual views generated from the original aerial photos by applying affine transformations to them. These affine transformations were selected according to the rotation of an affine camera sampling a half sphere centred at the image centre, and such that all of the original image content gets projected. As a standard approach, OrientAL matched SIFT image feature descriptors as mutually nearest neighbours in feature space, with an upper threshold on the ratio to the distance to the second-nearest neighbour descriptors. As the cameras on the platform proved to have an unknown synchronisation accuracy, the relative orientations of the cameras on the platform were not taken into account in the bundle block adjustment. Thus, only two different observation types were used, namely image observations for the automatically extracted tie points, GCPs, and CPs, and direct observations of the GCP object space coordinates, which were treated as unknowns. Observations of GCP object space positions were given a priori standard deviations of 5 cm for horizontal coordinates and 10 cm for vertical coordinates, taking into account the GCP definition qualities as visible in the aerial photos, and assuming proper static differential GPS observations. GCP and CP image positions were assumed to have been observed manually, without the help of centroiding operators, and they were thus given a priori standard deviations of 1.5 pixels for each coordinate. The weights of automatic tie point observations were adapted to the statistics of their residuals after outlier removal, finally resulting in a priori standard deviations of 0.5 pixels for each coordinate. In the thinned out 60/60% dataset, all flight strips share the same direction, and thus all images share the same azimuth. As a consequence, for this aerial image dataset with correspondingly flat object space, severe correlations of up to 80% resulted between the focal length of the nadir camera and the Z-coordinate of its projection centres in object space. Also, the y-coordinate (in column direction of the raster images) of the principal point positions of the oblique cameras and their focal lengths proved to be almost fully correlated. As it did not have a notable impact on sigma naught, the y-coordinates were therefore kept constant. As no other lens distortion coefficients proved to be feasible, only radial distortion coefficients of third order were adjusted for all cameras in addition to the aforementioned parameters.
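The descriptor matching criterion described for OrientAL (mutually nearest neighbours plus a threshold on the distance ratio to the second-nearest neighbour) can be sketched as follows. This is a brute-force illustration under stated assumptions and not OrientAL's implementation: the function name is hypothetical, and the threshold of 0.8 is an assumed value, since the one actually used is not given.

```python
import numpy as np


def mutual_ratio_matches(desc_a, desc_b, max_ratio=0.8):
    """Match descriptors as mutual nearest neighbours with a ratio test.

    desc_a, desc_b: arrays of shape (n_a, d) and (n_b, d).
    max_ratio: assumed upper threshold on d(nearest) / d(second nearest).
    Returns a list of (index_a, index_b) tuples.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (n_a, n_b).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest_in_b = dist.argmin(axis=1)
    nearest_in_a = dist.argmin(axis=0)

    matches = []
    for i, j in enumerate(nearest_in_b):
        if nearest_in_a[j] != i:
            continue  # not mutually nearest
        others = np.delete(dist[i], j)
        if others.size and dist[i, j] > max_ratio * others.min():
            continue  # ambiguous: second-nearest neighbour is too close
        matches.append((i, int(j)))
    return matches
```

For real SIFT descriptor sets an approximate nearest-neighbour index would normally replace the full distance matrix; the brute-force form is kept here only to make the two acceptance criteria explicit.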

H. Hu, Southwest Jiaotong University (SWJTU):
This approach is designed for the penta-view oblique camera system. To improve the efficiency of feature matching, both nadir and oblique images are geometrically rectified to alleviate the affine deformation, using initial Exterior Orientation (EO) parameters and a rough DEM. It is preferred that the EO parameters for the nadir images are adjusted in advance and that the initial EO parameters for the oblique images are obtained from the calibrated relative rotation and translation (platform parameters) between nadir and oblique views. The rectified images are matched using SIFT-like methods and subpixel accuracies are obtained using least squares matching. Outliers are filtered through a RANSAC approach and spatial relationship constraints of the tie points (Hu et al., 2015). Pairwise matches are joined into tracks using the connected components algorithm (Agarwal et al., 2011) and the tracks with more image points are selected. For the combined bundle adjustment, the EO parameters for the nadir images are optionally kept fixed if already adjusted, and only the EO parameters for the oblique images are estimated. In the benchmark test, the EO parameters for the oblique images are adjusted independently. For the self-calibration models, the principal distance, principal point, Brown's distortion model and the Fourier model (Tang et al., 2012) are adjusted per camera. Furthermore, to account for possible differences in image quality, a special weighting strategy for the image observations is adopted, which is related to the size and shape of the pixel projected onto the ground.
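Joining pairwise matches into tracks by connected components can be illustrated with a small union-find sketch. It is not the SWJTU implementation: the function name, the representation of an observation as an (image, feature) tuple and the `min_length` parameter are hypothetical, and the selection of "tracks with more image points" is approximated by sorting the tracks by length.

```python
from collections import defaultdict


def build_tracks(pairwise_matches, min_length=2):
    """Join pairwise matches into tracks via connected components.

    pairwise_matches: iterable of ((image_a, feature_a), (image_b, feature_b)).
    Returns the tracks as sorted lists of observations, longest first.
    """
    parent = {}

    def find(obs):
        parent.setdefault(obs, obs)
        root = obs
        while parent[root] != root:
            root = parent[root]
        while parent[obs] != root:  # path compression
            parent[obs], obs = root, parent[obs]
        return root

    # Each match links two observations of the same object point.
    for obs_a, obs_b in pairwise_matches:
        root_a, root_b = find(obs_a), find(obs_b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for obs in parent:
        groups[find(obs)].append(obs)

    tracks = [sorted(members) for members in groups.values()
              if len(members) >= min_length]
    tracks.sort(key=len, reverse=True)
    return tracks
```

A track that contains two different features of the same image indicates an inconsistent chain of matches; whether and how such tracks are rejected is not described in the text and is therefore left out of this sketch.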

Commercial solutions (Pix4D and Agisoft):
The performance of oblique airborne image orientation in Pix4Dmapper and AgiSoft PhotoScan was analysed in detail by Ostrowski and Bakuła (2016). The approach described there was applied to the benchmark dataset. In Pix4Dmapper a standard processing method for aerial grids was used, enriched by geometrically verified matching and rematch. The number of key points per image was limited to 25 000 in order to speed up the processing.
Interior orientation parameters were used as initial values for full self-calibration (in the case of Pix4D, full self-calibration includes the adjustment of focal length, principal point, three radial and two tangential distortion parameters). Camera positions from GNSS were also used as observations during adjustment (with an a priori accuracy of 1 m). The accuracy of the GCP measurement (in object space) was assumed to be 10 cm in both horizontal and vertical direction.
Previous experiments showed that the orientation of oblique images in AgiSoft PhotoScan is quite unstable. Because of that, the benchmark dataset was processed without pair preselection and with a limit of 60 000 key points per image (without any limitation of the number of tie points). The quality of the orientation step was set to "high", which is equivalent to image matching at full scale. Similarly to Pix4D, IO and EO parameters were used as initial values during adjustment. The observation accuracy was set to 2 m for the camera positions (GNSS) and to 10 cm for the GCPs. The accuracy of measurement in image space was set to 4 pixels for tie points and to 0.5 pixels for GCPs. The initial image orientation step was followed by camera optimization, which was a self-calibration with adjustment of the following parameters: focal length, position of the principal point, three radial and two tangential distortion parameters.
The named approaches were used to adjust the benchmark release dataset, and the RMSE values of the residuals at check points are shown in Figure 5. The planimetric residuals computed from X and Y for BLUH and OrientAL are slightly larger than the respective Z values. As already discussed in 4.4, it seems that the low number of GCPs does not provide sufficient observations to compensate for systematic effects in this 60/60% block. The solutions from SWJTU and the commercial systems show somewhat smaller planimetric residuals in the range of the average GSD. While with the 80/80% overlap (Fig. 2) the Z-residuals are even smaller than in planimetry, in this case they are larger, and there is no significant difference between the individual solutions.
The different planimetric accuracy between BLUH/OrientAL and the other packages might be caused by different strategies for how the point observations from oblique airborne images are treated within the BBA. This is an interesting observation and needs further analysis, ideally based on the full image and GCP data set.

Figure 1: Object point as visible from different directions. Data from the Zeche Zollern dataset.

Figure 2: RMSE at GCPs and CPs: 80/80% overlap, nadir only vs. Pentacam full set. Regular GCP distribution, average GSD: 10 cm.

Figure 5: Residuals at check points, results from different systems.

CONCLUSIONS
The results of the experiments conducted on the ISPRS/EuroSDR multi-platform benchmark let us draw some interesting conclusions. As far as the tie point matching is concerned, we observed that matching of overlapping images pointing to the same cardinal direction, or between nadir and oblique views in general, is quite successful. Due to the quite different perspective between images of different viewing directions, standard tie point matching, for instance based on interest points, does not work well. How to address occlusion and ambiguities due to different views onto objects is clearly an unsolved research problem so far. In our experiments we also found that the obtainable height accuracy is better when all images are used in the bundle block adjustment. This was also shown in other research before and is confirmed here. Likewise, the large overlap of 80/80% provides much better object space accuracy: random errors seem to be about 2-3 times smaller compared to the 60/60% overlap. Systematic errors in object space can be avoided by a good control point distribution and the use of additional parameters in self-calibration. The BBA approaches compared in section 5 confirm the observations from the detailed analysis. Commercial software packages which were intended to be used for unordered small frame image blocks perform very well compared to research solutions.

Table 1: Matching matrix indicating number (75% percentile from all combinations) of matches across viewing directions (left/right/front/back related to the flight direction).