EFFICIENT COMPUTATION OF POSTERIOR COVARIANCE IN BUNDLE ADJUSTMENT IN DBAT FOR PROJECTS WITH LARGE NUMBER OF OBJECT POINTS

One of the major quality control parameters in bundle adjustment is the posterior estimate of the covariance of the estimated parameters. Posterior covariance computations have been part of the open source Damped Bundle Adjustment Toolbox in Matlab (DBAT) since its first public release. However, for large projects, the computation of especially the posterior covariances of object points has been time consuming. The non-zero structure of the normal matrix depends on the ordering of the parameters to be estimated. For some algorithms, the ordering of the parameters strongly affects the computational effort needed to compute the results. If the parameters are ordered to have the object points first, the non-zero structure of the normal matrix forms an arrowhead. In this paper, the legacy DBAT posterior computation algorithm was compared to three other algorithms: the Classic algorithm based on the reduced normal equation, the Sparse Inverse algorithm by Takahashi, and the novel Inverse Cholesky algorithm. The Inverse Cholesky algorithm computes the explicit inverse of the Cholesky factor of the normal matrix in arrowhead ordering. The algorithms were applied to normal matrices of ten data sets of different types and sizes. The project sizes ranged from 21 images and 100 object points to over 900 images and 400,000 object points. Both self-calibration and non-self-calibration cases were investigated. The results suggest that the Inverse Cholesky algorithm is the fastest for projects up to about 300 images. For larger projects, the Classic algorithm is faster. Compared to the legacy DBAT implementation, the Inverse Cholesky algorithm provides a performance increase of one to two orders of magnitude. The largest data set was processed in about three minutes on a five-year-old workstation. The legacy and Inverse Cholesky algorithms were implemented in Matlab. The Classic and Sparse Inverse algorithms included code written in C.
For a general toolbox such as DBAT, a pure Matlab implementation is advantageous, as it removes any dependencies on, e.g., compilers. However, for a specific lab with mostly large projects, compiling and using the Classic algorithm will most likely give the best performance. Nevertheless, the Inverse Cholesky algorithm is a significant addition to DBAT, as it enables relatively rapid computation of more statistical metrics, further reinforcing its application for reprocessing bundle adjustment results of black-box solutions.


Background
One of the major quality control parameters in bundle adjustment is the posterior estimate of the covariance of the estimated parameters. In photogrammetric projects oriented towards robust measuring purposes, this estimate is essential for quality control. Indeed, the covariance values indicate the quality of the bundle adjustment result and thus its fitness for use in topographic and surveying products. A common practice is to compare these posterior covariance values with the project requirements, which often depend on the required map scale. A photogrammetric mission may be considered acceptable if and only if its quality parameters, including the covariance values, are within the project's tolerance.
A more in-depth analysis of the geometrical qualities of a photogrammetric project is also sometimes necessary, for example when errors or deviations from prior values are detected. Since geometric quality is of the utmost importance in these spatial applications, the computation of covariance is very interesting for quality control purposes. However, for large projects, the computation of especially the posterior covariances of object points (OP) can be a time consuming process.
The Damped Bundle Adjustment Toolbox (DBAT) for Matlab was conceived as an open source toolbox to perform bundle adjustment computations (Börlin and Grussenmeyer, 2013). It has been used for several applications, most notably to reprocess the bundle adjustment results of commercial solutions (Murtiyoso et al., 2018). The main highlights of DBAT have always been its modularity (Börlin et al., 2019) and openness, with the possibility to access bundle adjustment metrics, in contrast to the black-box nature of several commercial software packages.

Related work
Various schemes that exploit the sparsity of the normal matrix to speed up the bundle adjustment computations were presented already in the early days of bundle adjustment (Brown, 1968, 1976) and are now part of standard photogrammetric textbooks (e.g. Kraus, 2007; Wolf et al., 2013; Luhmann et al., 2014; Förstner and Wrobel, 2016).
In most cases, the presentations are focused on speeding up the computations during the bundle adjustment iterations, where the linearised equation has a single right-hand side. In contrast, the computation of the posterior covariance has a large number of right-hand sides, and such algorithms cannot be applied immediately. Literature that discusses algorithms for posterior covariance computations includes Triggs et al. (2000); Wolf et al. (2013); Förstner and Wrobel (2016); Ila et al. (2017). However, the details of how the sparsity can be exploited are rarely given.
The method by Takahashi et al. (1980) can be used to compute subsets of the inverse of a sparse matrix. The sparse inverse (SI) method was developed for electrical networks, but can be applied to any sparse matrix problem.
Recently, Kallin (2019) presented an algorithm that computes the explicit inverse of the Cholesky factorisation of the normal matrix using a combination of sparse and dense computations. Depending on the density of a matrix, i.e., the fraction of non-zeros, dense computations can have a performance advantage over sparse computations due to their more efficient use of the memory hierarchy (Duff et al., 2002; Goto and Van De Geijn, 2008). Furthermore, the dense algorithms are easier to parallelise due to their predictable memory access patterns. The inverse Cholesky algorithm (called VDSIBlock in Kallin (2019)) was shown to outperform the SI algorithm on small-to-medium-sized data sets.

Aim
The aim of this paper is to extend the investigation of the Inverse Cholesky algorithm to larger data sets and to compare the results with the current DBAT implementation and classic photogrammetric algorithms.

Non-zero pattern
The normal matrix has a characteristic pattern of non-zero elements. If the parameters are ordered with the external orientation (EO) parameters first, the normal matrix may be blocked as

    N = [ Nee  Neo ]
        [ Noe  Noo ],    (1)

where Noe = Neo^T. The internal block structure is visualised in Figure 1. The Nee and Noo blocks are block diagonal with 6-by-6 and 3-by-3 blocks, respectively. The non-zero coefficients of each block relate only to the EO or OP parameters of a single camera or point, respectively. In contrast, the Noe block contains 3-by-6 blocks that connect object points to images.

Figure 1: Non-zero structure of the normal matrix with the EO parameters first (left). The Nee block has 6-by-6 blocks that each correspond to the EO parameters of one camera. Similarly for Noo and object points, but with 3-by-3 blocks. The Noe block consists of 3-by-6 blocks that connect an image with an object point. Non-zeros can only appear at block row i, block column j of Noe if object point i was measured in image j (right).

Figure 2: If the object points are ordered first, the normal matrix forms an arrowhead matrix, where the Noo block forms the shaft of the arrow, and the remaining blocks form the tip at the lower and right borders of the matrix (left). The arrowhead shape is clearer for real projects, where the number of OP makes the Noo block dominate the matrix. The right figure shows the normal matrix for the ROMA data set (see Section 4.1) with 60 images and about 25,000 object points.

Arrowhead permutation
If the ordering of the unknown parameters is changed to have the object points first, the blocks of the normal matrix are swapped to

    N = [ Noo  Noe ]
        [ Neo  Nee ].    (2)

With this ordering, the non-zero structure of the matrix has the shape of an arrowhead, where the large Noo block forms the shaft of the arrow, and the remaining blocks form the "tip" of the arrow at the lower and right borders (Figure 2).
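The block swap above is a symmetric permutation of the rows and columns of the normal matrix. The following is a minimal numpy sketch (an illustrative dense stand-in, not DBAT's sparse Matlab code) of applying the arrowhead permutation to a small synthetic normal matrix:

```python
import numpy as np

# Minimal sketch (not DBAT code): build a small dense stand-in for the
# normal matrix with m = 2 images (6 EO parameters each) and n = 3
# object points (3 coordinates each), then apply the arrowhead
# permutation that moves the OP parameters to the front.
rng = np.random.default_rng(0)
m, n = 2, 3
eo, op = 6 * m, 3 * n

# A synthetic full-column-rank Jacobian gives a symmetric positive
# definite normal matrix N = J'J in standard (EO-first) ordering.
J = rng.standard_normal((40, eo + op))
N = J.T @ J

# Arrowhead permutation: OP indices first, EO indices last.
p = np.concatenate([np.arange(eo, eo + op), np.arange(eo)])
N_arrow = N[np.ix_(p, p)]

# The leading block of the permuted matrix is Noo; the EO block has
# moved to the lower-right "tip" of the arrowhead.
assert np.allclose(N_arrow[:op, :op], N[eo:, eo:])
```

In a real project the matrix would be stored in sparse format and only the permutation vector would be applied, not a dense copy.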

Self-calibration
For a self-calibration project, the normal matrix also includes blocks for the internal orientation (IO) parameters. With the IO parameters first, the blocking becomes

    N = [ Nii  Nie  Nio ]
        [ Nei  Nee  Neo ]
        [ Noi  Noe  Noo ],

where the new blocks have a width corresponding to the number of IO parameters and are typically fully dense. The corresponding arrowhead ordering is

    N = [ Noo  Noe  Noi ]
        [ Neo  Nee  Nei ]
        [ Nio  Nie  Nii ].

See Figure 3.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

Figure 3: A self-calibration normal matrix has an added border of dense blocks. The blocks have a thickness equal to the number of estimated IO parameters. In the standard ordering (left), the IO blocks form the left and upper border of the matrix. In the arrowhead ordering, the IO blocks instead form the lower and right border.
Algorithm 1 The classic algorithm for computing posterior covariance for object points. The sparse and dense steps are computed in Matlab using sparse and dense linear algebra routines, respectively. The C code step is computed by compiled C-code called from Matlab ("mex" code).
Require: N has standard ordering
1: A ← Nee

Classic
The classic algorithm, as described in Wolf et al. (2013, Ch. 17-12), uses the normal matrix N in standard ordering (eq. (1)), where the Nee block is 6m-by-6m and the Noo block is 3n-by-3n, and where m and n are the number of images and object points, respectively.
Given the same partitioning of the posterior covariance

    C = N^{-1} = [ Cee  Ceo ]
                 [ Coe  Coo ],

the matrix Coo contains the posterior covariance for the object points. Mathematically, the matrix Coo can be computed as follows (cf. Wolf et al., 2013, eqs. (17-34)-(17-36)):

    Coo = Noo^{-1} + Noo^{-1} Noe (Nee - Neo Noo^{-1} Noe)^{-1} Neo Noo^{-1}.

The actual computation is given in Algorithm 1.
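The reduced-normal-equation expression can be verified numerically. The following is a small dense numpy sketch (an assumed stand-in; the actual algorithm uses sparse storage and never forms the full inverse) checking that the expression reproduces the object-point block of N^{-1}:

```python
import numpy as np

# Minimal dense sketch (assumed stand-in; DBAT uses sparse storage):
# verify that the reduced-normal-equation expression for Coo equals
# the object-point block of the full inverse N^{-1}.
rng = np.random.default_rng(1)
e, o = 12, 9                      # 2 images x 6 EO, 3 points x 3 OP
J = rng.standard_normal((50, e + o))
N = J.T @ J                       # standard ordering: EO first
Nee, Neo = N[:e, :e], N[:e, e:]
Noe, Noo = N[e:, :e], N[e:, e:]

Noo_inv = np.linalg.inv(Noo)
# Cee is the inverse of the reduced normal matrix for the EO parameters.
Cee = np.linalg.inv(Nee - Neo @ Noo_inv @ Noe)
Coo = Noo_inv + Noo_inv @ Noe @ Cee @ Neo @ Noo_inv

assert np.allclose(Coo, np.linalg.inv(N)[e:, e:])
```

The efficiency of the classic algorithm comes from Noo being block diagonal, so Noo^{-1} is cheap, and from only the 3-by-3 diagonal blocks of Coo being needed.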

Inverse Cholesky
The Inverse Cholesky algorithm uses the arrowhead ordering (eq. (2)).

Figure 4: The non-zero pattern of the Cholesky factor L (left) forms half an arrowhead. The L11 block is block diagonal with 3-by-3 lower triangular blocks. Due to the arrowhead shape of the normal matrix, the fill-in during the Cholesky factorisation is restricted to the lower triangular L22 block. The L21 block does not experience any fill-in at all. The non-zero pattern of the Cholesky inverse K = L^{-1} (right) is similar to that of L, except that the K21 and K22 blocks may experience fill-in. The K22 block will typically be close to half full.

The normal matrix is Cholesky factorised into

    N = L L^T,    L = [ L11   0  ]
                      [ L21  L22 ].
The non-zero structure of L forms half an arrowhead (Figure 4, left). The L11 block will be 3-by-3 block diagonal with lower triangular blocks. Due to the arrowhead structure of the normal matrix, there will be no fill-in in the L21 block during the Cholesky factorisation. Instead, the fill-in is restricted to the lower triangular L22 block.
The inverse K of the Cholesky factor is

    K = L^{-1} = [ K11   0  ]
                 [ K21  K22 ].

This leads to the following equations:

    K11 = L11^{-1},    K22 = L22^{-1},    K21 = -K22 L21 K11.

The K11 block can be computed using sparse linear algebra and will have the same sparsity pattern as L11. In the computation of K21, K11 is known to be sparse, whereas the density of L21 and L22 may vary. The K21 block can be expected to fill in compared to L21. The K22 block will typically be close to half full. See Figure 4, right.
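The block-wise inversion can be illustrated with a small dense numpy sketch (an illustrative stand-in; in the actual algorithm K11 is computed sparsely and the remaining blocks with dense routines):

```python
import numpy as np

# Minimal dense sketch (stand-in for the sparse/dense mix used in the
# IC algorithm): block-wise inversion of the arrowhead Cholesky factor
# L = [[L11, 0], [L21, L22]] via K11 = inv(L11), K22 = inv(L22),
# K21 = -K22 @ L21 @ K11.
rng = np.random.default_rng(2)
o, e = 9, 6                        # OP block first (arrowhead ordering)
J = rng.standard_normal((40, o + e))
N = J.T @ J
L = np.linalg.cholesky(N)          # N = L @ L.T
L11, L21, L22 = L[:o, :o], L[o:, :o], L[o:, o:]

K11 = np.linalg.inv(L11)
K22 = np.linalg.inv(L22)
K21 = -K22 @ L21 @ K11

# K is the inverse of L, so K @ L must be the identity.
K = np.block([[K11, np.zeros((o, e))], [K21, K22]])
assert np.allclose(K @ L, np.eye(o + e))
```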
Using the Cholesky factor, the posterior covariance matrix can be computed as

    C = N^{-1} = (L L^T)^{-1} = K^T K.

For notational convenience, let P = K11 and Q = K21. Then,

    Coo = P^T P + Q^T Q.

Note that the K22 block is not needed. The structure of the 3-by-3 diagonal blocks of Coo is

    [ ki^T ki      ki^T ki+1      ki^T ki+2   ]
    [ ki+1^T ki    ki+1^T ki+1    ki+1^T ki+2 ]
    [ ki+2^T ki    ki+2^T ki+1    ki+2^T ki+2 ],

where the ki are column vectors of K. Thus, the diagonal elements of Coo are the inner products of ki with themselves. Furthermore, the (2,1) element within each block is the inner product of ki and ki+1, and similarly for the other off-diagonal elements. Due to symmetry, only the sub-diagonal elements need to be computed.
The inner product can be computed as

    ki^T kj = pi^T pj + qi^T qj,

where pi is sparse with at most three non-zeros, but qi may be dense. Thus, the computation may be split into a combination of sparse and dense linear algebra.
The computation of the diagonal elements can be performed efficiently in Matlab using a combination of the Hadamard (element-wise) product and the colsum function. The call v=colsum(A) returns a vector such that vi holds the sum of the elements of the i-th column of A. Also, let the function extract(A,d) return every third column of A, starting with column d.
The actual computation is given in Algorithm 2. The diagonal elements are collected in one 3n-vector d, where n is the number of object points. The sub-diagonal elements are collected in three n-vectors sij. Finally, the call C=assemble(d, s21, s31, s32) assembles the vectors to a blockdiagonal matrix C.
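The Hadamard-product trick can be sketched in a few lines of numpy (an illustrative stand-in, where colsum(A) corresponds to A.sum(axis=0) and extract(A, d) to the slice A[:, d-1::3]; Ko stands in for the stacked [P; Q] columns ki):

```python
import numpy as np

# Minimal numpy sketch of the Hadamard-product trick in Algorithm 2.
# colsum(A) here is A.sum(axis=0); extract(A, d) is A[:, d-1::3].
rng = np.random.default_rng(3)
n, e = 4, 6                        # 4 object points, 6 non-OP parameters
Ko = rng.standard_normal((3 * n + e, 3 * n))   # stand-in for [P; Q]

Coo = Ko.T @ Ko                    # reference: full OP covariance block

# Diagonal elements: column sums of the element-wise (Hadamard) square.
d = (Ko * Ko).sum(axis=0)
assert np.allclose(d, np.diag(Coo))

# Sub-diagonal (2,1) elements of each 3-by-3 block: inner products of
# adjacent columns, again as a Hadamard product plus column sums.
s21 = (Ko[:, 0::3] * Ko[:, 1::3]).sum(axis=0)
rows, cols = np.arange(1, 3 * n, 3), np.arange(0, 3 * n, 3)
assert np.allclose(s21, Coo[rows, cols])
```

The s31 and s32 vectors are computed analogously from the other column pairings, after which the assemble step scatters d, s21, s31, and s32 into the block-diagonal result.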

Sparse inverse
The SparseInv (SI) algorithm (Förstner and Wrobel, 2016; Kallin, 2019) is a C implementation of the Takahashi et al. (1980) algorithm (Davis, 2014). The SI algorithm computes a subset of the non-zero elements of the inverse C = N^{-1}.

Legacy DBAT
The posterior covariance computation in DBAT up until version 0.9.1 used the inverse Cholesky approach but looped over the object points.
If the normal matrix is in arrowhead ordering and the identity matrix is column blocked as

    I = [ I1  I2  ...  In  Ie ],

where the three columns of block Ii correspond to object point i, the covariance block of point i can be computed as Cii = (L^{-1} Ii)^T (L^{-1} Ii).

Algorithm 2 The inverse Cholesky algorithm for computing posterior covariance for object points.
In an attempt to use memory more efficiently, the implemented algorithm loops over blocks of 30 points. The algorithm is given as Algorithm 3.
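The point-by-point strategy can be sketched as follows (a minimal dense numpy stand-in, not DBAT's Matlab code; DBAT uses a sparse triangular solve and loops over blocks of 30 points rather than single points):

```python
import numpy as np

# Minimal dense sketch of the legacy point-by-point approach: for each
# object point, solve L Y = Ii for the three identity columns of that
# point and form Cii = Y'Y.
rng = np.random.default_rng(4)
n, e = 4, 6
size = 3 * n + e
J = rng.standard_normal((60, size))
N = J.T @ J                        # arrowhead ordering: OP block first
L = np.linalg.cholesky(N)

C_full = np.linalg.inv(N)          # reference for the assertions only
for i in range(n):
    Ii = np.eye(size)[:, 3 * i:3 * i + 3]   # identity columns of point i
    Y = np.linalg.solve(L, Ii)              # forward substitution
    Cii = Y.T @ Y                           # 3-by-3 covariance of point i
    assert np.allclose(Cii, C_full[3 * i:3 * i + 3, 3 * i:3 * i + 3])
```

Each solve repeats work shared between points, which is why the explicit-inverse IC algorithm is substantially faster.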

Total variance only
The diagonal elements of Coo can be used to compute the total variance (trace) of each point. The necessary modification of the classic algorithm is to modify step 8 to compute the diagonal elements only. In the inverse Cholesky algorithm, steps 8-15 are omitted and the matrix is assembled from the diagonal only.
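As a small numpy illustration (assumed stand-in data, not DBAT code), the per-point total variance is obtained by reshaping the diagonal of Coo into one row of three variances per point:

```python
import numpy as np

# Minimal sketch: the total variance (trace) of each object point only
# needs the diagonal of Coo, reshaped to one row per point.
rng = np.random.default_rng(5)
n = 5
Ko = rng.standard_normal((20, 3 * n))
Coo = Ko.T @ Ko                    # stand-in posterior covariance

d = np.diag(Coo)                               # 3n diagonal elements
total_variance = d.reshape(n, 3).sum(axis=1)   # trace of each 3x3 block

ref = [np.trace(Coo[3 * i:3 * i + 3, 3 * i:3 * i + 3]) for i in range(n)]
assert np.allclose(total_variance, ref)
```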

Self-calibration
For self-calibration projects, the only modification of the classic Algorithm 1 is to have steps 1-2 include the IO blocks, i.e., to treat the IO and EO parameters as a single combined block. No modification of the inverse Cholesky algorithm is necessary. Instead, the L21 and L22 blocks are re-interpreted to correspond to all non-OP parameters.

Data sets
Ten data sets of varying sizes were prepared for the experiments:

Camcal A 21-image camera calibration data set from Börlin and Grussenmeyer (2014).
Vexcel A 41-image vertical aerial dataset flown at an altitude of 950 m above sea level (ASL) above the city of Strasbourg with an UltraCam Osprey Mark 3 Premium camera.
Roma A 60-image terrestrial close-range data set of the Arco di Constantino monument in Rome, Italy from Börlin and Grussenmeyer (2013).
StPierre A 239-image close-range data set of the St-Pierre-le-Jeune church in Strasbourg from Murtiyoso et al. (2017).
Sewu A 210-image data set acquired by the DJI Phantom 4 RTK drone flown at 60 m ASL over the Sewu temple in Central Java, Indonesia.
UmU Five data sets from several flights with a DJI Phantom 4 V2.0 drone over the Umeå University Campus, Umeå, Sweden.
UmU-3D A 272-image data set flown at 70 m ASL in "3D" cross-hatch mode. The data set includes both east-west and north-south strips.
UmU-2D A 133-image subset of UmU-3D with the east-west strips only.
UmU-3D-R The UmU-3D data set, reduced to 210 images to match Sewu. The reduction was done by removing every other north-south strip.
UmU-2-lyr UmU-3D extended with another "3D" flight at 120 m ASL over a larger area.
UmU-3-lyr UmU-2-lyr extended with another "3D" flight at 40 m ASL over a smaller area.
All data sets were self-calibration data sets. The Camcal and Roma images were processed by PhotoModeler Scanner 2012 and imported to DBAT. The remaining data sets were processed by Agisoft Metashape v1.6. During import of the Metashape projects into DBAT, all object points with fewer than 3 rays and/or an estimated intersection angle below 5 degrees were removed before processing by DBAT. The statistics for the data sets are presented in Table 1.

Algorithm variants
The following algorithm variants were implemented and used to process the normal matrices:

CC The classic Algorithm 1 of Section 3.1.
SI1 The sparse inverse algorithm of Section 3.3 with the normal matrix in standard ordering.
SI2 Same as SI1, but with the normal matrix in arrowhead ordering.
IC The inverse Cholesky Algorithm 2 of Section 3.2.
IC-S The inverse Cholesky Algorithm 2 where the sparse-to-dense conversion step 4 was omitted and all subsequent operations were performed using sparse linear algebra.
LD The legacy DBAT Algorithm 3 of Section 3.4.

Experiments
A total of four experiments were performed:

Self-calibration (SC) The unchanged normal matrices of each data set of Section 4.1 were given to each of the algorithms of Section 4.2.
No self-calibration (NSC) Same as the SC experiment, except that the normal matrices were stripped of the IO blocks before processing.
Total variance only (TV) The SC and NSC experiments were repeated with the diagonal-only variants of the CC and IC algorithms described in Section 3.5.
Effect of the number of cores The SC experiment was repeated with the CC, SI1, and IC algorithms, with the number of available CPU cores varied from one to six.
In all cases, the execution time for each algorithm on each data set was recorded. The normal matrix was assumed to have been computed by the last iteration of the preceding bundle adjustment. Thus, the time to compute the normal matrix was not included. Furthermore, the time to permute the normal matrix to the ordering preferred by each algorithm was excluded. Additionally, the density of the L22 and K21 blocks of the IC algorithm were recorded, as well as the density of the K22 block.
The timings were performed on an HP Z440 workstation from 2015 with a 6-core Intel Xeon E5-1650 v3 @ 3.50GHz and 64GB of RAM running Matlab 2019b Update 1 under Debian 10.

Results
The density numbers are given in Table 2. The L22 density varied between 30% and 50%. Except for the camera calibration data set, the K21 density varied between 53% and 66%. All K22 densities were very close to 50%.
The execution times for the CC, SI1, IC, and LD algorithms for the SC and NSC experiments are given in Table 3. The CC execution times were dominated (80-90%) by the computation of the 3-by-3 blocks in step 8. For the large problems (data sets 6-10), the IC execution times were dominated (50-85%) by the K21 (Q) computation in step 5. The execution times for the SI2 algorithm were within 15% of the SI1 algorithm. The IC-S algorithm was several orders of magnitude slower than the IC algorithm. The results show that the IC algorithm is fastest on SC data sets 1-8 and on NSC data sets 1-7. The CC algorithm is fastest on the remaining data sets, with the SI algorithm somewhere in between. In both cases, the cross-over point corresponds to a B density of about 2%. Except for the camera calibration data set, the IC algorithm is 20-120 times faster than the legacy LD algorithm. For the TV experiment, also presented in Table 3, the IC algorithm was fastest for SC data sets 1-7 and NSC data sets 1-6. In this case, the cross-over point corresponds to a B density of about 2.5%.
On average, the execution times for the SC data sets were 40-50% higher for the CC and SI algorithms compared to their NSC counterparts. In contrast, with the exception of the camera calibration data set, the SC and NSC times for the IC algorithm were within 1% of each other. In the diagonal-only TV experiment, the execution times decreased by about 40% for the CC algorithm and about 20% for the IC algorithm, compared to the full-block SC and NSC results.
The Sewu and UmU-3D-R data sets have the same number of images and object points. However, the number of image points is 38% higher in the Sewu data set. The same difference is seen in the number of rays per object point and the B-block density for the NSC data set, with a slightly smaller difference for the SC data set. The increased point density translated to a 75-100% longer execution time for the CC algorithm. In contrast, the corresponding increase for the IC algorithm was about 6%.
The results of the speedup Experiment 4 are shown in Figure 5. Only the IC algorithm benefited from an increased number of cores, but with decreasing efficiency as the number of cores increased. Using six cores, the speedup was about 2.5. On a single core, the IC algorithm was faster than the CC algorithm for data sets 1-4, and within a factor of 2 for data sets 1-8.

DISCUSSION
In this paper, several algorithms for the computation of posterior covariance of object points were applied to self-calibration and non-self-calibration data sets of varying sizes, ranging from 21 to 944 images. On the smallest data sets, the fastest algorithm was the Inverse Cholesky (IC) algorithm. On the largest data sets, the classic (CC) algorithm was fastest. The performance of the Sparse Inverse (SI) algorithm was somewhere in between CC and IC. The relative results between SI and IC are consistent with the results by Kallin (2019) on similar problem sizes.

Figure 5: Average speedup as a function of the number of cores. Only the IC algorithm benefits from an increased number of cores, but at a diminishing rate as the number of available cores increases.
Compared to the legacy algorithm in DBAT, the IC algorithm was 1-2 orders of magnitude faster.
Several factors affect the computation time for a given algorithm, including the number of images, object points, and image points. A critical parameter appears to be the density of the mixed block of the normal matrix. As the image and point numbers increase, the density of the mixed block typically decreases. When the full 3-by-3 blocks of the posterior covariance matrix are computed, the density cross-over point for the fastest algorithm appears to be around 2%.
If the posterior correlations of the estimated object points are needed, the full 3-by-3 blocks must be computed. Otherwise, if only the total variance of each point is needed, only the diagonal elements of the posterior covariance have to be computed. In this case, a decrease in execution time of about 40% for the CC algorithm and 20% for the IC algorithm was observed compared to the full-block computations, and the density cross-over point shifted to about 2.5%.
The IC algorithm is written completely in the Matlab language. As such, it benefits from an increased number of available cores due to the efficiency of the underlying numeric libraries used by Matlab. The performance increase was sub-linear. In contrast, the SI and CC algorithms were partially implemented in the C language and called from Matlab. The C code was not optimised for multiple cores. Further performance improvements are likely if the C code is parallelised, although the level of improvement is uncertain.
For a general toolbox such as DBAT, a pure Matlab implementation is advantageous, as it removes any dependencies on, e.g., compilers. However, for a specific lab with mostly large projects, compiling and using the classic algorithm will give improved performance.
Compared to the previous version of DBAT, users can expect posterior covariance computations that are one to two orders of magnitude faster. In summary, the computation time for the largest, 944-image data set in this paper decreased by a factor of over 100, to around three minutes.

CONCLUSIONS
On medium-sized data sets of up to about 300 images, the Matlab-only IC algorithm is faster than the classic algorithm. On larger data sets, the classic algorithm is faster. However, on data sets up to about 900 images, and if a computation time of a few minutes at the end of a bundle adjustment processing is acceptable, the IC algorithm is still a useful high-level alternative.
We have shown throughout this paper that the novel IC algorithm implemented in DBAT accelerates the computation of covariance values significantly. Experiments comparing it to other methods, including the classic approach, showed that the results are comparable, albeit with a few caveats. For DBAT this is a significant addition, as it enables relatively rapid computation of more statistical metrics, further reinforcing its application for reprocessing bundle adjustment results of black-box solutions. Further investigations and experiments in this regard may also reveal other options for optimisation.