EVALUATION OF AN AREA-BASED MATCHING ALGORITHM WITH ADVANCED SHAPE MODELS

Nowadays, the scientific institutions involved in planetary mapping are working on new strategies to produce accurate high resolution DTMs from space images at planetary scale, usually dealing with extremely large data volumes. From a methodological point of view, despite the introduction of a series of new algorithms for image matching (e.g. the Semi Global Matching) that yield superior results (especially because they produce usually smooth and continuous surfaces) with lower processing times, the preference in this field still goes to well established area-based matching techniques. Many efforts are consequently directed to improve each phase of the photogrammetric process, from image pre-processing to DTM interpolation. In this context, the Dense Matcher software (DM) developed at the University of Parma has been recently optimized to cope with very high resolution images provided by the most recent missions (LROC NAC and HiRISE) focusing the efforts mainly to the improvement of the correlation phase and the process automation. Important changes have been made to the correlation algorithm, still maintaining its high performance in terms of precision and accuracy, by implementing an advanced version of the Least Squares Matching (LSM) algorithm. In particular, an iterative algorithm has been developed to adapt the geometric transformation in image resampling using different shape functions as originally proposed by other authors in different applications.


INTRODUCTION
The authors are part of a team in charge of the development of a STereo Camera (STC) for the ESA-JAXA mission BepiColombo to Mercury (Cremonese et al, 2009). STC will provide the images for the global mapping in stereo mode of the entire Hermean surface with a ground resolution varying from 50 m at the equator to about 115 m at the poles. In order to estimate and characterize the actual stereo reconstruction capabilities of STC, an indoor Stereo Validation Setup (SVS) has been developed and tested in laboratory with a functional breadboard. The SVS test is performed comparing with reference data the DTMs produced from the stereo pairs of a series of rock samples by Dense Matcher (DM), a software developed at the University of Parma. Originally designed for use in close range photogrammetry (Re C., 2012.), Dense Matcher has been optimized to cope with very high resolution images provided by the most recent missions (LROC NAC and HiRISE). Recently, improvements have been made to the image correlation kernel and the process automation. The paper focus on the evaluation of the matching performances of DM in planetary mapping scenarios using different datasets: with computer-generated images, with real images acquired by the SVS as well as with real images from HiRISE, LROC-NAC, MESSENGER-MDIS. For the real planetary images, the comparison with other software (Ames Stereo Pipeline (NASA), VICAR (DLR)) provides important information about the capability of DM to reconstruct the surfaces of the planets.
The accuracy should be computed through comparison with uncorrelated data that didn't participate in the generation of the DTM. (Karel & al., 2006). To this aim, as far as the procedures for the stereo validation and calibration of STC are concerned, reference data from a high accuracy laser scanner have been used. Laser altimeter data (LOLA) have been used as reference to evaluate the vertical accuracy of DTM from LRO-NAC images.

THE DENSE MATCHER PROGRAM
The DTM generation program Dense Matcher implements the NCC (Normalized Cross Correlation (Lewis, 1995) method, the Least Squares Matching (LSM) method (Gruen, 1986) and the Multiphoto Geometrically Constrained Matching (MGCM) method (Gruen & Baltsavias, 1988). The software, written in C# object-oriented language, was first developed for close-range applications and has been improved and adapted to planetary applications. More specifically, important changes have been made to the correlation kernel, still maintaining its high performance in terms of precision and accuracy by implementing an advanced version of the Least Squares Matching (LSM) algorithm. An iterative algorithm has been developed to adapt the geometric transformation in image resampling using different shape functions. Bethmann (Bethmann & al., 2010) showed that using different shape functions to model the geometric transformation in LSM can bring higher accuracy and solve, in some cases, numerical problems like pixel-locking (Stein, Andreas, & Larry, 2006).

Planetary Mapping with Dense Matcher
In the last decade, the great progress in high-resolution planetary imaging (with Ground-Sample-Distance (GSD) that can reach 25 cm) has two outstanding examples with the High Resolution Imaging Science Experiment (HiRISE) on Mars Reconnaissance Orbiter and the NAC of the Lunar Reconnaissance Orbiter Camera (LROC) on LRO. These instruments have provided the largest data-volume ever obtained from any space mission, with an astonishing capability of capturing details and features on the planetary surfaces. Nowadays the most important institutes involved in planetary mapping are developing strategies to cope with these extremely large data volumes though still maintaining high accuracy standards. Despite the diffusion of a series of new algorithms that yield superior results especially in qualitative terms (smooth and continuous surfaces) and in terms of processing time, the common trend is to opt for well established area-based techniques and the efforts are oriented to refine the method improving each phase of the process. The challenge is to create an efficient and automated process capable of generating high quality DTMs with minimal human intervention. For this reason, also the problem of processing extremely large images makes the implementation of ad-hoc processing strategies mandatory.

Workflow summary
The software processing workflow (shown in Figure 1) starts with a series of pre-processing steps. The stereo pair images (usually Experiment Data Record files -EDR files) are first preprocessed with Isis3 (Eliason, 1997) (conversion into cub file, radiometric calibration and optional pre-rectification with cam2map). Since finding image correspondences is much easier if perspective differences are minimized, the software allows two kind of image rectification: ortho-rectification (usually performed already by the cam2map module) or, alternatively, epipolar image rectification (when the images are acquired by a frame camera). The matching area is determined with a mask on the reference, also called master, image (i.e. the image where the points that should be recognized are located) at the original resolution. A regular grid is generated within the matching area on the master image; the coordinates of the approximated location of grid points on the slave image (i.e. the image where corresponding points should be identified) can be computed according to the type of image pre-processing: if the images are ortho-rectified and the DTM used for ortho-rectification is accurate enough, the slave's coordinates of corresponding points can be considered to be the same of those on the master image. On the contrary, when the DTM used for the projection is not accurate enough, the produced images can show misalignments and a small parallax could arise both in x and y directions. A constant parallax value can be input by the user or the program can estimate the parallax field by feature based matching (FBM). To improve the computational performances, the program decomposes the reference image into small, user defined size tiles. The search for corresponding points is carried out iteratively in an image pyramid (coarse-to-fine hierarchical approach). Different pyramid levels (generally 3) are generated from the original images by sub-sampling the images of the previous level. Resampling is done by a weighted Gaussian average of neighbouring pixel of the previous level. The SURF operator (Bay & al., 2008) or a simple rotation invariant interest operator (e.g. Harris operator (Harris et al., 1988), or Foerstner operator (Foerstner, 1987)) can be used to find first correspondences for this purpose. Once a set of consistent points is found (i.e. the disparities in each tile should be similar, since usually small terrain portions are considered in a single tile) a filtering module using the Random Sample Consensus (RANSAC) algorithm (Fischler & Bolles, 1981) remove eventual outliers. The disparity for each point is computed and the mean value is assigned to the whole tile as an approximated parallax value for the matching procedure. The matching phase begins with an NCC matching step to improve the initial disparity map (optionally at each level of the pyramid). For each pixel on the reference image, the approximate correspondence on the slave image is computed at a sub-pixel level with a parabola fitting cross-correlation algorithm ((Lewis, 1995)). The matches are then refined using a Least-Squares Matching algorithm (see section 2.3.4) that should provide the highest level of accuracy. The matched points are transferred to the next (higher) resolution level where the disparities of the additional points are predicted from the neighbourhoods with a bilinear interpolation technique, providing new starting location (approximate values) for the subsequent matching procedures. This process is repeated up to the original image resolution level. Normally an affine transformation is used as geometric transformation, but alternative shape functions (projective and polynomial) are also implemented. Matching blunders are removed with a Region Growing approach (see section 2.3.3); right now, the identified outliers are not replaced since the current implementation of DM lacks a "hole filling" strategy. Finally, the 3D-forward ray intersection calculation and raster DEM interpolation exploiting should be performed. Since our interest has been more focused on developing the matching phase and giving automation to the process, the object coordinates have been computed using the Ames Stereo Pipeline (ASP - Broxton et al., 2008) triangulation code. For this purpose we inject the DM disparity map in the ASP framework by suitably modifying the DM output to produce the input data in the format accepted by ASP.

Strategies
Working with space images means managing large amount of data: the nominal maximum size of red HiRISE images is about 20000 × 126000 pixels, while NAC images have about 5000 × 52000 pixels. In order to guarantee good computational performances, therefore, efforts have been put in the optimization of the processes through tiling and grid-matching. Furthermore, the image matching performance is influenced by illumination conditions, occlusions, shadows, poor texture, atmospheric dust and steep terrain. Blunders and mismatches might arise in such cases; most are rejected by a Region Growing strategy.

Tiling
Many improvements have been made to the DM code to increase its computational performance and processing speed. The adopted tile-based strategy ( Figure 2) divides the master image into squared tiles which are independently and consecutively elaborated by directly accessing the grey values in the raw data file (usually an ISIS cub file). The idea is to decompose the processing of the whole image into chunks easier to handle; moreover, image pyramids can be applied to each tile. This step aims at the pre-alignment of the images. The estimated mean disparity value provides a starting location to the points on the right image with respect the left one helping the image matching in the search of the homologous.

Gridded Matching and blunder elimination
In the latter version of DM a match grid is defined. The match grid has the same dimensions of the matching mask (ROI) and a matrix structure where for each node the access is possible in terms of: coordinates (x,y), parallax values, correlation coefficient etc. Grid points uniformly distributed over the whole image often lie in poorly textured regions or occluded areas: the search for the match of a given grid point has a higher possibility of yielding an ambiguous match or even no matching candidates. The internal quality measurement in the matching process is based on a minimum acceptable value for the correlation coefficient, which is effective but not error-free. In order to further remove blunders and to make the final product more robust and accurate, filters can be applied to the set of matched points before computing their 3-D coordinates. The strategy implemented is based on a Region Growing algorithm applied to the disparity map. The points are sorted with decreasing values of the NCC coefficient and examined from the top to the bottom of the list. For each point, the difference of disparities w.r.t. the 8-neighbours points is computed. If it is lower than a threshold the points are aggregated to form a region. The procedure is repeated on neighbouring points until no new data can be added to the current region. A new region is set and associated to a new point, restarting the aggregation process. At the end, regions made of single points or with a number of points below a threshold are discarded.

Using Projective and Polynomial Shape Functions in the Least Squares Matching
In the LSM optimization, commonly, an affine transformation is used as geometric transformation; however, perspective changes due to rough terrain morphology are difficult to accommodate by an area-based stereo correlator with such model only. Many authors (Sutton & al., 1988) (Lu & Cary, 2000) found that the use of a simplified shape function leads to lower computational efforts but provides inaccuracies when significant changing in the terrain curvature occurs. On the contrary, polynomial shape functions with higher orders lead to numerical instability due to possible over-parameterization of the equation system, even when not statistically significant parameters are rejected. In this case, good accuracies can be achieved only using a large template window to maintain high redundancy in the equation system. It has been proposed (Bethmann & al., 2010) to extend the functional model of the geometric transformation of the LSM in cases where images are convergent and/or the object cannot be considered flat within the template. The paper showed that using different shape functions to model the geometric transformation in LSM can bring higher accuracy and solve, in some cases, numerical problems like pixel-locking and furthermore can improve the details in shape reconstruction. The motivation for using advanced shape models is to implement a more accurate or realistic modelling of the map between the two images, trying to overcome the limitations of the affine model when its assumptions (especially along terrain edges) are not met. Two extended functional models have been suggested; the first aims specifically at convergent image pairs, the second is meant to improve the implicit modelling of the object shape with rough terrain. If the object is flat within the template window, the relationship between the images is a (non-linear) projective transformation that depends on 8 parameters: ( , ) = 0 + 1 + 2 1+ 1 + 2 ( , ) = 0 + 1 + 2 If the object is not flat within the template window, modelling the relationship between the two images requires knowledge of the object shape; for this purpose, a polynomial function can be applied: (2) DM, in particular, implements a second degree polynomial shape function (12 parameters overall). Using such a model on a real image pair of a curved object, a fourfold accuracy increase in DTM reconstruction for the polynomial model w.r.t. the affine one has been reported in (Bethmann & al., 2010). An evaluation of these extended models has been performed using synthetic and real images: the results are shown in paragraph 3.

DESCRIPTION OF TESTS
The evaluation of the alternative models has been carried out in terms of accuracy. The test scenarios have been organized The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4, 2014 ISPRS Technical Commission IV Symposium, 14 -16 May 2014, Suzhou, China considering many processing variables and different set of images, in order to identify the difficulties in the matching. The shift parameters in the geometric model can be used to assess the quality of the estimate for they describe, for each functional model, the position of the template centre. In order to evaluate the LS system convergence behaviour, the standard deviations and mutual correlations of the geometric parameters have been investigated. Unlike topographic networks and well-designed photogrammetric blocks, where problems of convergence are rarely encountered, in LSM often the non-linear solving system might converge very slowly, oscillate, or converge to an incorrect solution.
The accuracy and robustness of the method has been evaluated changing a series of parameters (approximate starting values, functional models, weight of the observations within the template, template sizes, texture, etc.). The quality of the solution has been assessed by the number of iterations required to reach the stability of the solution, by the value of σ0 (a posteriori standard deviation of unit weight) and in terms of NCC (crosscorrelation coefficient) achieved by the solution.

Software-generated images
In the first tests, synthetic data have been used: artificial images have been created by rendering a virtual scene reproduced in a Computer Graphics (CG) 3D modelling software. Every 3D model was draped with a texture; then, virtual cameras, simulating the satellite sensor along different orbits, has been created and placed in the scene; finally, images taken by the cameras were generated and exported. Using such synthetic images and their known exterior and interior orientation parameters (provided without error by the 3d modeller), a point cloud is generated by image matching followed by triangulation (forward intersection). The reconstruction error in object space is evaluated point-wise as the distance of each matched point from the reference 3D model. In order to highlight the potentiality and the effective advantage of using alternative projective models, a series of objects with different curvature levels and different textures are chosen (see Figure 3). The first object (Crest) is characterized by a central crest with strong slopes. In this case, when the template is located in the middle of the crest (between the slope changing) the matching (how can be noticed in the Table 8-a) is critical into reach the right solution. The second object (Hills) is shaped by slight slopes. The curvature change gradually and this virtual object reproduces quite well the geomorphological feature that can be find on a hilly planetary area. The last object (Crater) should approximately reproduce a crater feature. In this case the changings in slope are quite important in terms of influence on the matching performance. In order to evaluate the matching accuracy in image space (i.e. the discrepancy between the matched points coordinates and the nominal values) the following steps were employed to compute the exact position of the corresponding points in (slave) image space:  The reference model has been interpolated on a regular grid;  The coordinates (in 3D object space) of each template centre are computed intersecting the DTM with the projection image ray;  The 3D point is then re-projected on the slave image by collinearity equations. In the case of Crater, the average value of the NCC is the lowest of the three cases; this suggests that the terrain feature is the most difficult to accommodate. The polynomial transformation provides the best result also in this case, while the affine model has the worst performance. (A) (H) (P) Figure 6: Crater test study. a) Statistics of matching results; b) Colour Map of the distances d for each shape model

Tests with real images
The tests with real images have been performed with two different goals: to test the alternative shape models also with real data, when 3D reference data were available; to continue (Re et al, 2012) the evaluation of the accuracy of 3D reconstruction of DM w.r.t. other well established software used in DTM generation from space images.

SVS stereo pairs for STC validation
The Stereo Camera (STC) of the SIMBIO-SYS imaging suite of the BepiColombo ESA mission to Mercury is based on an innovative and compact design in which the light collected independently by two optical channels separated by ±20° with respect to nadir falls on a common bidimensional detector (Naletto, 2012). This new stereo design of STC requires a specific calibration setup for the determination of the standard optical parameters (focal length, PSF, distortion map, ...) for both channels as well as a pre-flight assessment of the capability of STC of delivering 3D data from stereo images with the prescribed accuracy. To avoid the costs and the time necessary to build and fly on Earth an STC evaluation model and test its performance, a Stereo Validation Setup has been designed and realized (Figure 7). The main idea of the SVS is to perform in laboratory (indoor) the image acquisition process scaling-down the 3D surface reconstruction problem and applying it to a known target: from the on-flight observation of a planetary surface from an altitude greater than 400 km to the observation of a small rock sample at about 1 meter distance. This is achieved using an auxiliary optical system (a collimator) in the optical path. In order to get a simplification of the scaling problem and of the spacecraft (SC) orbit simulation, the case study is restricted to the reproduction of the STC operations at periherm. Before applying it to the actual flight model, the SVS concept has been tested using a functional breadboard, where the images of a series of rock samples have been acquired (Naletto et al, 2012) with two CCD cameras. In order to provide reference data to check the accuracy of the 3D reconstruction, a survey of the sample targets with a laser scanner with a vertical accuracy of 20 μm has been performed. This figure is about 10 times better than the (scaled) expected reconstruction accuracy from STC images: therefore, the laser 3D model of the samples can be used as reference. Figure 8 shows the stereo pair of one of the samples where a some markers (pins) have been placed to act as Ground Control Point (GCP) for the image pair orientation. They can in fact be easily recognized and measured in both the images and the laser point cloud; in addition, using such points automatically makes the coordinate reference system of the photogrammetric model coincident with the laser scanner coordinate system. This facilitates the comparison between the stereo DTM and the laser scanner DTM, because there is no need in principle for any surface-to-surface alignment between the two point clouds. Several DTMs of each target have been generated using Dense Matcher and varying some of the processing parameters: in particular the template size (T17: 17×17 pixels, T21: 21×21 pixels, T25:25×25 pixels), and the shape function in the LSM (Affine, Projective and Polynomial). In Figures 9 a zoomed area of the DTM obtained with a 25x25 template, with the Affine and Polynomial shape functions is shown. From a visual examination, the use of the Polynomial transformation model seems to improve the number and quality of details (i.e. the capability to extract finer details of the surface).
On the contrary, from a metrical point of view, the improvement in the implementation of a higher degree functional model is not so evident. The Affine seems to reach better results in terms of discrepancies with respect to the laser DTM (see Table 1.).

LROC NAC: Lunokhod1
High-resolution images obtained by the Lunar Reconnaissance Orbiter (LRO) have renewed the interest in the accomplishments of this historic first rover mission, as well as other Soviet-era mission. In the images of the LRO Narrow Angle Camera (NAC), the landed vehicle and the rover can clearly be identified, and rover wheel tracks along the traverse of the vehicle can be studied (Abdrakhimov et al., 2011). The LRO NAC images provide a significantly improved location for the lander and the rover. For example, the new data allowed the research group of DLR, in cooperation with Moscow State University (I. Karachevtseva et al., 2013), to reconstruct the rover mission and give new insights into the mission achievements. The selected stereopair (M150749234L/R -M150756018L/R) was acquired on January, 27 th 2011, and has an approximate image scale (GSD) of 0.50 m/pixel and a stereo angle of ca. 34.4°. The investigation aims to compare photogrammetric DTMs from three different software (DM, DLR-Vicar (Scholten, 2005) and ASP) with LOLA Data. The images are roughly aligned using ISIS3 cam2map program. The pre-processing step has been executed with the ASP procedure, which normalize the pixel values in the left and right images to bring them into the same dynamic range. The preprocessed images are then ready for the image-matching process implemented in DM and in the ASP pipeline. The comparison between the stereo DTMs and the laser altimeter data is delicate because the spatial resolution of the datasets are very different. As far as LOLA is concerned, the distance between the shots of each profile (one of the five that compose the parallel profiles along LRO's sub-spacecraft ground track) is separated by ∼56 m and the diameter of the spot is 5 meters (Smith, 2010). The spatial resolution for stereo imaging is higher and depends directly on the image resolution (∼ 60 cm for NAC acquisitions). Comparisons between different photogrammetric products (DM-VICR-ASP) and with laser data (LOLA) have been performed. The following table summarize the results:  Table 2: Comparisons with LOLA data using different shape function and different software in the Lunokhod1 case study.
From the comparisons with LOLA data, ASP seems to give the worst results, while a better correspondence between Vicar and DM is clearly visible. As far as the comparisons between DM (using different functional models) and LOLA data are concerned, the same behaviour seen with synthetic images investigations is found: the Polynomial model works better when the size of the template is bigger.

CONCLUSION
Many improvements have been made to the DM workflow: the matching process is now optimized and capable to manage the large amount of data of orbital-space images thanks to gridmatching and tiling, that have proved very promising in terms of computational load. From the results of the tests performed, we got positive answers to the two main requirements: operate very accurately and work with extremely large data volumes. As far as the use of different shape models in the LSM is concerned, the results are not clear-cut. In the synthetic images the polynomial model always gets better NCC values; however this does not necessarily translates into accuracy improvements that are substantial only in the case "Hills" and to some extent also in "Crater". With real images of the rock samples there is no strong evidence of improvements, so more tests are needed. The topic should be investigated thoroughly in the future and an adaptive matching algorithm that can switch between transformation models according to the ground shape features, or a filter that selectively find those pixels where discontinuities arise should be developed.