DATA FUSION OF LIDAR INTO A REGION GROWING STEREO ALGORITHM

Stereo vision and LIDAR continue to dominate standoff 3D measurement techniques in photogrammetry, although the two techniques are normally used in competition. Stereo matching algorithms generate dense 3D data, but perform poorly on low-texture image features. LIDAR measurements are accurate, but imaging requires scanning and produces sparse point clouds. Clearly the two techniques are complementary, but recent attempts to improve stereo matching performance on low-texture surfaces using data fusion have focused on the use of time-of-flight cameras, with comparatively little work involving LIDAR. A low-level data fusion method is presented, involving a scanning LIDAR system and a stereo camera pair. By directly imaging the LIDAR laser spot during a scan, unique stereo correspondences are obtained. These correspondences are used to seed a region-growing stereo matcher until the whole image is matched. The iterative nature of the acquisition process minimises the number of LIDAR points needed. This method also enables simple calibration of stereo cameras without the need for targets and trivial co-registration between the stereo and LIDAR point clouds. Examples of this data fusion technique are provided for a variety of scenes.


INTRODUCTION
Stereo vision and LIDAR remain the two most commonly used methods for 3D reconstruction. Stereo camera systems can provide dense 3D information, but image matching is a computationally complex problem, and reconstruction coverage and accuracy suffer if the image does not have sufficient texture. LIDAR-derived ranges are accurate and are not significantly dependent on surface texture, although they do depend on avoiding specular reflection angles on certain types of shiny surface. In most systems, however, the laser must be scanned to build up a 3D image. This paper proposes a novel method for integrating LIDAR ranges into a region growing stereo matching algorithm initially proposed by (Muller and Anthony, 1987). In particular, the image of the LIDAR spot on the scene is used to provide unambiguous seed points in regions where there is low texture.

Data fusion with stereo systems
In recent years, there has been a large amount of research into integrating additional sources of range information into stereo matching algorithms. Stereo data fusion algorithms generally attempt to solve the problem of ambiguous matches arising from homogeneous or repetitive texture. Obvious candidate techniques for data fusion are those that actively sense the scene.
Most research has focused on the use of Time of Flight Cameras (ToFCs) (Lange and Seitz, 2001), (Foix et al., 2011), which are now available commercially at low cost (e.g. Microsoft Kinect for Xbox One). ToFCs offer active illumination and dense 3D imaging data in real time. However, the usable range, typically a couple of metres, depends on the modulation frequency of the illumination, and performance outdoors is not guaranteed owing to the use of near-infrared illumination. ToFC range data tends to be relatively noisy, with accuracy on the millimetre scale. The use cases for these systems are dominated by robotics, where real-time imaging is desirable for navigation and mapping.

Broadly, there are two classes of fusion algorithm:
A priori methods using ToFCs have been used with most of the common state-of-the-art stereo matching algorithms, including dynamic programming (DP) (Gudmundsson et al., 2008), graph cuts (Hahne and Alexa, 2008), (Song et al., 2011), belief propagation (Jiejie Zhu et al., 2011) and semi-global matching (SGM) (Fischer et al., 2011). ToF range data is used to overcome the limitations of stereo in homogeneous image regions, while stereo range data is retained near depth discontinuities. First, a range interval is obtained for every pixel in the ToF system. This range interval is then mapped to each pixel in the stereo system and used as a constraint on the matching algorithm, either by limiting the disparity search space or by adjusting the matching cost function.
A posteriori methods combine two (or more) complete range images, ideally producing an image that is better than any of the inputs alone. (Kuhnert and Stommel, 2006) merge ToF data with results from winner-take-all (WTA) and simulated annealing stereo. ToF data is retained in homogeneous regions where stereo fails and is also used to detect blunders in the disparity map. (Beder et al., 2007) use an approach based on patchlets (Murray and Little, 2005), rectangular surface elements defined at every pixel in the disparity image. In both cases the fused data are an improvement over stereo or ToF alone, but results are limited by the low resolution of the ToF cameras.
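For the a priori case, the step of turning a per-pixel range interval into a limited disparity search space can be illustrated with a short Python sketch. It assumes a rectified stereo pair, where disparity and depth are related by d = f·B/Z, and it assumes that the range interval is already expressed as depth along the optical axis of the reference camera. The function name, parameter names and numerical values are hypothetical and are not taken from any of the cited systems.

```python
import math


def disparity_bounds(z_min_mm, z_max_mm, focal_px, baseline_mm):
    """Map a depth interval [z_min, z_max] to an integer disparity interval.

    Assumes a rectified stereo pair, so that disparity = focal * baseline / depth.
    Nearer points have larger disparity, so the bounds swap over.
    """
    if not 0 < z_min_mm <= z_max_mm:
        raise ValueError("expected 0 < z_min_mm <= z_max_mm")
    d_min = focal_px * baseline_mm / z_max_mm  # farthest depth, smallest disparity
    d_max = focal_px * baseline_mm / z_min_mm  # nearest depth, largest disparity
    # Round outwards so that the true disparity is never excluded.
    return math.floor(d_min), math.ceil(d_max)


# Hypothetical example: a surface known to lie between 1.9 m and 2.1 m.
low, high = disparity_bounds(1900.0, 2100.0, focal_px=2000.0, baseline_mm=200.0)
print(low, high)
```

A matcher would then evaluate candidate disparities only inside this interval for the pixel concerned, rather than across the full search range.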

Data fusion of stereo with LIDAR
Comparatively little work has involved LIDAR at close range. (Romero et al., 2004) use a 2D scanning LIDAR to seed the disparity map in a trinocular stereo system. Initial disparity estimates are calculated using LIDAR ranges, which are then propagated through the image based on a set of rules that consider image texture and horizontal and vertical pixel correspondences. Results from a standard stereo matcher are not provided, but the LIDAR does improve matching performance in homogeneous image regions. (Badino et al., 2011) follow an approach largely similar to that used with ToFCs. A 3D scanning LIDAR is used to generate a minimum and maximum range map for the scene. This is then used to constrain WTA and DP stereo matchers. The presented scenes are all outdoors and the fused data contains significantly fewer blunders than stereo matching alone.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4/W5, 2015 Indoor-Outdoor Seamless Modelling, Mapping and Navigation, 21-22 May 2015, Tokyo, Japan
The technique presented in this paper departs from previous work. Using a visible LIDAR scanner, it is possible to perform data fusion at a lower level by imaging the LIDAR spot as it scans through the scene.

3D IMAGING SYSTEM
The imaging system used for this research consisted of a stereo camera pair and a single point LIDAR mounted on a gimbal platform, see Figure 1.
Figure 1 Custom 3D imaging system including scanning LIDAR and stereo camera pair.

Stereo system
The stereo system comprised two Imaging Source DMK23UM01 monochrome cameras with a resolution of 1280x960 px, fitted with 8 mm focal length lenses set to f/4. A custom stereo bar was built using a rail (Thorlabs XT95SP) with each camera mounted on a manual rotation stage (Thorlabs RP01) and carriage (Thorlabs XT95P11).

LIDAR and mount
The LIDAR used was a Dimetix FLS-C10 with a specified accuracy of ±1 mm, repeatable to ±0.3 mm on natural surfaces up to a distance of 65 m. This sensor uses a visible (650 nm) laser beam and can operate at up to 20 Hz in its highest resolution mode. The LIDAR is mounted on a Newmark GM-12 gimbal mount with a specified resolution of 300 µrad and a repeatability of 20 µrad.

System geometry and model
The stereo system was modelled using the conventional pinhole camera model including radial and tangential lens distortion (Hartley and Zisserman, 200). Camera intrinsic and stereo calibration was performed using the OpenCV (Bradski, 2000) library implementation of (Zhang, 2000), employing a planar chessboard target. This calibration is then used to epipolar rectify the images prior to stereo reconstruction. Calibration results are shown in Table 1. The geometric model for the LIDAR allows for systematic translational and rotational offsets, similar to (Muhammad and Lacroix, 2010). Five parameters are required: two for translational offsets orthogonal to the laser beam axis, two for rotational offsets with respect to the reported position of the mount and a distance offset. This geometry is shown in Figure 2. The LIDAR range and reported mount angles are converted to Cartesian coordinates using these parameters. Note the use of a right-handed coordinate system consistent with stereo imaging conventions: the z-axis represents depth, directed into the scene. Without loss of generality, the world origin is chosen to be the centre of projection of the left camera.
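As an illustration only, one plausible form of a five-parameter conversion from range and mount angles to Cartesian coordinates can be sketched in Python. The axis conventions (x to the right, y downwards, z into the scene), the sign conventions and every function and parameter name below are assumptions made for this sketch. It should not be read as the model equations used in this work.

```python
import math


def lidar_to_cartesian(range_mm, elevation_rad, azimuth_rad,
                       range_offset_mm=0.0, h_offset_mm=0.0, v_offset_mm=0.0,
                       elevation_offset_rad=0.0, azimuth_offset_rad=0.0):
    """Convert one LIDAR measurement to (x, y, z) in the LIDAR frame.

    Assumed conventions: x right, y down, z forward (depth); azimuth rotates
    about the vertical axis and positive elevation tilts the beam upwards.
    """
    r = range_mm + range_offset_mm
    el = elevation_rad + elevation_offset_rad
    az = azimuth_rad + azimuth_offset_rad

    # Unit vector along the laser beam.
    beam = (math.cos(el) * math.sin(az), -math.sin(el), math.cos(el) * math.cos(az))
    # Unit vectors orthogonal to the beam: horizontal and "up" directions.
    horiz = (math.cos(az), 0.0, -math.sin(az))
    up = (-math.sin(el) * math.sin(az), -math.cos(el), -math.sin(el) * math.cos(az))

    return tuple(r * beam[k] + h_offset_mm * horiz[k] + v_offset_mm * up[k]
                 for k in range(3))


# Hypothetical measurement: 1 m straight ahead with small systematic offsets.
print(lidar_to_cartesian(1000.0, 0.0, 0.0, range_offset_mm=20.0,
                         h_offset_mm=1.0, v_offset_mm=15.0))
```

With all offsets set to zero the function reduces to an ordinary spherical-to-Cartesian conversion, which is a convenient check on the chosen conventions.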

Cross-Calibration
The use of a visible laser allows for simple intrinsic and extrinsic calibration of the LIDAR. The LIDAR is scanned in a raster fashion across a scene. At each orientation, a stereo pair is synchronously acquired and the location of the laser spot in the epipolar-rectified images is determined using a maximum filter.
A short exposure time of 1/5000 s is used to ensure that the image of the spot is not saturated. Thus two point clouds are produced: one from the LIDAR ranges and one from the stereo reconstruction of the LIDAR laser spot.
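The spot-location step can be sketched in Python as follows. This is a simplification that takes the single brightest pixel in each rectified image instead of applying a true maximum filter, and it assumes non-empty images stored as lists of rows. The intensity threshold, the row tolerance and the function names are hypothetical choices for the sketch.

```python
def brightest_pixel(image):
    """Return (row, col, value) of the brightest pixel in a non-empty 2D list."""
    best = None
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if best is None or value > best[2]:
                best = (r, c, value)
    return best


def spot_correspondence(left, right, min_intensity=50, max_row_diff=1):
    """Return ((row, col) in left, (row, col) in right), or None if implausible."""
    row_l, col_l, val_l = brightest_pixel(left)
    row_r, col_r, val_r = brightest_pixel(right)
    if val_l < min_intensity or val_r < min_intensity:
        return None  # spot too faint, e.g. occluded in one view
    if abs(row_l - row_r) > max_row_diff:
        return None  # rectified images: the spot must lie on (nearly) the same row
    if col_l - col_r <= 0:
        return None  # disparity must be positive for a point in front of the cameras
    return (row_l, col_l), (row_r, col_r)


# Tiny synthetic example: a dark 4x6 image pair with one bright spot each.
left = [[0] * 6 for _ in range(4)]
right = [[0] * 6 for _ in range(4)]
left[2][4] = 200
right[2][1] = 180
print(spot_correspondence(left, right))
```

The row check uses the fact that, after epipolar rectification, corresponding points share an image row, so a mismatch is a cheap indicator of a spurious detection.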
Assuming a good stereo calibration, the LIDAR intrinsic and extrinsic parameters with respect to the stereo system may be determined using least squares minimisation. The error was taken to be the Euclidean distance between corresponding points in the stereo point cloud and the LIDAR point cloud. For a stereo reconstructed point $\mathbf{p}_{s,i}$ and corresponding LIDAR point $\mathbf{p}_{l,i}$, the minimisation function is:
$$\min_{R,\,T,\,\theta} \sum_{i} \left\lVert \mathbf{p}_{s,i} - \left( R\,\mathbf{p}_{l,i}(\theta) + T \right) \right\rVert^{2}$$
where $R$ and $T$ are a rotation and a translation bringing the LIDAR point onto the stereo point cloud and $\theta$ represents the LIDAR intrinsic parameters.
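A minimal Python sketch of the quantity being minimised is given below. It assumes the LIDAR points have already been converted to Cartesian coordinates for the current intrinsic parameters, that the rotation is supplied as a 3x3 nested list and the translation as a 3-vector, and that the cost is the sum of squared Euclidean distances. The function name and the example values are hypothetical.

```python
def registration_cost(stereo_pts, lidar_pts, rotation, translation):
    """Sum of squared distances between stereo points and transformed LIDAR points."""
    cost = 0.0
    for s, l in zip(stereo_pts, lidar_pts):
        # Apply the rigid transform R * l + T to the LIDAR point.
        mapped = [sum(rotation[r][c] * l[c] for c in range(3)) + translation[r]
                  for r in range(3)]
        cost += sum((s[k] - mapped[k]) ** 2 for k in range(3))
    return cost


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
lidar = [(0, 0, 0), (1, 2, 3)]
stereo = [(1, 0, 0), (2, 2, 3)]  # the LIDAR points shifted by 1 along x
print(registration_cost(stereo, lidar, identity, (1, 0, 0)))
print(registration_cost(stereo, lidar, identity, (0, 0, 0)))
```

A general-purpose optimiser would evaluate a function of this kind repeatedly while adjusting the rotation, translation and intrinsic parameters.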
It is expected, for the system described here, that the intrinsic parameters are close to zero. Initially, R and T are estimated using a rigid body transformation between the two point clouds, since there is a one-to-one correspondence between them. After transformation, any points with a large distance error (further from the mean by more than two standard deviations) are discarded. This effectively removes most spurious correspondences, for example LIDAR points which are occluded in one image. With these points removed, the rotation and translation are estimated again to obtain initial values for the minimisation. The LIDAR intrinsic parameters are initialised to zero. The systematic elevation and azimuth offsets were both fixed at zero for better parameter convergence. The calibration scene (Figure 3) was chosen to be a wall corner to avoid coplanar calibration points. There is no particular requirement for the target to be planar, however.
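The outlier rejection step described above can be sketched in Python. The sketch assumes the LIDAR points have already been transformed into the stereo frame by the initial rigid body estimate, and it interprets the rule as one-sided: only pairs whose error exceeds the mean by more than two standard deviations are dropped. The function name and the synthetic data are hypothetical.

```python
import math
import statistics


def inlier_indices(stereo_pts, lidar_pts, n_sigma=2.0):
    """Indices of point pairs whose distance error is within mean + n_sigma * std."""
    errors = [math.dist(s, l) for s, l in zip(stereo_pts, lidar_pts)]
    mean = statistics.fmean(errors)
    std = statistics.pstdev(errors)  # population standard deviation of the errors
    limit = mean + n_sigma * std
    return [i for i, e in enumerate(errors) if e <= limit]


# Nine well-registered pairs (1 mm error) and one spurious pair (50 mm error).
lidar = [(float(i), 0.0, 1000.0) for i in range(10)]
stereo = [(x, y + 1.0, z) for x, y, z in lidar]
stereo[9] = (9.0, 50.0, 1000.0)
print(inlier_indices(stereo, lidar))
```

The surviving indices would then be used to re-estimate the rotation and translation before the full minimisation is started.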

Calibration parameter    Value
Translation, T (mm)      (-214.10, 287.55, 225.17)

The registration error was found to be 1.21 ± 1.11 mm. Some error is likely due to uncertainties in determining the laser spot location; further work will investigate ways to improve this.

DATA FUSION ALGORITHM
The proposed algorithm operates at a lower level of fusion than prior work. It aims to directly address the issue of poor stereo matching performance in low-texture regions.

GOTCHA stereo matcher
The stereo matcher used is based on the Gruen-Otto-Chau Adaptive Least Squares Correlation (GOTCHA) algorithm (Otto and Chau, 1989). The version of GOTCHA used here is based on (Shin and Muller, 2012), which is a fifth-generation version of the original. GOTCHA is a region growing stereo matcher taking as input a stereo pair and a list of initial, proposed correspondences or tiepoints. These tiepoints are either selected manually or generated automatically using a feature detector such as SIFT (Lowe, 2004).
The algorithm uses Adaptive Least Squares Correlation (ALSC) (Gruen, 1985) to refine and determine correspondences to subpixel accuracy, providing a disparity estimate and a confidence score. If a tiepoint is successfully matched, its neighbouring pixels are added to a priority queue, sorted by match confidence. ALSC is performed on the neighbours of the highest confidence tiepoint and any matches are added to the tiepoint queue. The process iterates until the queue is empty. Thus the disparity is grown from the initial seed points, preferentially matching from the regions with highest confidence.
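The queue-driven growth described above can be illustrated with a simplified Python sketch. It is not the GOTCHA implementation: the `match` callable is a hypothetical stand-in for ALSC that returns a (disparity, confidence) pair or None, a 4-connected neighbourhood is assumed, and the confidence threshold is an arbitrary choice.

```python
import heapq


def grow_disparity(seeds, match, width, height, min_confidence=0.5):
    """Grow a sparse set of seed matches into a disparity map.

    seeds: iterable of (x, y, disparity_guess).
    match: callable (x, y, guess) -> (disparity, confidence) or None.
    Returns a dict mapping (x, y) to disparity.
    """
    disparity = {}
    queue = []  # min-heap on negated confidence, so best match pops first

    def try_match(x, y, guess):
        result = match(x, y, guess)
        if result is not None and result[1] >= min_confidence:
            d, confidence = result
            disparity[(x, y)] = d
            heapq.heappush(queue, (-confidence, x, y, d))

    for x, y, guess in seeds:
        if (x, y) not in disparity:
            try_match(x, y, guess)

    while queue:
        _, x, y, d = heapq.heappop(queue)
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in disparity:
                # The parent's disparity is the initial guess for the neighbour.
                try_match(nx, ny, d)
    return disparity


# Toy matcher: only columns x < 3 are "textured" enough to match.
def toy_match(x, y, guess):
    return (guess, 0.9) if x < 3 else None


result = grow_disparity([(0, 0, 10.0)], toy_match, width=5, height=2)
print(sorted(result))
```

In the toy example the growth stops at the boundary of the textured columns, which mirrors the way a region growing matcher leaves gaps where no seed lies inside a low-texture area.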

Proposed algorithm
Normally the initial seed points are generated using SIFT keypoint matching. This approach excels in regions of high image texture, but results are poor in low-texture regions, where there tend to be multiple keypoints with similar descriptors.
Since the LIDAR spot is visible in both images, it can provide unique stereo correspondences. Instead of using a feature detector like SIFT, the laser spot locations are used directly as unambiguous seed points for GOTCHA. Image matching is performed using GOTCHA, and gaps in the disparity map may be filled using LIDAR data.
This approach has several advantages over previous work.
Compared to methods that use ToFCs, the LIDAR has higher accuracy and can be aimed with finer resolution. By using a visible LIDAR, cross calibration between the two coordinate systems becomes straightforward; this is particularly useful for gap filling, as the two point clouds are easily co-registered.
Compared to other data fusion techniques, this method requires relatively few points from the LIDAR.

RESULTS
A demonstration scene of a white-painted laboratory brick wall was chosen to be representative of the challenges involved with imaging indoor scenes. The left camera view is shown in Figure 6. The four kiln bricks were included to provide regions of high texture, in contrast to the wooden panel. Figure 7 shows the matched SIFT keypoints for the stereo pair.
As expected, there are very few keypoints located on the panel, implying a very low texture region. Figure 8 shows the result from GOTCHA using these SIFT keypoints. Match performance is good on the kiln bricks, reasonable on the painted brick wall and poor on the flat wooden panel.
Figure 8 Disparity map using SIFT keypoints as seeds for GOTCHA stereo matching.
Figure 9 Disparity map using 30000 random LIDAR points as seeds for GOTCHA stereo matching.
The match results are clearly correlated with the location of the initial tiepoints. In particular, the panel provides little texture for the matcher to grow into.
Figure 9 shows the same scene, but with 30000 randomly selected LIDAR points used as seeds. There is a 26.5% increase in the number of matched pixels and improvement is visible throughout the image. Performance is still degraded in low-texture regions, such as the panel and between bricks, where there are isolated matched pixels from which the disparity could not be grown. These regions would be suitable for gap filling by scanning the LIDAR.

CONCLUSIONS AND FUTURE WORK
A novel data fusion algorithm has been presented, incorporating data from a visible scanning LIDAR into a region growing stereo matcher to produce denser disparity maps without loss of accuracy. This is achieved by using the image of the LIDAR laser spot to generate unambiguous seed points for the GOTCHA stereo matcher. Imaging the LIDAR spot also allows for straightforward target-less calibration of the LIDAR system and registration between the stereo and LIDAR point clouds.
Future work will look into improving cross calibration, in particular the possibility of calibrating camera intrinsic parameters using the LIDAR alone. Additionally, smarter methods of scanning the scene, based on pre-selecting areas of low image texture, could concentrate the LIDAR points required for dense reconstruction in the areas where they will have the most impact.

Figure 2 System geometry. A: relationship between stereo cameras and LIDAR for a single world point. B: side view of LIDAR. C: overhead view of LIDAR. Systematic angular offsets are not shown. The model parameters are: LIDAR measured range; reported elevation; reported azimuth; systematic range error; horizontal translation orthogonal to laser beam; vertical translation orthogonal to laser beam; systematic elevation offset; systematic azimuth offset; LIDAR world coordinates.

Figure 3 Epipolar-rectified left image of the calibration scene, with the locations of the calibration points plotted.
Rotation, R (deg)        (-1.83, 0.29, 0.013)
Distance offset (mm)     21.21
Horizontal offset (mm)   1.05
Vertical offset (mm)     15.02
Table 2 Calculated LIDAR intrinsic and extrinsic parameters.

Calibration parameters are given in Table 2. Co-registered LIDAR and stereo points are shown in Figure 4 from an overhead perspective. The LIDAR points are visibly more tightly clustered than the stereo points. The registration error between the two point clouds is shown in Figure 5.

Figure 4 Overhead view of calibration points from each measurement system.

Figure 5 Registration error between LIDAR and Stereo point clouds.

Figure 6 Example scene to be reconstructed. Left stereo image, epipolar rectified.

Figure 7 Matched SIFT keypoints, left stereo image, epipolar rectified.