FUSION OF AIRBORNE AND TERRESTRIAL IMAGE-BASED 3D MODELLING FOR ROAD INFRASTRUCTURE MANAGEMENT – VISION AND FIRST EXPERIMENTS

In this paper we present the vision and proof of concept of a seamless image-based 3d modelling approach fusing airborne and mobile terrestrial imagery. The proposed fusion relies on dense stereo matching for extracting 3d point clouds which – in combination with the original airborne and terrestrial stereo imagery – create a rich 3d geoinformation and 3d measuring space. For the seamless exploitation of this space we propose using a new virtual globe technology integrating the airborne and terrestrial stereoscopic imagery with the derived 3d point clouds. The concept is applied to road and road infrastructure management and evaluated in a highway mapping project combining stereovision based mobile mapping with high-resolution multispectral airborne road corridor mapping using the new Leica RCD30 sensor.


INTRODUCTION

Background and motivation
Modern road infrastructure management depends on accurate, reliable and up-to-date geoinformation which is increasingly gathered using mobile sensors and platforms. First experiments in video and image based road navigation and infrastructure management date back over 30 years (Lippman, 1980). The first experimental mobile mapping vehicles relying on stereo imagery were developed some 20 years ago (Novak, 1993; Schwarz et al., 1993). However, over the last decade mobile LiDAR became the predominant 3d mobile mapping technology. Despite its benefits, LiDAR data remains difficult to handle and to interpret for non-geospatial professionals such as domain experts in road planning and management. They often prefer imagery over point clouds or ask for co-registered imagery complementing the LiDAR data.
Over the last few years image-based 3d mobile mapping has been experiencing a revival. This is mainly due to dramatic progress in imaging sensors, onboard data storage and imaging algorithms (namely dense stereo and multi-image matching) as well as in distributed and parallel computing technologies such as High-Performance Computing (HPC) and Cloud Computing. All these developments enable new and very powerful image-based stereovision mobile mapping solutions.
In parallel to these trends in mobile mapping we see the emergence of medium-format airborne imaging sensors capable of capturing very high resolution multispectral imagery at high data rates. They permit photogrammetric flights with highly overlapping imaging patterns which again favour dense image matching algorithms. One of the first examples of such a new sensor is the Leica RCD30. It provides 60 MP imagery in RGB and NIR, FMC and high data capture rates, making it an ideal sensor for road corridor mapping.
Last but not least, we also observe progress in the (web-based) exploitation of airborne and terrestrial imagery. Google and Microsoft, for example, recently integrated (monoscopic) oblique airborne and terrestrial geospatial imagery, including vehicle-based panoramic imagery, into their map portals. Furthermore, with the emergence of web-based 3d graphics standards such as WebGL, they have started to employ image warping to support dynamic transitions between airborne and terrestrial imagery. However, these solutions currently provide neither real (stereoscopic) 3d visualisation nor accurate 3d measurements.

Road infrastructure management: characteristics and requirements
Road infrastructure management encompasses a wide spectrum of tasks and activities which are increasingly supported by 3d geodata and 3d geoinformation systems. With the introduction of accurate and highly automated 3d mobile mapping technologies, detailed 3d digitisations of the road environment are becoming available. These high fidelity digital representations of the road environment have triggered an actual paradigm shift in which a large part of the inspection and measurement tasks no longer have to be carried out in the field but can be performed at the desk of the different domain experts. This leads to a significant increase in productivity and to a significant reduction of safety hazards and traffic obstructions.
Typical road management tasks which could be supported by dense mobile mapping data range from visual inspection, simple measurements (e.g. distances or height differences), assessment of road surface irregularities, road profile extraction and road sign management to noise mitigation planning or road verge / nature strip management. The focal points of these diverse tasks vary accordingly. They include the actual road surface, road signs and gantries, safety barriers, non-building structures such as bridges or tunnels, embankments, drainage as well as low- and high-growing vegetation (see Figure 1). Depending on the task, not only the focal point but also the preferred viewing direction may differ, as shown by the following examples: pavement (vertical), road signs (horizontal, along-track), traffic barriers (horizontal, cross-track) or embankment ('over-the-horizon' / vertical).

Goals and structure of this paper
The paper aims at demonstrating the practical feasibility of an interactive and accurate 3d geoinformation environment for road infrastructure management relying entirely on multiray and stereo imagery (and on their derivatives such as fully textured 3d point clouds). The main purpose of the investigations was to identify the typical requirements, the technical and operational challenges in establishing and exploiting such an environment and the limitations of the approach.
First we introduce our proposed solution of an interactive 3d information environment integrating high-resolution ground-based multi-stereo imagery with high-resolution airborne imagery. We then identify the main challenges and requirements to be met by such an integrated solution through all process phases, from the image acquisition and 3d information extraction to the cloud-based exploitation. Next we introduce technologies for the acquisition and processing of the ground-based and airborne imagery and for their subsequent integration. In the following section of the paper the concept and the technologies are applied to and evaluated in a highway mapping project in Switzerland. The paper concludes with first results and a discussion of future work and challenges.

Overview
We propose a seamless image-based 3d visualisation and 3d measuring space integrating very high-resolution airborne imagery and mobile multi-stereo imagery. The solution employs airborne 3d views and terrestrial 3d views and complements these views with dense, fully textured 3d point clouds derived from the airborne and ground-based imagery. This combination of 3d views and 3d point clouds provides a permanently three-dimensional visualisation and measurement space and permits seamless transitions between airborne and ground-based 3d views via freely navigable 3d point clouds.
Thus, the solution offers a number of horizontal and vertical viewing perspectives with accurate 3d measurement capabilities either by means of stereoscopic digitising or by 3d monoplotting. The incorporation of dense 3d point clouds which are derived from the identical imagery and which therefore possess perfectly co-registered RGB or even RGB & NIR texture provides a number of potential benefits over exclusively image-based 3d models. For example, it enables the inclusion of dense road surface models or the direct extraction of road profiles and their inspection within the 3d visualisation environment. It furthermore frees users from the original viewing geometries and adds a greater freedom of navigation within the 3d environment.

Challenges and requirements
In order to obtain such a dense, accurate and interactive image-based 3d visualisation and 3d measurement space the following major challenges and requirements have to be met:
• Acquisition - Provision of a very high-resolution coverage of the road surface itself with a GSD ≤ 1 cm and a high-resolution coverage of the entire road corridor, including 100-200 metres on either side of the road axis, with a GSD of ≤ 5 cm.
• Georeferencing / Co-Registration - Accurate georeferencing and in particular highly accurate co-registration of airborne and terrestrial imagery, leading to accurately co-registered derived 3d data.
• Extraction - Automatic extraction of dense depth information, and subsequently 3d point clouds, from the airborne and terrestrial imagery respectively. The dense depth information is to support accurate 3d monoplotting and object extraction in image space on the one hand and a greater freedom of user navigation within the geospatial 3d environment on the other.
• Integration and Exploitation - All the above mentioned geospatial data, i.e. georeferenced airborne and terrestrial stereo imagery, 3d point clouds, as well as derived products such as orthoimagery, DSM and 3d objects need to be integrated into a suitable 3d software environment permitting the interactive exploitation of the rich 3d road scene.
The following three technologies are subsequently used to demonstrate and validate the proposed integrated airborne and terrestrial image-based 3d road infrastructure management approach:
• ground-based image acquisition and processing: IVGI stereovision mobile mapping system and stereovision processing and exploitation software
• airborne image acquisition and processing: Leica RCD30 and Leica FramePro
• integration and exploitation: OpenWebGlobe 3d virtual globe technology

Stereovision based mobile mapping system
For our research we use the IVGI stereovision mobile mapping system which has been under development since 2009 as part of the SmartMobileMapping research project. The system was originally intended for road sign management and has since been developed into a multi-stereo mobile mapping system for a wide range of applications. The system consists of an Applanix POS LV 210 navigation system which is used to directly georeference the digital industrial cameras. Typically, the system is configured with multiple stereo camera systems with sensors of two (FullHD) and eleven megapixels respectively. All systems use GigE cameras with CCD sensors, global shutters, 12 bit radiometric resolution and a pixel size larger than 7 µm. The cameras are equipped with wide-angle lenses providing a large field of view of around 80 degrees while still preserving a high geometric accuracy. The sensors are mounted on a rigid platform and can be set up in various configurations with stereo bases of up to 1.5 m. Depending on the mapping mission the sensors are operated at 5 to 30 frames per second, leading to dense stereo image sequences and raw imagery in the order of one to several TB per hour.
Figure 2. IVGI stereovision mobile mapping system, configured with a forward and a backward looking stereo system and a downward looking profile scanner
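The raw data volume quoted for the mobile mapping system can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a hypothetical configuration of one 11 MP stereo pair plus one FullHD (about 2 MP) stereo pair, with 12-bit pixels stored tightly packed (1.5 bytes per pixel). The camera counts, the packing and the function names are illustrative assumptions, not system specifications.

```python
def raw_data_rate_tb_per_hour(megapixels, n_cameras, fps, bits_per_pixel=12):
    """Uncompressed raw image volume in terabytes (10**12 bytes) per hour."""
    bytes_per_frame = megapixels * 1e6 * bits_per_pixel / 8
    bytes_per_second = bytes_per_frame * n_cameras * fps
    return bytes_per_second * 3600 / 1e12


def total_rate(fps):
    """Assumed rig: one 11 MP stereo pair plus one ~2 MP (FullHD) stereo pair."""
    return (raw_data_rate_tb_per_hour(11, 2, fps)
            + raw_data_rate_tb_per_hour(2, 2, fps))


for fps in (5, 10, 30):
    print(f"{fps:2d} fps: {total_rate(fps):.2f} TB/h")
```

Under these assumptions the estimate ranges from about 0.7 TB/h at 5 fps to about 4.2 TB/h at 30 fps, which is of the same order as the volume stated in the text.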

Stereovision processing and exploitation software
As part of the SmartMobileMapping project a comprehensive processing and exploitation pipeline (see Figure 3) was developed. The stereovision based mobile mapping system introduced above achieves an absolute 3d point accuracy of 3-4 cm (1 sigma) under average GNSS conditions (Burkhard et al., 2012). Relative measurements within a single stereo frame or between points in neighbouring frames of the image sequence are accurate to better than 1 cm.

Leica RCD30 multispectral camera
For the airborne image acquisition, a Leica RCD30 camera was used. The RCD30 is a four-band (RGB and NIR) medium format camera consisting of a single lens and two frame sensors behind a dichroic beam splitter (Wagner, 2011) (see Figure 4). Its features relevant to corridor surveys include:
• A high frame rate (minimum interval of 1 s between frames), providing high image overlaps at the typically low flying altitudes of corridor surveys.
• A co-registered NIR channel enabling numerous automated classification and object extraction tasks.

Calibration:
The Leica RCD30 data can be calibrated in two different ways. The first way is based on a bundle adjustment calculation with an image block from a dedicated calibration flight whereas the second way is based on a laboratory calibration with specially designed equipment and software. Both ways lead to the same set of calibration parameters. The RCD30 calibration process is described in detail in Tempelmann et al. (2012).

Creation of distortion free multi-band images:
Leica FramePro rectifies the raw images to distortion free multi-band images with a nominal principal distance, using an equidistant grid of distortion corrections. In addition, a principal point offset correction and the mid-exposure FMC-position from the image header are applied. FramePro supports parallelization by means of OpenMP.

Dense stereo matching / 3d point cloud extraction:
In our research a prototype implementation of Leica XPro DSM was used for extracting fully textured dense 3d point clouds of the road corridor. XPro DSM is based on Semi-Global Matching (SGM) which was originally developed by Hirschmüller (2008). SGM was first adapted to the ADS line geometry (Gehrke et al., 2010) and has since been modified for frame sensors. The core algorithm of SGM aggregates the matching costs under consideration of smoothness constraints. The minimum aggregated cost leads to the disparity map for a stereo pair and subsequently to textured 3d point clouds in object space.
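The aggregation step can be illustrated with a minimal sketch of the SGM path recursion from Hirschmüller (2008), restricted to a single left-to-right path along one scanline. This is not the XPro DSM implementation; the function names, the toy cost volume and the penalty values `p1` and `p2` are assumptions chosen for illustration. A full SGM sums the path costs of several directions (typically 8 or 16) before selecting the minimum.

```python
import numpy as np


def aggregate_path_left_to_right(cost, p1=1.0, p2=8.0):
    """Aggregate matching costs along one scanline (single SGM path).

    cost: array of shape (width, n_disparities) with pixel-wise matching costs.
    Returns the path cost L_r with the same shape.
    """
    width, n_disp = cost.shape
    path = np.empty_like(cost, dtype=float)
    path[0] = cost[0]
    for x in range(1, width):
        prev = path[x - 1]
        prev_min = prev.min()
        # Candidates: same disparity, disparity +/- 1 (penalty P1), any jump (P2)
        minus = np.concatenate(([np.inf], prev[:-1])) + p1
        plus = np.concatenate((prev[1:], [np.inf])) + p1
        jump = np.full(n_disp, prev_min + p2)
        best = np.minimum(np.minimum(prev, minus), np.minimum(plus, jump))
        # Subtracting prev_min keeps the values bounded without changing the argmin
        path[x] = cost[x] + best - prev_min
    return path


def winner_takes_all(aggregated):
    """Disparity per pixel = index of the minimum aggregated cost."""
    return aggregated.argmin(axis=1)


# Toy example: true disparity is 2 everywhere, pixel 3 has an ambiguous raw cost
cost = np.full((6, 5), 10.0)
cost[:, 2] = 1.0
cost[3, 2] = 10.0  # matching failure at the true disparity
cost[3, 4] = 9.0   # slightly better-looking wrong candidate
raw = cost.argmin(axis=1)
smoothed = winner_takes_all(aggregate_path_left_to_right(cost))
print(raw)       # raw winner-takes-all picks disparity 4 at pixel 3
print(smoothed)  # aggregation keeps disparity 2 along the whole line
```

The example shows why the smoothness penalties matter: a locally ambiguous match is overruled by the consistent disparities of its neighbours along the path.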

Georeferencing and co-registration of ground-based and airborne imagery
If ground-based and airborne imagery and their derived products such as 3d point clouds are to be exploited within an integrated environment, georeferencing and co-registration of both data sets play an important role. For our first experiments, the following georeferencing strategy was applied:
• INS/GNSS-based direct georeferencing of the ground-based stereo imaging sensors.
• Measurement of 3d control point coordinates (e.g. for road markings) in the ground-based stereovision software environment using the multi-image matching tool (Eugster et al., 2012; Huber, Nebiker, & Eugster, 2011).
• INS/GNSS-based direct georeferencing of airborne camera data using Leica IPAS TC for tightly coupled processing.
• Introduction of the control points into an integrated bundle block based on directly georeferenced orientation data, adjusted using ERDAS ORIMA.
For optimal georeferencing results, ground control coordinates would normally be established based on tachymetric or GNSS survey measurements in the local reference frame. However, these measurements were not yet available for these early investigations.

OpenWebGlobe
The ultimate goal of this project is to incorporate all original and derived data from the airborne and ground-based imaging systems into a single interactive web-based 3d geoinformation environment. Such a software environment would need to provide fully scalable support for: orthoimagery, digital terrain and surface models, dense and fully textured 3d point clouds, 2d and 3d vector data and most of all perspective imagery, both stereoscopic and monoscopic with dense depth data. Nebiker et al. (2010) proposed the use of a virtual globe technology for integrating all these data types and for fully exploiting the potential of such image- and point cloud-based 3d environments.
In our research we use the OpenWebGlobe virtual globe technology (www.openwebglobe.org). The OpenWebGlobe project was initiated by the Institute of Geomatics Engineering of the FHNW University of Applied Sciences and Arts Northwestern Switzerland (IVGI). It started in April 2011 as an Open Source Project following nearly a decade of 3d geobrowser development at the Institute. OpenWebGlobe consists of two main parts. The first is the OpenWebGlobe Viewer SDK, a JavaScript library which allows the integration of OpenWebGlobe into custom web applications. The second is the OpenWebGlobe Processing Tools, a bundle of tools for HPC- and cloud-based bulk data processing, e.g. tiling or resampling of large geospatial data sets. This pre-processing is required by the viewer part to enable fragment-based, streamed download and visualization of data (Christen & Nebiker, 2011; Loesch, Christen, & Nebiker, 2012).
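The tiling step mentioned above can be illustrated with a generic quadtree scheme. The sketch below uses the widely used Web Mercator tile indexing; this is an assumption for illustration and not necessarily the tile layout used by the OpenWebGlobe Processing Tools. The function names and the sample coordinate are hypothetical.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the quadtree tile containing a WGS84 position."""
    n = 2 ** zoom  # number of tiles per axis at this level
    lat = math.radians(lat_deg)
    x = (lon_deg + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n
    # Clamp so that positions on the outer edges stay inside the grid
    return min(n - 1, max(0, int(x))), min(n - 1, max(0, int(y)))


def children(x, y):
    """The four tiles of the next finer level covering tile (x, y)."""
    return [(2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


print(lonlat_to_tile(8.3, 47.45, 12))  # illustrative coordinate in northern Switzerland
print(children(1, 1))
```

Because each tile has exactly four children, a viewer can request only the tiles intersecting the current view at the appropriate level, which is what makes streamed, fragment-based download feasible.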

Overview
The test area consists of a 22 km highway section (A1 between Zurich and Baregg) with 3 to 5 lanes per driving direction. The terrestrial imagery was acquired on the 24th of September 2012 in both driving directions using the IVGI stereovision mobile mapping system of the FHNW Institute of Geomatics Engineering. For this specific mission, the system featured a forward looking stereo configuration with 11 MP sensors and a sideways looking stereo configuration with full HD sensors. It was operated with 5 fps at a driving speed of 80 km/h resulting in stereo frames every 5-6 metres. The data was processed using the stereovision processing pipeline presented in Section 3.2.
The same highway section was mapped on the 28th of September 2012 using a Leica RCD30 camera on a Pilatus Porter PC-6. Due to the winding character of the road, 16 flight lines with a total of 637 images were flown. The imagery was acquired with a NAG-D lens with a focal length of 53 mm at a flying height of 400 m AGL and an image overlap of 80%. The resulting GSD of the RGBN imagery is 5 cm. The airborne data was processed using Leica FramePro and Leica IPAS TC.
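The stated GSD follows from the usual scale relation GSD = pixel size × flying height / focal length. The sketch below assumes a sensor pixel size of 6 µm, a value that is not given in the text and is used here only for illustration; the focal length and the flying height are those reported above.

```python
def ground_sampling_distance(pixel_size_m, flying_height_m, focal_length_m):
    """GSD in metres for a nadir image over flat terrain."""
    return pixel_size_m * flying_height_m / focal_length_m


PIXEL_SIZE_M = 6e-6      # assumed pixel size, not stated in the text
FLYING_HEIGHT_M = 400.0  # above ground level
FOCAL_LENGTH_M = 0.053   # 53 mm lens

gsd = ground_sampling_distance(PIXEL_SIZE_M, FLYING_HEIGHT_M, FOCAL_LENGTH_M)
print(f"GSD = {gsd * 100:.1f} cm")
```

Under this assumption the result is about 4.5 cm, in line with the rounded value of 5 cm.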

Airborne and ground-based imagery
The following figures show the airborne and ground-based imagery of a highway section with a bridge crossing overhead. The figures illustrate the complementary character and perspectives offered by the airborne and ground-based imagery.

3d point measurement accuracy comparison
In order to get a first assessment of the interactive 3d point measurement accuracies within the ground-based and the airborne stereo imagery, 40 well-defined road markings were interactively measured in both data sets. Based on earlier investigations and experience, the a priori point measurement accuracy should be in the range of:
• approx. 3-4 cm in X and Y and 2-3 cm in Z for the ground-based stereovision data (Burkhard et al., 2012) and
• 0.5-1.0 pixels, i.e. 3-5 cm, in X and Y and 0.1-0.2 ‰ of h_G (the flying height above ground) or 1-2 pixels, i.e. 4-10 cm, in Z for the airborne imagery.
The analysis of the coordinate differences for the 40 points yielded standard deviations of the differences of approx. 5 cm in X and Y and better than 10 cm in the vertical direction. Assuming similar planimetric accuracies for both systems this leads to a point coordinate accuracy of 3.5 cm in X and Y (for each system). The standard deviation of the vertical differences is also consistent with the a priori values for the Z component.
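The step from the standard deviation of the differences to the accuracy of a single system is a simple error propagation. Assuming uncorrelated errors with an equal standard deviation σ in both systems (the symbols below are introduced here for illustration), the planimetric coordinate difference Δ between the ground-based and the airborne measurement satisfies:

```latex
\sigma_{\Delta}^{2} = \sigma_{\mathrm{ground}}^{2} + \sigma_{\mathrm{air}}^{2} = 2\,\sigma^{2}
\quad\Longrightarrow\quad
\sigma = \frac{\sigma_{\Delta}}{\sqrt{2}} \approx \frac{5\ \mathrm{cm}}{\sqrt{2}} \approx 3.5\ \mathrm{cm}
```

If the two systems were not equally accurate, only the sum of the two variances would be determined by the comparison, not the individual values.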
These first investigations also revealed some systematic differences between ground-based and airborne coordinate determination in the order of 10 cm in planimetry and 10 cm in height for each driving direction, i.e. for each ground-based trajectory. This is consistent with the expected direct georeferencing accuracy of the ground-based system in the challenging urban environment of the tests. In subsequent experiments, the georeferencing approach described in 5.1 will be modified, and very likely improved, by co-registering the ground-based imagery to the airborne imagery using the integrated georeferencing approach described in Eugster et al. (2012).

Accuracy of extracted 3d point clouds
Dense 3d point clouds were extracted for both the ground-based and airborne imagery using the dense matching algorithms and tools discussed in sections 3.2 and 4.3.2. Figure 7 shows the left part of a raw depth map extracted from the corresponding stereo pair and overlaid with the left stereo partner. A post-processed version of this depth map is also used for 3d monoplotting (see Figure 6). Textured 3d point clouds can easily be derived from these depth maps by projecting the image and depth information into object space.
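The projection of a depth map into object space can be sketched as follows for an idealised pinhole camera. The intrinsics (focal length `f` in pixels, principal point `cx`, `cy`) and the function name are hypothetical, and the points are returned in the camera frame only; the transformation into the mapping frame using the georeferenced exterior orientation is omitted. For a rectified stereo pair, the depth itself would follow from the disparity d and the stereo base B as Z = f·B/d.

```python
import numpy as np


def depth_map_to_points(depth, rgb, f, cx, cy):
    """Back-project a depth map (metres) into coloured 3d points (camera frame).

    depth: (H, W) array, non-positive or non-finite where no depth is available.
    rgb:   (H, W, 3) array co-registered with the depth map.
    Returns (N, 3) coordinates and (N, 3) colours for the valid pixels.
    """
    rows, cols = np.indices(depth.shape)
    # Treat NaN and infinite depths as missing
    valid = np.nan_to_num(depth, nan=0.0, posinf=0.0, neginf=0.0) > 0
    z = depth[valid]
    x = (cols[valid] - cx) * z / f
    y = (rows[valid] - cy) * z / f
    return np.column_stack((x, y, z)), rgb[valid]


# Toy 2 x 2 depth map with one missing value
depth = np.array([[10.0, 10.0], [np.nan, 20.0]])
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
points, colours = depth_map_to_points(depth, rgb, f=1000.0, cx=0.5, cy=0.5)
print(points)
```

Since every point inherits the colour of the pixel it was derived from, the resulting point cloud is textured without any additional registration step.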

CONCLUSIONS AND OUTLOOK
The combination of high-resolution airborne and ground-based imagery and their integration into predominantly image-based 3d modelling and 3d geoinformation services provides a powerful solution for future road infrastructure management.
We showed that the automatic extraction of depth maps and subsequent dense 3d point clouds from both types of images enables simple and efficient 3d measurement tasks through 3d monoplotting. Beyond this, the combination of highly detailed 3d imagery and fully textured 3d point clouds will enable highly interactive and rich 3d geoinformation environments.
Figure 8 shows a web-based OpenWebGlobe scene with a first integration of perspective 3d imagery with airborne and ground-based 3d point clouds. Ongoing and future work includes the further improvement of the co-registration of ground-based and airborne data by means of integrated georeferencing. Among the ongoing investigations and developments in the OpenWebGlobe project is the incorporation of full-resolution airborne and ground-based 3d imagery with 3d measuring capabilities (priority 1: 3d monoplotting; priority 2: stereoscopic integration). There are also plans to incorporate oblique airborne imagery with the special challenge of ensuring 3d views and accurate 3d measurement capabilities for most imagery.

Figure 1. Vision-based road infrastructure management: typical focal points and objects of interest.

Figure 3. Stereovision processing pipeline and workflow for the mobile ground-based multi-stereo imagery

Figure 4. Leica RCD30 with OC52 Operator Control and CC32 Camera Controller with GNSS/IMU

The following features make the RCD30 particularly interesting for this type of road corridor survey:
• A 60 MP single camera head delivering high-resolution co-registered, multispectral RGBN imagery.
• A mechanical Forward Motion Compensation (FMC) along two axes, allowing proper operation also for large drift angles.

Figure 5. Extract from RCD30 RGB imagery showing the same location as Figure 6 (mapping vehicle was driving in the middle lane with manholes visible on either side).

Figure 6. Mono client with ground-based imagery and overlaid 3d vector data (left), map interface (bottom right) and feature editor (top right)

Figure 7. Left stereo normal image with overlaid dense depth map (shown in left half of the image)

An initial accuracy evaluation of the extracted 3d point clouds was performed by using four planar patches on the road surface as reference surfaces. These test areas were extracted from the following 3d point clouds:
• 3d point cloud derived by projecting the depth map of a single stereo frame into object space (ground-based raw)
• 3d point cloud derived by fusing the depth maps of multiple stereo frames and by projecting the interpolated depth map into object space (ground-based interpolated)
• 3d point cloud derived from an airborne stereo image pair (airborne)

Figure 8. First integration of perspective ground-based imagery, ground-based 3d point cloud (coloured) and airborne 3d point cloud (shown in white and grey) in OpenWebGlobe.

Table 1 shows the typical point densities of 3d point clouds extracted from the ground-based and the airborne imagery. The table also shows the respective standard deviations and max. differences from a plane fitted through the point clouds covering the four test patches with an area of ~22 m² per patch. The preliminary results of the ground-based and airborne 3d point cloud extractions yield good SDs in the order of 1 pixel or less, i.e. < 1 cm in the ground-based and < 5 cm in the airborne case.
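A flatness check of this kind can be reproduced in a few lines. The sketch below fits a plane through a point cloud patch by total least squares (SVD of the centred coordinates) and reports the standard deviation and the maximum absolute distance from it. The fitting method, the function names and the synthetic test data are assumptions for illustration; the text does not state which plane fitting procedure was used.

```python
import numpy as np


def plane_fit_residuals(points):
    """Fit a plane to (N, 3) points; return signed orthogonal distances."""
    centroid = points.mean(axis=0)
    # The right singular vector of the smallest singular value is the plane normal
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return (points - centroid) @ normal


def patch_statistics(points):
    """Standard deviation and maximum absolute distance from the fitted plane."""
    residuals = plane_fit_residuals(points)
    # Three degrees of freedom are used up by the plane parameters
    return residuals.std(ddof=3), np.abs(residuals).max()


# Synthetic patch: tilted plane with 1 cm Gaussian noise (illustrative data only)
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 4.7, size=(5000, 2))  # roughly 22 m^2
z = 0.02 * xy[:, 0] - 0.01 * xy[:, 1] + rng.normal(0.0, 0.01, 5000)
sd, max_diff = patch_statistics(np.column_stack((xy, z)))
print(f"SD = {sd * 100:.2f} cm, max difference = {max_diff * 100:.2f} cm")
```

Orthogonal distances are used so that the result does not depend on the slope of the road surface within the patch.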

Table 1. Typical point densities of different point cloud data sets together with their accuracies (standard deviations and max. difference from a plane fitted into the respective point cloud)

Table 2 illustrates the complementary character of ground-based and airborne stereo and multiray imagery for different road infrastructure management tasks.

Table 2. Suitability of ground-based and airborne stereo imagery for road infrastructure management