3 D-INFORMATION FUSION FROM VERY HIGH RESOLUTION SATELLITE SENSORS

In this paper we show the pre-processing and potential for environmental applications of very high resolution (VHR) satellite stereo imagery like these from WorldView-2 or Pléiades with ground sampling distances (GSD) of half a metre to a metre. To process such data first a dense digital surface model (DSM) has to be generated. Afterwards from this a digital terrain model (DTM) representing the ground and a so called normalized digital elevation model (nDEM) representing off-ground objects are derived. Combining these elevation based data with a spectral classification allows detection and extraction of objects from the satellite scenes. Beside the object extraction also the DSM and DTM can directly be used for simulation and monitoring of environmental issues. Examples are the simulation of floodings, building-volume and people estimation, simulation of noise from roads, wave-propagation for cellphones, wind and light for estimating renewable energy sources, 3D change detection, earthquake preparedness and crisis relief, urban development and sprawl of informal settlements and much more. Also outside of urban areas volume information brings literally a new dimension to earth oberservation tasks like the volume estimations of forests and illegal logging, volume of (illegal) open pit mining activities, estimation of flooding or tsunami risks, dike planning, etc. In this paper we present the preprocessing from the original level-1 satellite data to digital surface models (DSMs), corresponding VHR ortho images and derived digital terrain models (DTMs). From these components we present how a monitoring and decision fusion based 3D change detection can be realized by using different acquisitions. The results are analyzed and assessed to derive quality parameters for the presented method. Finally the usability of 3D information fusion from VHR satellite imagery is discussed and evaluated.


INTRODUCTION
Very high resolution (VHR) satellite missions like WorldView-2 or Pléiades allow the acqusition of (multi-)stereo imagery with ground sampling distances (GSD) of about half a metre.Sophisticated algorithms for dense image matching from such imagery allow the reconstruction of a detailled 3D point cloud from the acquired images with a vertical accuracy also in the range of one GSD.
Using the height-information together with the multi-spectral image data allows the classification, object-detection and extraction on a much higher level than from multispectral imagery alone.Since 3D data from optical imagery are always digital surface models (DSMs) this information is mostly useful especially in urban areas.In such areas the fusion of height-and spectral classification allows the potential detection of single objects like buildings, trees, roads and bridges and subsequently a more robust object based change detection in contrast to mostly used simple pixel based change detection algorithms.But also outside of cities the third dimension increases the worth of remote sensing data.Particularly height sensitive areas like mining or forest areas can be monitored more detailled.So the output of a mine can easily be calculated by measuring the volume differences of the mining pit and the waste dump as we show later.For deep mining or other underearth activity also the volume change of detected waste dumps can be very valuable.
In this paper we describe the preprocessing steps necessary for generating elevation and classification data which can be used in turn for object detection, change detection and monitoring.For this first a DSM has to be derived from the (multi-)stero data.The resulting 3D data have to be fused with the original imagery to reduce errors and outliers.Afterwards the digital terrain model (DTM) representing the ground level and the normalized digital elevation model (nDEM) containing the height of all elevated objects above the DTM are derived.
In parallel some multi-spectral classifications are executed, espacially the detection of vegetation, water and shadows.Afterwards all available data are fused for different purposes.If more scenes are available also all height-and spectral classified data can be fused for detailled and stable change detection and monitoring applications.
After an overview of the methods used we show in a few examples the monitoring from GeoEye and WorldView-2 data and the application of such data for change detection in an open pit mining area.In a second experiment we show 3D building detection in Ikonos summer and winter scenes and in a third experiment we use panchromatic Cartosat-1 stereo data showing a change detection application in forest areas.Finally a short conclusion on using fused multi-spectral VHR stereo data on monitoring applications from space closes the paper.

Preliminary work
This work is based on many algorithms and methods already published in other places.So the method used for dense stereo matching called Semi global matching was already described in principle in Hirschmüller (2005) andrefined in dAngelo et al. (2008) to generates DSMs from space-borne data.The extraction of DTMs from such DSMs uses approaches as described in Arefi et al. (2009) or Krauß et al. (2011).The fuzzy-classification based on derived top-of-atmopsphere (TOA) ortho images is described in Krauß et al. (2012).An additional method used here is the shadow detection shown by Makarau et al. (2011).The basic approach for change detection used in this work is the robust change detection method described in Tian and Reinartz (2011) and Tian et al. (2014).

DSM generation
The VHR sensors used here have already a very good absolute image positioning of e.g. about 4 m for WorldView-2 only requiring a slight shift for absolute geocorrection.But dense stereo matching requires a relative correlation between the stereo images of less than a pixel.To achieve this first good correlation points are derived automatically from the two stereo images and the shifts are applied to the sensor model of the images.
For reconstructing the 3D scene, we use dense stereo matching to make use of all the image information and compute a depth for each single image pixel.As for each image pixel multiple depth hypotheses need to be tested, a computational efficient cost function is needed for the image matching, which additionally has to be robust to potentially strong radiometric and perspective changes between the images.As the pure data term of the cost function is still prone to noise and performs poorly over large disparity ranges and homogeneously textured regions, a simple pixel-wise Winner-takes-all strategy solely based on the image matching function does not yield acceptable results, which is why we add additional smoothness constraints and solve the resulting optimization problem.
The Census transform CT (Zabih and Woodfill (1994)) encodes the local image structure within a small patch around a given pixel.It is defined as an ordered set of comparisons of intensity differences and therefore invariant to monotonic transformations which preserve the local pixel intensity order: with the concatenation operator and an ordered set of displacements D ⊂ R 2 , which is chosen to be a 7×9 window due to its simple fitting into 64 Bit variables.The matching cost of different Census vectors s1, s2 is then computed as their Hamming distance dH (s1, s2) -number of differing bits -where highest matching quality is achieved for minimal Hamming distance, and the costs are normed to the real-valued interval [0, 1] Using such a 7 × 9 support window for the Census transform increases the robustness of the matching function against mismatches, especially when searching through a large disparity range as is very common in remote sensing data.
In the resulting disparity space image (DSI) we now search for a functional u(x) (the disparity-or height-map), which minimizes the energy function arising from the matching costs (called data term E data ), plus additional regularization terms.As the data term is prone to errors due to some incorrect and noisy measurements, one often needs some smoothness constraints E smooth , forcing the surface of the disparity map to be locally smooth.

u(x) = argmin
u Ω This energy is non-trivial to solve, since the smoothness constraints are based on gradients of the disparity map and therefore cannot be optimized pixelwise anymore.Because of its simple implementation, low computational complexity and good regularization quality, we use semi-global matching (SGM) for optimization (Hirschmüller, 2005), minimizing the following discrete energy term The first term is the matching cost, the second and third term penalties for small and large disparity discontinuities between neighboring pixels p ∈ Nx.The key idea is now not to solve the intractable global 2D problem, but to approximate it by combining 1D solutions from different directions r, solved via dynamic programming The aggregated cost Lr(x, u(x)) in Equation ( 6) represents the cost of pixel x with disparity d = u(x) along the direction r and is being computed as The penalties are set to default values P1 = 0.4 and P2 = 0.8 (with the raw costs normed to [0, 1]).For deeper insights the reader is referred to the paper of Hirschmüller (2005).

DSM correction and Ortho projection
The such derived disparity map d now fits exactly on the first image I. Missing values due to occlusion or mismatches can be filled using the spectral similarity of neighbouring pixels in the pan-sharpened first image Ips with valid disparities as shown in fig. 1. Afterwards using the sensor-model -in the cases of VHR satellites mostly rational polynomial coefficients (RPCs)a height map h can be calculated by converting the disparities d of each pixel to true ellipsoidal heights.
Applying the filled height map h to the original imagery Ips using the sensor model delivers the ortho images Io and applying h to itself the correctly geocoded DSM D as shown in figs. 2 and 3).

Change detection
Knowing all heights and class maps for scenes of two or more epochs allow the application of the robust difference method as described in Tian et al. (2014) for detecting height changes.The robust difference between the initial DSM D1 and the second DSM D2 for the pixel (x, y) can be defined as the minimum of differences computed between the pixel D2(x, y) in the second DSM and a certain neighborhood (with window size 2w + 1) of the pixel D1(x, y) in the first DSM D1.The robust positive and negative differences X P dif (x, y) and X N dif (x, y) relative to the pixel (x, y) are defined in equations 8 and 9, respectively:  Eqs. 8 and 9 say that only the minimum value (greater than zero) in case of a positive change, or the maximum value in case of a negative change is taken, all within the defined window size.
Typically window sizes used are 3 × 3 pixels up to 7 × 7 pixels depending on the DSM qualities and the difference in resolution between the two available DSMs.
To eliminate noise these robust height changes are filtered afterwards using morphological closing and opening operations.Also for a more accurate change detection of buildings the changes are masked using the shadow mask and other land cover masks as described above.Combining the classification results (vegetation /non-vegetation and high/low masks) with the detected changes allow for a more detailled classification of the changes.

WorldView-2 -GeoEye
In this experiment we use two VHR stereo scenes acquired on 2011-03-24, 10:58 (WorldView-2) and 2012-08-12, 10:39 (Geo-Eye) over Jülich.This area contains the city of Jülich, Germany and also the largest open pit mining area for lignite in Germany.Based on these stereo images first the DSMs D1 and D2 were derived as described in 2.1.Afterwards the DTMs T1 and T2 together with the nDEMs N1 and N2 and the pansharpened TOA ortho images Itoa,j and all masks mi,j (i ∈ {h, s, v, b, w}, j ∈ {1, 2}) as described in 2.3 are calculated.Fig. 2 showed already the ortho image and fig. 3 the DSM of the GeoEye scene of this area.
Calculating the robust difference and the appropriate morphological filtering gives the changes as shown in fig. 5.The two scenes were acquired with a distances of 506 days.So in tab. 2 first the reference amounts per year have to be scaled to this time distance.Afterwards the amount is converted to volumes and compared to the measured volumes from the 3D change detection between the two scenes.Finally we see that the measured volume changes from the described automatic change detection method comply very well with the amounts Wikipedia claims without any reference for this mine.Applying a mean-shift over-segmentation to the ortho imagery and a region merging on the combination results in a land cover segmentation using only one pan-chromatic band (Tian et al., For the change detection between the two Cartosat images the overall accuracy is 0.9945 and the kappa index 0.5879.

RESULTS AND DISCUSSION
Fusing height information with multispectral classification allows more detailled monitoring and better change detection.The results from the above experiments show very promising results of the proposed processing chain.The mining volumes are derived with an accuracy of about 4.5 % for the waste and 36 % for the digged coal.Also the accuracies calculated against manually derived change masks of the multi-season Ikonos building changes and the forest changes are very promising with accuracy values larger than 99 %.
The main problems found can be traced back to mismatches in the DSM generation process and problems in the spectral classification -especially in water and shadow areas.The outliers in the unfilled DSM propagate in the DSM filling process often into larger areas destroying the height mask for buildings.More sophisticated DSM correction methods have to be investigated to create more stable and reliable DSMs.Also better DSMs can be generated using three or more stereo images.First experiments with multi-stereo WorldView-2 and Pléiades imagery show that a third image improves the quality of the DSM drastically while a fourth or fifth image lead only to minor further improvements.Also scenes acquired in the same season lead to a more consistent classification for the change detection.Using the method allows a potentially fully automatic change detection which is much more sophisticated than only image based methods as can be seen in the Ikonos summer/winter example.Also the monitoring of mining activities, illegal timber logging or monitoring the evolution of large cities are a good application area for the presented method.
But nevertheless the main disadvantage of the method is the need of pre-and post-elevation information.In most cases like disasters no pre-disaster stereo imagery or DSMs of the required resolution and region exists.
Unfortunately the Sentinel missions do not support stereo acqusitions.So we propose an additional sentinel mission covering the whole earth permanently like the indian Cartosat-1 (IRS-P5) mission with a two-or even better a three-line-stereo-scanner (like ALOS-Prism) at a ground sampling distance of the stereo scanners of at least 2.5 m and also a nadir looking multispectral VNIR scanner with also about 2.5 m GSD or better.

Figure 5 :
Figure 5: Volume changes between the WorldView-2 and GeoEye scene, 16 km × 9 km, red: negative changes, green: positive changes (0-60 m)For a more detailled view fig.6shows a small (1100 m × 450 m) section with a newly erected building in the left top edge (in green) and a cut down forest in the right bottom (in red).The

Figure 6 :
Figure 6: Volume changes between the WorldView-2 and GeoEye scene, small section 1100 m × 450 m, top left: newly erected building, bottom right: cut down forest In the left bottom edge of fig. 5 the open pit mining area "Braunkohletagebau Inden" is visible.A positive or negative height change from any kind of vegetation to non-vegetation allows us to calculate the mined volumes.For this the positive (green) and negative (red) volumes of fig. 5 are calculated as 79.87 and 122.76 millions of cubic metres.The negative changes are the whole digged out volume containing the lignite and also the mine wastes.The positive changes are however only the dumped mine wastes.The mine wastes consist only of earth and stone with a density of ρ soil = 1500 kg/m 3 while the lignite has a density of ρ lignite = 1000 kg/m 3 .This allows us to convert the volume changes to mass changes as shown in tab. 2. Table 2: Validation of 3D change detection results for the opencast mining Inden (left bottom mining area); reference data from http://de.wikipedia.org/wiki/TagebauInden Value Mine wastes Digged lignite Reference amount per year 80-85 Mio t/a 20-25 Mio t/a Amount between images (506 days) 111-118 Mio t 28-35 Mio t Volumes (m/ρ) 74-79 Mio m 3 28-35 Mio m 3 Measured volume changes 80 Mio m 3 43 Mio m 3

3. 2
Ikonos building detection in summer and winter scenes Our second example shows the building change-detection in different seasons of two Ikonos scenes from Dong-An (North-Korea) acquired 2006 and 2010 (snow covered) as shown in fig.7 together with the derived DSMs in fig.8.By applying the change detection method described in 2.4 masking out only buildings following 2.3 a change probility map shown in fig.10, left can be derived.Cross checking the results with a manually created reference change mask and thresholding the changes allows for a evaluation of the results (see fig.9for explanation of the colors).The overall accuracy for this example is 0.9920 while the kappa index of agreement is 0.7502.All qualitiy parameters of this evaluation are calculated pixel wise: true positives (TP): green, pixels found by algorithm and also in reference

Figure 10 :
Figure 10: Detected changes in the Ikonos stereo DEMs of Dong-An, left: change probability (in %), right: change mask (green: true detected changes, blue: missed changes, red: erroneously detected changes)

Figure 12 :
Figure 12: DSMs created from the Cartosat stereo images of a forest area near Oberammergau 2008 (left) and 2009 (right), heights in [m] The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W3, 2015 36th International Symposium on Remote Sensing of Environment, 11-15 May 2015, Berlin, Germany

Table 1 :
Possible changes depending on classified areas, empty fields mean "no change", v and v denote vegetation or nonvegetation, bld.stands for building low → low low → high high → low high → high v → v trees growed trees logged v → v harvested bld.erected trees logged trees → bld.v → v planted trees growed bld.demolished bld.→ trees v → v bld.erected bld.demolished window around (x, y).