MERGING DIGITAL SURFACE MODELS IMPLEMENTING BAYESIAN APPROACHES

In this research different DSMs from different sources have been merged. The merging is based on a probabilistic model using a Bayesian Approach. The implemented data have been sourced from very high resolution satellite imagery sensors (e.g. WorldView-1 and Pleiades). It is deemed preferable to use a Bayesian Approach when the data obtained from the sensors are limited and it is difficult to obtain many measurements or it would be very costly, thus the problem of the lack of data can be solved by introducing a priori estimations of data. To infer the prior data, it is assumed that the roofs of the buildings are specified as smooth, and for that purpose local entropy has been implemented. In addition to the a priori estimations, GNSS RTK measurements have been collected in the field which are used as check points to assess the quality of the DSMs and to validate the merging result. The model has been applied in the West-End of Glasgow containing different kinds of buildings, such as flat roofed and hipped roofed buildings. Both quantitative and qualitative methods have been employed to validate the merged DSM. The validation results have shown that the model was successfully able to improve the quality of the DSMs and improving some characteristics such as the roof surfaces, which consequently led to better representations. In addition to that, the developed model has been compared with the well established Maximum Likelihood model and showed similar quantitative statistical results and better qualitative results. Although the proposed model has been applied on DSMs that were derived from satellite imagery, it can be applied to any other sourced DSMs.


INTRODUCTION
DSMs performs an important role in various applications such as: planning; 3D urban city maps; natural disaster management (e.g.flooding, earthquake, landslide); civilian emergencies; military activities; airport management; and, geographical analysis, such as crime and hazards (Saeedi and Zwick 2008).Due to an increase in the sources of DSMs, their generation has led to a focus on merging datasets to produces a single one incorporating the data from multiple sources.Different terms have been used to indicate the merging such as fusion, combination, integration and synergy.These all can be considered to be a synonym for merging (Papasaika-Hanusch, 2012).
Data merging has been studied by different researchers.For instance, it can be used, potentially, to identify the highest quality data for an area, as well as to address problems of data volume.Data merging is also important as it fills the gaps and voids produced while constructing the original DEMs.The obvious solution for merging DEMs is to average tiles or strips from DEMs of the same area, and to produce a seamless DSM (Reuter, Strobl and Mehl, 2011), but this will not reflect the original data quality, because it gives the same weight to all data.
Data merging has been shown to be important for increasing data quality Podobnikar (2007), Lee et al. (2005), Fuss (2013), Papasaika et al. (2009), andDowman (2004).Wegmüller et al. (2010), focussed on the value of DSM merging to fill gaps.Hosford et al. (2003) showed an approach for enhancing DSMs through a merging operation based on a geostatistical approach (i.e. one capable of estimating a σ value).Papasaika et al. (2009) used merging to solve the problem of poor image matching technique by incorporating extra data sources.Schultz et al. (1999) used merging for image mosaicking using different data sources (SRTM SAR-X, ERS SAR tandem data).
The purpose of this research was to increase DSM accuracy, based on an area's morphological characteristics.Two pairs of satellite images have been used in this research, first the DSMs have been produced, and later they have been merged using both Maximum Likelihood and Bayesian approaches.Later the results have been evaluated by using check points measured in the field.
The approach that has been suggested in this research is based on Bayes rule which requires incorporation of a priori data.The result has been compared with those achieved using the Maximum Likelihood approach.This account is structured to discuss: Bayesian inference in section 2; the test site in section 3; ; the DSM formation model is in section 4; methodology in section 5; results and analyses in section 6; and, finally, the conclusion is given in section 06.

BAYESIAN INFERENCE
Bayesian statistics are based on Bayes Rule, which is, in turn, based on determining the a posteriori probability (|), utilizing the data that is obtained from experiment, represented in the form of a probability distribution (|).This probability distribution, (|), is the so-called likelihood and represents the measured data  =  1 ,  2 , … .,   , represented by random variable, given the vector of required parameters , and values based on  expressed through the a priori probability and the normalising constant (sometimes called marginal likelihood or evidence, Gelman et al., 2004).Bayesian inference is one of the applications of Bayesian statistics.It can be used to infer information of interest from observations, providing data concerning undetected quantities.In the Bayesian approach, usually, the probability statement is used to estimate parameters for the unobserved data through a process called Bayesian inference.The values used in Bayesian approaches are random variables and provide uncertainties as output, represented as probability expressions.Bayesian inference encompass statistical inference methods where observations are utilized to determine the probability that assumptions are likely to be true, or to revise an already determined probability.In the applied field, Bayesian inference uses a priori probability calculated from the likelihood of certain assumptions regarding the observations imported into a computation or process.
Bayesian inference can be summarized as using a model which is constructed based on Bayes' Rule, and the obtained data represented by a probability form which is called the a posteriori probability distribution.The model used in Bayesian inference should effectively reflect the condition from which information is to be inferred.Bayesian inference is represented by constructing a model that adequately represents the situation from which information is to be inferred.The constructed model is based on Bayes' Rule in which the results are represented by probability and called the a posteriori probability distribution.The elements used in Bayesian inference are based on the likelihood and a priori probability.The likelihood is arises by fitting a probability to the data in the experiment, while the a priori probability refers to subjective belief about the situation before that data gathering (Gelman et al., 2004).The major features of Bayesian inference that have led it to becoming a focus for researchers can be specified as: using the probability of the parameters to measure uncertainty instantaneously; handling any number of parameters; and, the applicability of joint probability density functions (Gelman et al., 2004).
According to the literature, Bayesian approaches have been applied in remote sensing, successfully, to fuse and segment images (Mascarenhas, Banon and Candeias, 1992;Zaniboni and Mascarenhas, 1998;Punska, 1999;Mohammad-Djafari, 2003;Shi and Manduchi, 2003;Feron and Mohammad-Djafari, 2004;Ge, Wang and Zhang, 2007;Gheta, Heizmann and Beyerer, 2008;Zhang, 2010) but no study has been found relating to their use in merging or fusing digital surface models.The common approach to fusing DSMs is based on using a weighted average, after assigning weights to the DSMs' points based on checking their fidelity or performing a "DSM accuracy assessment" by calculating some statistical measurements (i.e.RMSE).

TEST SITE
For the test, two sources of satellite images have been acquired with the specifications as shown in the Table 1 (the first source was from the Worldview-1 (WV-1) sensor from the Digital Globe organization).As shown in Table 1 the time gap between the satellite imagery is calculated to be around 14 months.However it has been assumed, in this research, that no changes have arisen within the time period, therefore the merging in this research does not consider any multitemporal effect.This assumption has been made in order to focus on the effect of the merging using a Bayesian approach.The data have been processed and rectified for DSM production using Socet GXP 4.1 software, with the aid of GCPs.Using SOCET-GXP, the resolution of the DSMs was 1m.It is recommend by the SOCET GXP provider that the orthoimagery resolution be higher than the GSD of the DSM (Zhang and Smith, 2010).

DSM FORMATION AND MERGING MODEL
It is essential to construct a model that links the available DSM to the true underlying DSM; this process is important and must be executed earlier than the merging operation using a Maximum Likelihood Estimate and a maximum a posteriori probability.Due to the errors in the DSMs produced, such as blunders, systematic and random errors arising from the image matching and other techniques the DSM does not typify the surface perfectly.
The blunders and systematic errors can be handled, but for the random errors this is not possible.Therefore the underlying DSM contained errors embedded in it.Equation 1, which can also be referred to as the forward model, is used to relate the true DSM, referred to as DSM ̅̅̅̅̅̅̅ , to that obtained from satellite imagery technique (DSM): Where: Z is the generated height of the DSM at location x,y, Z̅ is the elevation of true underlying DSM ̅̅̅̅̅ at location x,y, and, ε is the height error in each DSM at location x,y.
Merging DSMs can be considered as an ill-posed problem.
According to Hadamard (1902), cited via Beyerer et al.(2011) a problem is considered to be ill-posed in the following cases: if a solution is not unique; the problem does not have solution; or, the result can become significantly different following a small change in the input data.
Due to the intrinsic random error in the DSM, the result obtained from the merging is not unique and therefore the merging problem is considered an ill-posed problem.: : In order to obtain the required DSM the above model is used to form the inverse model.
The measured DSMs are referred to as DSM1 to DSMk, while the underlying real or latent DSM values are represented by DSM ̅̅̅̅̅ and the errors at each location in the DSM are represented by (ε 1 ) to (ε k ).

METHODODLOGY
The DSM obtained from the merging is considered to be more informative and more accurate than the original DSMs used in the merging.Two approaches for the merging which are Maximum Likelihood (the weighted average approaches) and Bayesian approaches are discussed in this section.For both approaches, first quality assessment has been performed to find the accuracy of each DSM.Then later the Maximum Likelihood merging was executed.Regarding the Bayesian merging in addition to the quality assessment the a priori estimation is determined in the probability form.

Digital Surface Model (DSM) Quality Assessment
To start the merging process it is necessary to find out the quality of each digital surface model.DSM quality is an intensely researched topic commencing about 40 years ago, in 1972, led by Makarovic (Li, 1990), in The Netherlands.The quality of the DSM is based on measuring the error in DSM heights.
There are many factors affecting the accuracy of a DSM, according to Chen and Yue (2010), Li (1992) and Papasaika-Hanusch (2012), such as:  distortion inherent in the sensor;  the source data's attributes such as density and spread;  surface or terrain features such as relief, land-cover, and texture;  the mathematical approach that has been implemented to produce the DSM from the data source or the interpolation methods used; and,  techniques used in map-digitization or field surveying.
The quality index that has been used foremost in this study to represent the quality of DSMs is a single RMSE value per terrain model, equation 3, based on using the measured ground control points.In addition to assuming the errors are distributed uniformly, it has been assumed the images are fully registered; this assumption arises from the situation where both DSMs are produced with the same grid spacing using the same software, same technique, and similar resolution satellite images.The only differences are sensor source and acquisition angle.These two factors (source and acquisition angle) cause the created DSMs to be different due to the image matching technique, thus identical features appear different in their respective DSMs, where: ∆ℎ  is the difference between the checkpoint measured by GNSS and that obtained elevation from the DSM -i.e. the 'error' (or discrepancy); and, n is number of measurements or checkpoints.
Regarding the errors, they are assumed to be random variables that are normally distributed.This assumption has been checked by testing the histogram normality using the q-q plot as shown in Figure 2. WorldView-1 vs. CP q-q plot discrepancy(m); Pleiades vs. CP q-q plot discrepancy(m);WV-1vs.CP Figure 2. Original DSMs (Pleiades and WV-1) error distribution histograms and q-q plot against RTK check points. (3)

Merging Using Maximum Likelihood Method
A Maximum Likelihood method is considered the traditional method for estimating the results using noisy input data, is also called the weighted average, and is based on the assumption that noise is normally distributed within the data.The Maximum Likelihood method is based on maximizing the probability associated with the estimated value of a pixel in the merged DSM; that is the error between the elevations in the input pixel (DSM ̅̅̅̅̅̅ (x,y) ) and the corresponding estimated pixel (DSM k(x,y) ) is required to be minimized.
The estimated value of z is obtained by maximizing the likelihood function equation with respect to z (Mittelhammer, 2013), which leads to: where: e is: is the exponential function σ is: is the standard deviation which is representing the Digital Surface Model quality, and, ̅  : is the value of the merged elevation using Maximum Likelihood.
Equation 5 yields Equation 6, after it is simplified and maximized with respect to z, in order to get the Maximum Likelihood estimate for z.This method requires the variances to be known for each measurement, which in this case is represented by the square value of the quality (using RMSE in this situation) of each digital surface model.the detailed derivation of this equation is summarized in Sadeq, 2015.
In other words, when the likelihood function and the observations z (i.e.z1, z2) are given, the estimated value is referred to as ̅  .The model can be extended to cover more than two sensors, which is the case if there is more than one set of measurements:

Merging Using Bayesian Approach
The Maximum Likelihood approach can be considered to deal with each pixel individually and does not take into consideration spatial correlation in the fused images' pixels.For that reason the DSM resulting from the fusion does not consider natural characteristics, such as smoothness, or other representations of the natural ground (Kotwal and Chaudhuri, 2013).So, in order to overcome the spatial correlation problem, it was necessary to introduce an a priori value which satisfactorily transformed the Maximum Likelihood value into a maximum a posteriori value, and transformed the problem from an ill-posed one into a wellposed one, by introducing an a priori value into the solution of equation 2, which leads to the merged digital surface model (Beyerer et al., 2011).
In this section the Bayesian approach is used to invert the forward model equation 1; this model is used to express the digital surface model formation, blended with uncertainty, while incorporating a priori knowledge about the digital surface model (i.e. its morphological properties).The most important benefit in a Bayesian approach and which does not exist in the other methods, is that the effect of noise is reduced by using a suitable a priori value.As already mentioned, it is assumed the errors are random and they can be represented by a Gaussian distribution with mean equal to elevation of the used DSM and variance numerically linked to the uncertainty in the digital surface model, measured by evaluating the quality, using checkpoints, and then blunder is detected and eliminated.
According to the Bayesian equation, the merging is based on multiplying the likelihood by the a priori elevation; the likelihood is based on maximizing the probability.Recalling that the Bayesian rule, used to get the a posteriori probability, is represented by: then, recalling equation 2, and assuming that the sensor error is normally distributed with mean z and variance σ 2 , (, σ 2 ) (which represents the quality of the digital surface model) if there are two sensors: The elevations predicted through maximizing entropy have associated probabilities.Suppose that the elevation error is normally distributed, that the mean error is equal to mean elevation, which is so in this case is ( 1 ,   2) , then the estimated a priori probability of the elevation, from (DSM k(x,y) ), can be based on a smoothed surface.This smoothing can be achieved by maximizing local entropy; three smoothed data sets are produced, based on a 3x3, a 5x5 and a 9x9 window respectively.For the two models being considered here, i.e. (DSM k(x,y) ) where k = 1 or k = 2.The Bayesian approach can be simplified to include only the likelihood and a priori data, and that (), the normalization factor, can be eliminated since the situation is not dealing with finding the absolute value of the probability.Assuming there are only two sensors, this results in: Simplifying the equation and minimizing the result by taking the log of equation 11, the result of the Maximum A Posteriori (probability) (or MAP) which represents the merged digital surface models can be obtained from the following expression: In the case of merging more than two sensors, use the result of merging the two data sets as obtained from equation12, to give a priori probability of the elevation, and then use the third data set (z3) in its original form, with the merged data set (ZMAP_old) in a weighted average operation, as shown in equation 13: It is important to determine the quality of the a priori elevation, by using the below equation as shown by (Weng, 2002) : The above mentioned steps can be summarized through the flow chart blow that shows the steps of merging DSMs based on Bayesian approaches Figure 3. Flow chart for DSM merging process using Bayesian Approaches.

Estimating the a priori using Maximum Entropy
The assumption that is made when constructing the a priori value is to build on the assumption that constructed building surfaces are smooth and have low roughness.This assumption can be quantified by using maximum entropy.
It is clear that entropy is representing the randomness in elevation variations which, consequently, can be employed to characterize the digital surface model.These characteristics can be used to manifest the representation of surface roughness, by assuming that the surface is smooth when the entropy is at the maximum and it is rough (or not smooth) when the entropy is lowest.Based on this assumption one can try to maximize the entropy by changing the value of the middle of the window.
Maximizing the entropy by assigning different values will give an elevation that can be considered as the value of the a priori elevation.
According to the earlier discussion about entropy and how entropy can be used to represent roughness, and as illustrated in Figure 5, maximum entropy is obtained when the probabilities are similar, and the uncertainty is high between the probabilities, i.e. the elevations are close to each other and the surface is smooth.The group of probabilities pij from p11 to pMN in a specific window size MN, are represented by the value H f .The equation 15 represents a one dimension signal, but it can be transformed to apply to a window: where: The values of the heights have been transformed into probabilities within the specific window, MN.At each element of the window, the height probability has been evaluated by dividing each pixel value, f, by the total values of elevation within the specific window in order to find pij.
In order to determine the a priori elevation value, the window is generated each time the maximum entropy is determined for a specific elevation.The model that is used for determining the entropy is illustrated in Equation 17.
̂ in equation represents the maximum entropy.

RESULTS, ANALYSES AND DISCUSSION
This section includes the evaluation of each created and merged DSM that has been generated from optical satellite imagery implementing the developed models.For evaluation purposes 31 check points have been used which were acquired by differential GNSS observations and are considered to be the 'true' ground elevations.Both quantitative and qualitative analyses have been used in this evaluation.
The merged DSMs can be classified into two types: Maximum Likelihood (i.e.weighted average) and Bayesian Merging.
Within Bayesian Merging there were three groups derived from the implemented window size used in estimating the a priori probability of elevation, based on maximising local entropy, namely 3x3, 5x5 and 7x7 windows.Each of them has been evaluated based on two different iteration loops in a ± 0.1m range and a± 0.25m range, using a 0.01m increment.

Statistical Assessment
The statistical tests that have been used in the merged DSM validation have been applied after eliminating blunders.Empirical equations, and statistical tests have been used in the quantitative evaluation , such as (RMSE) and 'determination coefficient', r 2 (Legates and McCabe, 1999).It can be noticed that the errors (discrepancies), which are represented by RMSE, for the Maximum Likelihood case was 0.375m while for all the Bayesian approaches was larger, i.e. being 0.392m, or greater.
During the quality assessment for the digital surface models produced by the Bayesian approach it is noticeable that the RMSE of the merged digital surface model with range ±0.1m is smaller than the RMSE of the merged digital surface model with the ±0.25m range for both the a priori probability of elevation obtained from the 3x3 and 5x5 windows, while the RMSE when the 7x7 window is used to obtain the a priori probability of elevation with the ±0.25m range was better than that achieved with ±0.1m range.The σ of error values were almost the same in all cases (i.e.all size windows and all increments for estimating a priori elevation).
The coefficient r 2 has been used in order to test the correlation between the DSMs' elevations and GCPs' elevations.From the previous section it can be seen that the equation 12, represents the merged model using a Bayesian approach in the case when the measurements was limited to two digital surface models only.
Figure 5.Comparison of the correlation and scatter plot for the input data and merging results using Maximum Likelihood and Bayesian techniques against checkpoints (CPs).First row is for Pleiades vs. Checkpoints, second row is for WV-1 vs. Checkpoints, third row is for Maximum Likelihood vs. Checkpoints, and fourth row is for Bayesian merging -range ±0.25m window 3x3 vs. Checkpoints.
It should be noted that the standard deviation of the error (or σ of discrepancies), which can be considered to be an unbiased representation of error, has been reduced in all the merged DSMs to be better than that for the original data.Initially it was 0.359m (Pleiades) and 0.320m (WorldView-1) and the merging has led it to be 0.306m in the worst case.
The error illustrated in Figure 5, shows the differences between the DSMs either from the original or a merged DSM and the 'true' values.This has been measured taking the difference from of the GCP elevations and the aforementioned DSMs at each checkpoint.
The Bayesian approach has considerable influence on imposing normality on the error distribution as can be seen from the plots of each type regardless of the window size and the iteration range used in looping for estimating the a priori probability of the elevation value.An analysis of the discrepancy scatter plots Figure 5, as recommended by Rusling and Kumosinski (1996) shows that all the values are randomly distributed which further validates the linear relationship between all sets of interpolated heights and their checkpoint values.

DSM Qualitative Assessment
In addition to the quantitative assessment, a qualitative assessment has been implemented.Different analyses were implemented, using visual inspection on generated profiles and slope maps from the DSMs.

Height comparison:
Figure 6 shows the profiles that were produced at the merging stage, against the original profiles.
The weighted average (Maximum Likelihood) approach enhances the DSM by removing irregularities in the underlying DSM and shows, graphically, that the WorldView-1 DSM has, through the weighting process based on the DSM's quality, more influence on the underlying DSM than does Pleiades.Due to variations in satellite geometry, it can be noticed from Figure 6 that there is a misregistration in the DSMs and consequently in the produced profile due to weak satellite geometry (Teo et al., 2010).
The merged digital surface model has a smoother surface than the original digital surface model.The Bayesian approach is able to remove the spikes from the building, and has more influence on the resulting digital surface model than Maximum Likelihood, especially when the range used to infer an a priori probability of elevation has been increased to ±0.25m instead of ±0.1m.WorldView-1 Pleiades 6.2.2 Slope Analysis: Further DSM assessment has been achieved by producing a slope map.In the analysis of the produced maps, Figure 7, it has been shown that the Bayesian approach has an active affect on smoothing the flat surfaces during the merging process.This can support the assumption that has been made during estimating the a priori probability of elevation used in the Bayesian approach.For further detailed analysis of the DSM's slope map, Table 2 shows the statistics analysis for each slope map within the study area represented by the arithmetic mean and standard deviations of the slope values across the whole study.In the slope map analysis, the average slope of the merged DSMs using the Bayesian approaches (i.e.all used window sizes and variances) is less than that for the slopes of the merged DSM achieved using the Maximum Likelihood approach.Moreover, the arithmetic mean values decrease with increasing the window size or with increasing the range value, because with a higher increment value, the patch that is used to infer an a priori probability of elevation has become smoother.On the other hand, the standard deviations of slope across the whole study area for the merged DSMs are less than the original DSMs used in the merging.The standard deviation of the merged DSMs using the Maximum Likelihood method is less than the DSMs using Bayesian approaches.Moreover, the standard deviation of slope across the whole study area rose with increased window size, and for each window size the standard deviation is increased with increasing the range (e.g. with simulation range of ±0.25m rather than ±0.1m).

CONCLUSION
In this research a statistical approach, based on probabilistic methods, has been investigated for merging DSMs and enhancing building footprints.Due to increasing the sources for DSM construction, merging DSMs can reduce data redundancy while improving the quality of the data.The applied statistical tests have shown that merging using a Bayesian method can provide DSM results similar to those achieved using a Maximum Likelihood method for merging, and it can be used for its intended purpose.It was hoped to obtain a DSM that had better characteristics than the original.Noticeable improvements in accuracy cannot, theoretically, be achieved, and purely on the positional accuracy basis the unmerged WorldView-1 DSM offers the greatest accuracy, but a more complete model can be achieved, and an approach is offered which can be used to detect blunders in a contributing DSM.Also the Bayesian approach has helped to smooth the surface of the generated structures so it may represent the real surfaces found in urban areas better.It can be acknowledged that the effect of misregistration has not been treated and the result would be more accurate if this was addressed.But the more complete DSM achieved is likely to assist greatly in a variety of applications, when this problem is addressed.

Figure 1 .
Figure 1.Study area and study data used in merging DSMs histograms discrepancy(m); Pleiades vs. CP histograms discrepancy(m);

Figure 4 .
Figure 4. DSM shows different value of elevations used to represent the entropy (a) the entropy is low since the randomness in the elevations is high (b) the entropy is high since the variance in the elevation is low.

Figure 8 .
Figure 8. Slope map analysis for merged DSMs, the white symbolization shows the effect of merging on removing the slope.

Table 1 .
Characteristics of the used satellite imageriesThe test area used to merge the DSMs is located in Glasgow, United Kingdom, covering 10km 2 , Figure1.The test area coordinates were on the Universal Transverse Mercator (UTM) projection, longitudinal zone 30 'North' using the WGS84 figure of the Earth for both the vertical and horizontal datum.The area had coordinates, bottom left corner (417820 mE, 6191335 mN) and upper right (420527 mE, 6195090 mN).

Table 2 .
Merged and original slope map statistical analysis.