ANALYSIS AND VALIDATION OF GRID DEM GENERATION BASED ON GAUSSIAN MARKOV RANDOM FIELD

Digital Elevation Models (DEMs) are considered as one of the most relevant geospatial data to carry out land-cover and land-use classification. This work deals with the application of a mathematical framework based on a Gaussian Markov Random Field (GMRF) to interpolate grid DEMs from scattered elevation data. The performance of the GMRF interpolation model was tested on a set of LiDAR data (0.87 points/m 2) provided by the Spanish Government (PNOA Programme) over a complex working area mainly covered by greenhouses in Almería, Spain. The original LiDAR data was decimated by randomly removing different fractions of the original points (from 10% to up to 99% of points removed). In every case, the remaining points (scattered observed points) were used to obtain a 1 m grid spacing GMRF-interpolated Digital Surface Model (DSM) whose accuracy was assessed by means of the set of previously extracted checkpoints. The GMRF accuracy results were compared with those provided by the widely known Triangulation with Linear Interpolation (TLI). Finally, the GMRF method was applied to a real-world case consisting of filling the LiDAR-derived DSM gaps after manually filtering out non-ground points to obtain a Digital Terrain Model (DTM). Regarding accuracy, both GMRF and TLI produced visually pleasing and similar results in terms of vertical accuracy. As an added bonus, the GMRF mathematical framework makes possible to both retrieve the estimated uncertainty for every interpolated elevation point (the DEM uncertainty) and include break lines or terrain discontinuities between adjacent cells to produce higher quality DTMs. * Corresponding author


INTRODUCTION
Statistical spatial analysis encompasses an expanding range of methods, being spatial interpolation of unknown elevations, referenced to a common vertical datum, one of the most widely studied because of its ability to produce highly demanded cartographic products such as Digital Terrain or Surface Models (DTM or DSM respectively; DEMs in general) (e.g.Aguilar et al. 2005).Thus, what is really behind all spatial interpolation methods turns out to be the concept of spatial autocorrelation, which gives us an idea of the degree to which a set of features tends to be clustered together or evenly dispersed over the Earth's surface.Note that the assumption of the existence of spatial independence would create a chaotic basis for natural phenomena in Digital Earth modelling because most real world patterns take something between a random and clustered form (Negreiros et al., 2009).This is also the principle of grid DEM interpolation that will support the theoretical basis of this work.
The raw scattered elevations needed to build useful grid DEMs can be provided by different data collection methods, though nowadays photogrammetrically-derived and LiDAR-derived DEMs are among the best known and most widely employed.
During the last decade, photogrammetrically-derived DEMs have received a boost with the adaptable stereo imaging capability of the newest civilian VHR satellites that allows generating strong stereo geometry with adequate base-to-height ratio (Aguilar et al., 2014).Furthermore, their agile pointing ability enables the generation of same-date in-track stereo images, reducing radiometric image variations and so improving the success rate in any matching process.
On the other hand, airborne light detection and ranging (LiDAR) has gradually become a common tool for gathering high density and accurate scattered points on surface targets (Renslow, 2012).Compared to photogrammetric method, high vertical accuracies and dense point clouds are achieved regardless of external light condition.In addition, airborne LiDAR systems can penetrate through vegetation and record the underlying terrain by distinguishing between the different reflections.
In any case, and once the earth points have been collected, it is necessary to carry out a filtering process to label, at least, ground points and non-ground points in order to produce high quality DTMs (Aguilar et al., 2010).This step is not required when it comes to producing DSMs, but it is usual to count on the two kinds of surface models just to compute the so-called normalize DSM (nDSM = DSM -DTM), which can be very useful as an additional input layer to extract man-made features in land cover classification.
Once the bare-earth points have been identified through ground filtering, the next step is to apply an interpolation method to create a grid DTM/DSM.In the case of DTM, many grid posts may be recorded without heights (especially in areas with predominance of man-made features), so arising areas of data voids.Such absences in data demand an interpolation procedure be carried out to infill the voids.In this sense, this work deals with the application of a mathematical framework based on Gaussian Markov Random Field (GMRF) to interpolate grid DEMs from scattered elevation data.

Modelling spatial elevations as grid DEMs based on a Markov Random Field
The proposed approach aims at estimating the maximum a posteriori estimation of grid elevations (m) that make up a DEM, including their uncertainty, given a number of scattered observed elevations (z) and their corresponding sample data vertical uncertainty (σs).In this way, a map m = {mi} (i = 1 to N) is modelled as a Markov random field (MRF) where mi are grid DEM estimated elevations locally supported by the observed elevations inside the ith gridmap cell with easting and northing coordinates (xi, yi).Notice that MRF is a tool widely used in estimation problems on grids such as image processing, where statistical models are applied to determine the intensity of image pixels (Winkler, 2003).
The joint probability distribution p(m,z) to be maximised can be factored as the product of the potential functions φ(•) (φ(•) > 0) for the set of all its maximal cliques (Cm) (Bishop, 2007): where the term E can be interpreted as an energy function.For convenience, the maximization process is implemented in practice as the minimization of the negative logarithm of the product above.Now it is convenient to build up a graphical model from which to derive the optimization equations (Dellaert and Kaess, 2006).In this graphical model, each potential function is translated into a factor F that deals with the dependencies between the variables of the MRF model (Figure 1).The model depicts two kinds of nodes and factors: i) grid DEM elevations to be estimated (m), with their corresponding prior factors which, being independent of observations, provide any a priori knowledge on how the elevations are spatially distributed, and ii) observed elevations (known data z), which represent true terrain points to help constraint the m elevation of the gridmap cell where z data is located.According to the two different set of factors, the joint probability distribution becomes:

Observation factors
It is assumed that all the factors involved in the aforementioned graphical model can be properly modelled by means of a Gaussian distribution, so the final developed model becomes a Gaussian MRF (GMRF), all energy terms in eq. ( 2) conveniently become quadratic forms, such that the maximization of eq. ( 1) becomes a well-known, weighted leastsquares minimization problem.
Regarding observation factors, they try to model the relation between every observed elevation and the true elevation of the gridmap cell where the observed elevation is located.Let us suppose that each observed elevation zk, with k = 1..M, presents a sample data error or uncertainty.This random error is approached by a Gaussian distribution N (0 , σ 2 s), being important to highlight that σs may take different values for any zk depending on local terrain complexity, point density, land cover, pulse penetration, etc. (Hodgson and Bresnahan, 2004;Aguilar et al., 2010;Su and Bork, 2006;Montealegre et al., 2015a).A simple probabilistic approach provides the mathematical framework (conditional pdf) for each observation factor in the graphical model through the following expression: where, given mik, it is assumed a conditional independence between zk and the rest of the grid points.
Thus the resulting energy function associated to the observation factors is given by: Figure 1.Factor graph corresponding to 4-neighbourhood scheme for a grid point at row i and column j.Two types of nodes: estimated elevation at grid points (m) and observed elevations (z).They affect central cell estimate depending on prior factors (Fp) and observation factors (Fo) respectively.

Prior factors
These prior factors deal with the correlation between elevations of neighbouring grid points, penalising the z-differences (4neighbourhood scheme) in this case: Where every prior factor di,j follows a Gaussian distribution of mean 0 and variance σ 2 p. σp can be understood as a tolerance parameter (expressed in units of length, e.g.meters) related to the relaxation or constriction between elevations in adjacent grid points.In summary, the lower the tolerance parameter, the higher the smoothing of GMRF interpolated surface and vice versa.
Note that the term bi,j tries to model the presence of break lines (in the jargon of Digital Terrain Models or DTMs) or very high discontinuities in terrain morphology by employing a binary distribution P(bi,j)∈ [0,1] where 1 and 0 means the presence or absence of discontinuity respectively.We can then apply the law of total probability for the two only options: The probability distribution p(di,j) can be properly approached through the following expression: therefore the energy function associated to the prior factors can be written as: Where L is the number of pairwise neighbouring grid points (4neigbourhood scheme) located at row i and column j for the clique k.In this preliminary study, it has been applied P(di,j) = 0, i.e. break lines have not been considered, although their inclusion will be investigated in further works.

Maximum a Posteriori Estimation (MAP)
The goal is to maximise the posterior p(m|z1:M), where z (varying from 1 to M) are the elevations of the M observed points, to obtain a MAP estimate m ) of the GMRF generated grid DEM.By taking the negative logarithm over such posterior (equation 3), the resulting energy function becomes the leastsquares form of a GRMF inference problem (Madsen et al., 2004), which can be written as: Rearranging the terms of E(n) as a sum of quadratic errors r weighted by an information matrix A, conducts to the following expression: Where errors have been defined through a prediction function in the form r = f(m) -y, being y a vector of known data.It is worth highlighting that, given the assumed statistical independence between variables and elevation error, A (L+M square matrix) turns out to be a diagonal matrix where its first L entries correspond to the prior factors or correlation between adjacent grid DEM cells (Apk = 1/σ 2 p).The rest of M diagonal entries stores the weights of every observed elevation (Aok = 1/σ 2 s).
The minimum of the expression in equation 11 can be solved, for example, by means of the Gauss-Newton method (Dennis and Schnable, 1987): Being J = dr/dm the Jacobian of the error function r, J t AJ the Hessian matrix (H), -J t A(f(m)-y) the Gradient (g) and ∆m the unknown elevations of the GMRF interpolated grid DEM.
Because all the factors are linear with the grid m, ∆m = m ) can be solved without iterating.
The existence of two blocks in the Jacobian matrix, with an upper block devoted to prior factors and lower one to observation factors, makes convenient to decompose H into the sum of two components (H = Hp + Ho).
The first matrix, Hp, contains the following nonzero entries: i) Off-diagonal elements: Being k the number of neighbours of grid DEM cell i.
The second matrix, Ho, is diagonal and only depends on observed elevations: Where k is the index of the observed elevations inside the grid DEM cell i.Finally, the gradient vector g = J t Ar (Nx1 row-vector being N the number of grid DEM cells) is computed from the next expression: For all the k observed elevations taken at grid DEM cell i and all the j grid DEM cells adjacent to the grid point i.

Study site
This investigation was carried out in Almería, southern Spain, which has become the site of the greatest concentration of greenhouses in the world, known as the "Sea of Plastic" (Figure 2).
The study site comprises a rectangle area of about 100 ha centred on the ETRS89 UTM 30N coordinates of 524950 m easting and 4072650 m northing (Figure 3), showing an elevation range between 152.6 m and 214.8 m above mean sea level (Spanish orthometric heights EGM08-REDNAP).The mean elevation took a value of 186.9 m, with a moderate northsouth mean slope of around 4.3%.However, and from a morphological point of view, the clear predominance of greenhouse land cover, meaning a high frequency of elevation break lines, makes difficult to obtain accurate DSM and DTM products.In fact, it is a landscape that can be qualified as highly altered by off-terrain elevations (micro-relief).Regarding natural terrain, two dry ravines can be made out running down from North to South along both sides of the working area (Figure 3).

DSM accuracy assessment
The original LiDAR point cloud was decimated by randomly removing different fractions of the original points (from 10% to up to 99% of original points removed).In every case, the remaining scattered points (observed or map points) were used to obtain both 1 m grid spacing GMRF and TLI interpolated DSMs whose vertical accuracy (RMSEz and mean, maximum and minimum vertical error) was assessed on the set of previously extracted checkpoints by obtaining the z-differences (Zcheckpoint -ZDSM) (Aguilar et al., 2006).Every ZDSM corresponding to the easting and northing coordinates of the check points was calculated by applying bilinear interpolation from the GMRF or TLI computed 1 m grid spacing DSM.
Regarding observed elevations error (σs) in terrain mapping, a well-known characteristic is its relationship with terrain slope, especially in the case of DSM generation by means of laser scanning, where planimetric error may be relatively high and also may be directly translated to vertical error on sloping surfaces (Aguilar et al., 2010).Bearing in mind that LiDAR point density is also a critical influencing factor, every observed elevation was assigned an uncertainty (standard deviation) given by the empirical equation proposed by Karel and Kraus (2006): Where σs is the observed elevation uncertainty (m), n the point density (points/m 2 ) and tanα the local slope (dimensionless).
The whole procedure described above and the mathematical framework of GMRF method was programmed in C++ and has been released as open source code.

Sensitivity analysis for the tolerance parameter (σp)
A sensitivity analysis was performed to assess the degree of influence of the so-called tolerance parameter or relaxation factor (σp) in the grid DSM vertical accuracy.This parameter may be compared to the smoothing factor of interpolation based on radial basis functions (Carlson and Foley, 1991;Mitasova and Mitas, 1993), being then depending on the terrain complexity or slope allowed between neighbouring points (Aguilar et al., 2005).For example, a low σp value or tolerance might not be suitable for sharp changes in elevation over short distances, such as steep cliffs or man-made features (e.g.along greenhouse or building walls in this case).In such situations, the resulting DSM would look like over-smoothed, being convenient to diminish the stiffness of the interpolated surface by increasing the tolerance parameter.This concept is somehow analogous to the interpolation by means of radial basis function approach based on thin plate splines, where the stiffness of the plate can be lowered by including first derivatives to the smooth seminorm, so tuning the character of the interpolation surface from thin plate to membrane (Mitas and Mitasova, 1988).
A priori, and from the explanation above, it can be deduced that σp optimum value would be related to relief features that are closely connected with terrain variability and roughness such as, for instance, average terrain slope, standard deviation of unitary vectors perpendicular to the topographic surface, standard deviation of the difference in height between adjacent samples (Aguilar et al., 2006) or mean profile curvature (Mitasova and Hofierka, 1993).
The DSM vertical accuracy assessment results as a function of different σp values are depicted in Table 1.Neither the dataset containing the 10% of the original LiDAR points (10.74 m EGS) nor the dataset with only 1% of the total points (EGS = 33.97m) show significant differences when varying σp input values, both in terms of rmsez and mean error.Although the results should be qualified as quite preliminary, because they have to be contrasted on a greater number of terrain morphologies, it could be hypothesized that a reasonable range of values for σp can be tested without significantly affecting the final results.In this way, a σp value of 1 m was employed in this work.

Comparison between GMRF and TLI methods
The method TLI consists of converting randomly spaced data points into regularly gridded data points using contiguous and non-overlapping Delaunay triangles, whose vertices are the own scattered data points, and computing every z over the grid from the local plane containing the corresponding triangular facet (Aguilar et al., 2006).It is a simple, computationally efficient and parameter-free method that has proved to performance quite well on different morphologies and environments (Aguilar et al., 2006;Guo et al., 2010;Montealegre et al., 2015a).TLI method is also de facto standard method used by US Federal Emergency Management Agency to create LiDAR-derived DEMs because of its simplicity and effectiveness (FEMA, 2010).
The qualitative results provided by the GMRF interpolation method may be deemed as visually pleasing and reasonably independent on the initial number of observed LiDAR points (Figure 4).The same can be said about the results provided by the Triangulation with Linear Interpolation (TLI) method (data not shown), although if point density was much lower than the output cell size, some triangles of the TIN model may be unpleasantly made out in the output DEM/DSM.In Figure 5 can be appreciated the goodness of terrain morphology description achieved from applying the GMRF interpolation method (1 m grid spacing DSM and σp = 1 m) from a random sample comprising 10% of the original LiDAR points.In fact, the GMRF-based cross-section over the dry ravine seems to be rather similar to the profile obtained from the original LiDAR points, although GMRF computed DSM looks like clearly smoother.
With regards to the quantitative results in terms of vertical accuracy, Tables 2 and 3 depict some vertical accuracy statistics for several ratios of observed or map points with respect to the total amount of original LiDAR points, ratio that is also tabulated as the Equivalent Grid Spacing (EGS) for a better understanding.Both GMRF and TLI methods did not present systematic errors even working on low point densities (EGS around 10.7 m), with absolute mean error always below 1 cm.Regarding random errors, root mean square error (rmse) behaved similar for the two methods tested, keeping quite stable against varying the ratio of observed points till reaching the value of 1%, just when rmse rose sharply exceeding the value of 1 m.It is worth underlining that the values of rmse shown in Tables 2 and 3 3. Vertical accuracy of TLI-computed DSMs (1 m grid spacing) depending on the ratio of observed points relative to the total.EGS = Equivalent Grid Spacing In Figures 6 and 7 can be found the plotted histograms of zdifferences between the check points and the GMRF and TLI computed elevations after applying the widely known 3σ rule to remove outliers (Daniel and Tennant, 2001).From the corresponding Gaussian distribution overlaid on both figures, it can be detected a clear leptokurtic and unbiased distribution of residuals for the two methods tested which is rather usual in landscapes with clear predominance of non-open terrain LiDAR points (Aguilar and Mills, 2008).This scenario means that despite applying 3σ rule, most of theoretical outliers still remain likely because of dealing with a grid DSM which is modelling an excessively complex landscape and morphology with abundance of break lines mainly originated by man-made features.

Mapping DSM vertical uncertainty
Without a doubt, one of the main strengths of the proposed method would lie in the possibility of mapping the uncertainty associated to every interpolated point.In fact, this is a complementary output pursued by any interpolation model, being a recursive demand tackled in different studies (Kraus et al., 2006;Aguilar et al., 2010;Liu et al., 2015).Yet, maybe only Kriging has demonstrated to counts on the mathematical framework to properly carry out this task (Cressi, 1988), although requiring a lot of decision-making and being computationally intensive.
The GMRF method outlined in this work also allows obtaining the uncertainty of the estimation at each grid DEM/DSM cell.It can be simply accomplished by computing the diagonal elements of H -1 (inverse of Hessian matrix in equation 12), given that H -1 (i,i) is found to be the variance of grid DEM cell i (σ 2 i).The resulting uncertainty map, in terms of standard deviation of estimated grid elevations, is depicted in Figure 8 as a GMRF DSM (2 m grid spacing and σp = 1 m) computed from only 864 LiDAR points randomly sampled (0.1% of the original LiDAR points).According to expected, and from a qualitative point of view, the grid points closest to the observed points were those presenting the lowest uncertainty (blue colour).Areas without observed points (blank areas) showed more uncertainty, especially those located at the edges of the map (red colour) where the uncertainty slope gradient was higher, reaching values above 1 m.This edge effect was provoked because of the absence of observed points on the other side, thus producing a sudden increase in the estimated error of the computed grid elevation.

GMRF-computed DTM from only ground points
Since the raw LiDAR data contain a large number of points returned from various surface objects, such as buildings, bridges, electrical wires and trees, these non-ground/object points should be separated, the so-called LiDAR data filtering, prior to Digital Terrain Model (DTM) construction (Cheng et al., 2013).Continuing with this idea, the proposed GMRF method was applied to a real-world case consisting of filling the LiDAR-derived DSM gaps after filtering out non-ground points, mainly situated on greenhouse roofs, to obtain a DTM.LAStools open source software, through its tools lasground, was utilised to automatically label each LiDAR point as ground or not (more information in http://rapidlasso.com/lastools/).Lasground implements the algorithm proposed by Axelsson (Axelsson, 2000) based on a progressive triangular irregular network (TIN) densification.The ground points classified by lasground were visually checked and manually edited to improve as much as possible the final classification, taking into account the high proportion of greenhouse land cover over the working area and, conversely, the highly sparse and low density sample of ground points available.In Figure 9 are shown the LiDAR points labelled as ground (in red) overlaid on the original point cloud, highlighting the abundance of ground gaps (mainly greenhouses) to be interpolated in order to obtain a continues surface for modelling the underlying terrain topography.
Finally, Figure 10 depicts a 3D shade view of the GMRF interpolated DTM (1 m grid spacing).This DTM was produced from 227084 previously classified ground points and tested over 1141 check points randomly sampled and put aside from the set of the original ground points (0.5% of the ground points available) to undertake a true validation.Besides providing a visually high quality DTM (Figure 10), the quantitative vertical accuracy assessment carried out on the GMRF interpolated DTM yielded good results, getting a rmsez of 28 cm and a mean error of 0.3 cm, which can be considered as reasonable figures in the case of open terrain LiDAR points.

CONCLUSIONS
The results provided by the proposed Gaussian Markov Random Field (GMRF) interpolation method may be deemed as very promising, producing visually pleasing and accurate digital surface or terrain models.The widely known interpolation method based on triangulation with linear interpolation (TLI), a common and accepted interpolation algorithm because of its simplicity and effectiveness working on LiDAR data, yielded similar qualitative and quantitative results.Both methods do not require to specify the local support or kernel (searching radius or maximum number of neighbours intervening in the interpolation of each grid point), which can be qualified as very advantageous, above all when dealing with low density ground points areas (forests, cities and, in general, landscapes with high presence of man-made features) for producing DTMs.However, and unlike the TLI method, the mathematical framework implemented through GMRF algorithm makes possible to easily retrieve the maximum a posteriori estimation of every interpolated elevation point from its graphical model, so allowing an easy way to map the spatial distribution of DEM uncertainty.
Agrifood Campus of International Excellence ceiA3.The LiDAR data were provided by the National Centre for Geographic Information of Spain (CNIG).
partition function (a proportionality constant), C represents the different cliques and nc the set of values (m,z) for the corresponding clique.Since potential functions are strictly positives, it is possible to express them as exponential functions:

Figure 4 .
Figure 4. 3D shaded view of GMRF DSM (1 m grid spacing).Top: ratio of observed points 90% (high density); σp = 1 m.Bottom: ratio of observed points 10% (low density); σp = 4 m kept below the standard deviation of 1.2 m measured on both open and non-open terrain check points during the GPS RTK field survey described in the section 3.2.

Figure 8 .
Figure 8. Map depicting the grid DSM vertical uncertainty (estimated values classified according to the map legend).

Figure 9 .
Figure 9. Lidar points (in red colour) automatically classified as "Ground" by applying LAStools software and manual edition

Table 2 .
Vertical accuracy of GMRF-computed DSMs (1 m grid spacing) depending on the ratio of observed points relative to the total.σp = 1 m in all cases.EGS = Equivalent Grid Spacing