A NEW MULTI-RESOLUTION ALGORITHM TO STORE AND TRANSMIT COMPRESSED DTM

WebGIS and virtual globes allow DTMs distribution and three dimensional representations to the Web users’ community. In these applications, the database storage size represents a critical point. DTMs are obtained by some sampling or interpolation on the raw observations and typically are stored and distributed by data based models, like for example regular grids. A new approach to store and transmit DTMs is presented. The idea is to use multi-resolution bilinear spline functions to interpolate the observations and to model the terrain. More in detail, the algorithm performs the following actions. 1) The spatial distribution of the observations is investigated. Where few data are available, few levels of splines are activated while more levels are activated where the raw observations are denser: each new level corresponds to an halving of the spline support with respect to the previous level. 2) After the selection of the spline functions to be activated, the relevant coefficients are estimated by interpolating the observations. The interpolation is computed by batch least squares. 3) Finally, the estimated coefficients of the splines are stored. The model guarantees a local resolution consistent with the data density and can be defined analytical, because the coefficients of a given function are stored instead of a set of heights. The approach is discussed and compared with the traditional techniques to interpolate, store and transmit DTMs, considering accuracy and storage requirements. It is also compared with another multi-resolution technique. The research has been funded by the INTERREG HELI-DEM (Helvetia Italy Digital Elevation Model) project.


INTRODUCTION
Nowadays, Digital Terrain Models (DTMs, Li et al, 2005) represent fundamental databases for the Geographical Information Systems (GIS, O' Sullivan and Unwin, 2003).Up to few years ago, DTMs were only used in specific applications of territorial analyses, typically by the scientific community.The coming and the diffusion of the new technologies based on Web GIS and virtual globes have changed the perspective: altimetric analyses and three dimensional representations of the terrain are still object of new researches but also praxis.With respect to the traditional programs for the two dimensional representation, virtual globes have introduced the third dimension and, consequently, a simpler usage and a greater visual consistence between the digital representation and the real world.At present, the new acquisition techniques provide information with a never seen accuracy.Therefore, virtual globes are no more merely qualitative viewers for low resolution global data, but can become scientific instruments to process and analyze high accuracy geographical information.Virtual globes and Web GIS cannot be properly compared, but they share a fundamental principle: the geographic information (satellite and aerial images, height data, vectorial objects) is accessed via Web.Particularly, the servers provide the data according to specific transmission standards that have been defined mainly by the Open Geospatial Consortium (OGC, 2006(OGC, , 2010a(OGC, , 2010b)).The new DTMs provide height information with never seen accuracy and spatial density (El-Sheimy et al., 2005).However, in the Web distribution of geographical information, the databases storage size represents a critical point.Given a specific interest area, typically the server needs to perform some preprocessing, the data have to be sent to the client, that applies some additional processing: the efficiency of all these actions is crucial to guarantee a near real time availability of the information.Generally speaking, the terrain surface is composed by an infinite number of points: a DTM is obtained by interpolating the available height observations and extracting a finite dataset that allows the reconstruction of the whole surface at a given accuracy.DTMs can be stored and transmitted according to two different approaches: the data based models and the analytical models (Biagi et al., 2011).In the data based models, the DTM is stored and transmitted by a sample of interpolated heights that are used to reconstruct (interpolate) the terrain heights in other points.On the contrary, an analytical model implies the storage of a dataset of coefficients that, in a one to one relation with a given function, allows the computation of the height everywhere.The purpose of this paper is to discuss a new analytical approach, that is based on an least squares interpolation by multi-resolution bilinear splines and has been already discussed in its preliminary implementation in (Brovelli and Zamboni, 2009).The raw observations are interpolated by a linear combination of splines with compact support, whose resolutions and positions vary in space and are automatically chosen according to the distribution of the raw observations.In the following, they will be called multi resolution splines.For each spline, the resolution level, the position and the coefficient are stored by the server and are transmitted to the clients.The coefficients and their auxiliaries metadata allow the complete reconstruction of the terrain at any point and different detail levels can be provided, according to the required accuracy.The purpose of the proposed approach is the saving of storage requirements with respect to the traditional models without any loss of accuracy.The following part of the introduction shortly summarizes the data models and interpolation methods that are typically adopted to store and compute DTMs.Section 2 discusses our new approach and its storage requirements.In Sect.3, the performances of the different models are analyzed, by comparing their application to a case study.The conclusions and the future developments follow.

Data based Models
Different data models can be adopted to store and transmit a DTM.Contour lines, grids (or elevation matrix) and Triangular Irregular Networks (TIN) are the standard.Contour lines are obtained by connecting with a line all the points with the same height.The lines are drawn at given, equally spaced in height, intervals.Contour lines are useful to visualize heights on maps in 2D applications, but seldom are used for analysis purposes, and are stored and transmitted following the general rules of vector objects.Gridded DTMs (in the following DTM GRID ) are georeferenced as regular grids of nodes, whose heights are stored.The storage of a grid requires a set of metadata that allow its georeferencing (see Sect. 3) and are listed in the so called header.The heights are stored in an ordered sequence.DTM GRID is a very simple conceptual model and can be easily accessed, visualized and spatially analyzed by map algebra.However, the choice of the grid resolution is a crucial point, because the storage size is inversely proportional to the square of the gridding interval.If rough terrain (for example mountains) alternates to flat terrain (plains), the high resolution needed to accurately describe the first causes a useless redundancy in the second.To continuously describe the heights between the nodes, either a bilinear or a bicubic interpolation is typically applied.In a TIN, the DTM (in the following DTM TIN ) is described by a set of planar triangular faces that are obtained by connecting sparse points, whose horizontal coordinates and heights are given.Usually, the Delaunay criterion is applied to triangulate the points.By a TIN model, more points can be stored where the terrain is rough and less points are used in flat areas.Each point of a TIN is represented by its three (X, Y, height) coordinates.Moreover, to reconstruct the topology of the triangles, the labels of the three vertices of each triangle are needed.This simple data model requires long computation times for the processing and the analysis of the 3D surface: therefore, in the practice, more complex topological models are applied, like for example the node based, the triangle based or the edge based data structures.These models reduce computation times but require an overhead of information that is stored and transmitted to the clients.When a TIN model is used, the height within each triangle is linearly interpolated from its three vertices.

Interpolation techniques
To produce a DTM, several interpolation techniques exist: a first classification can be into exact and approximate interpolations.An exact interpolator passes for all the observations and allows the complete reconstruction of all the discontinuities existing in the dataset.However, the observation errors are not filtered and propagate into the model.A classical example of exact interpolator is given by the Inverse Distance Weighting (IDW).Approximate interpolators apply statistical methods to estimate a smoother function from the observations: in this way, the errors can be filtered and both the observations accuracy and the function correctness can be assessed.However, actual details and discontinuities can be lost in the smoothing.Local Polynomial (POL) is an approximate interpolator when the coefficients are fewer than the observations and are estimated by least squares.
In the deterministic interpolation, either exact or approximate, the analytical model of the surface is a priori chosen and the observations are used to estimate it: IDW and POL are examples of deterministic interpolators.In the stochastic interpolation (Christakos, 1992), the observations are considered as a sample of a random field (the surface) that is completely described by spatial stochastic properties like, for example, the covariance function.The stochastic properties are estimated analyzing the observations and then applied to interpolate the surface.Collocation and kriging are the classical examples of stochastic interpolators.Note that the most popular interpolation techniques, as reported in scientific and technical literature, cannot be easily and efficiently used to implement an analytical model because the interpolating functions cannot be described by a small number of parameters or coefficients.In IDW and POL, the interpolation coefficients and domain are a function of the positions of both the interpolation point and the observations: to reply the model, all the observations must be stored and distributed.Radial Basis technique uses a linear combination of radial functions that interpolate exactly the observations and are characterized by the minimum curvature.These methods (Regularized Spline, Spline with Tension, Thin Plate Spline) differ in the function choice, all of them could be analytically described by a finite set of coefficients but the needed coefficients are at least as many as the raw observations.Let consider a stochastic interpolator, for example the collocation.The height in a point is provided by the ˆ( ) T h P  c ξ , where ξ is the vector of the observations multiplied by its inverse covariance matrix, c is the crosscovariance vector between the point and the observations.c can be built by knowing the covariance function of the surface and the positions of the observations, while ξ needs to be stored: also in this case, an analytical model would require as many data as the original observations.The classical bilinear splines estimated by least squares provide a twofold interpretation, because they can be thought as both data based and analytical models.Given the required spatial resolution, the observations are interpolated to estimate the coefficients of the splines, that are used to predict the heights on a regular grid, that represents the data based model.If the splines and the grid have the same spatial resolution, the coefficients of the splines and the heights of the relevant grid nodes are equal.Moreover, the coefficients of the bilinear splines used to interpolate from the four neighboring nodes of a regular grid are exactly the relevant four heights: indeed each local bilinear spline assumes the maximum in its node, while annihilates on all the other nodes.In this case, the analytical model has exactly the same complexity of the data based model.
The adoption of a new multi-resolution splines interpolation has been studied, that represents a true analytical model and provides actual storage and distribution saving with respect to a data based model.

THE MULTI-RESOLUTION SPLINES APPROACH
The approach has been previously discussed in a preliminary way by (Brovelli and Zamboni, 2009): it is an approximate deterministic method whose estimation principle is based on Least Squares (LS, Kock, 1987).The founding idea is to combine splines with different width in order to guarantee the resolution adequate to the data density in every region of the field, exploiting all available information implicitly stored in the sample.
The height field is given by the where is the coefficient of the spline at the grid node ( , ) In the estimation, all the field observations are tiled in a vector If H0 is true, the coefficients of the new level are not significant, the relevant estimates can be discarded and the iterative process can be stopped.Otherwise, the coefficients should be kept and a new higher level should be tested.Let define the following quantities  analysis on the a posterior variances can be applied indicate respectively a chi square variable with i degrees of freedom and a Fisher variable with ( , ) i j degrees of freedom.A threshold value F  with significance  can be set and the zero hypothesis (2) can be tested by (3).

Storage requirements
To evaluate the DTM storage requirement of the different models, at first a numerical comparison between grids, TINs and our multi-resolution approach is here presented.Particularly, an occupation of 64 bits (8 bytes) is hypothesized for the horizontal coordinates and the height of a point.In the where h N is the number of activated splines in the level; X j i and X J i are the row and column indexes of the node occupied by the J-th spline; , ,


is the coefficient of the J-th spline.
At level h, the maximum number of active splines is bits to store the number of active splines, bits needed to store the row and column indexes ,  64 h N  bits needed to store the coefficients.

A CASE STUDY
In order to evaluate the proposed approach and compare it with the data based models, we have analyzed one case study.The data stem from a LiDAR survey of a promontory overlooking the lake of Como in Northern Italy.The horizontal spacing of the pre-processed grid is 2 m x 2 m and its vertical accuracy (Rood, 2004) is of about 20 cm.
The first step is to extract three different samples in order to simulate three dataset of sparse observations with different accuracies from which extract the relevant DTMs.For this reason, four TINs have been extracted from the grid, with different sampling tolerances.By fixing the tolerance equal to 5m, 2m and 1m, we have created respectively the training datasets TR5, TR2 and TR1 containing scattered data (i.e. the nodes of the TIN's) .By fixing the tolerance equal to 20 cm (and removing TR5, TR2 and TR1), we have finally created the test dataset TE to use for cross-validate the results.The original dataset is shown in Figure 2. In Table 1 the statistics of the datasets are reported.2. Characteristics of the three sampled TIN.N_V: number of vertices; S_V: storage space for vertices; N_F: number of faces; S_F: storage space for faces; S: total storage size.If both the accuracy and the storage size of a grid have to be considered, the optimal compromise is given by the coarser grid that guarantees the desired accuracy.Therefore, different International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia spatial resolutions have been tested in the interpolation, starting from 10 m and decreasing it by steps of one meter.The residuals of each resulting DTM GRID have been computed with respect to the starting TR and to the set of check points (TE).
For each interpolation function and for each starting TR, the coarser DTM GRID whose accuracy is consistent with the starting data set has been selected.In Table 3, the statistics and the characteristics of the final DTM GRID are reported.
In the interpolation of TR1 and TR2, POL function does not provide consistent accuracies, even pushing the resolution of DTM GRID to one meter.The other algorithms provide satisfactory results, the best being CRS that provides one meter accuracy already at the resolution of three meters.
As expected, the TIN model requires less storage space than the grid model.The saving ranges from 72% to 97% for TR1, from 68% to 80% for TR2, from 59% to 67% for TR5.

Interpolation and storage size of the multi-resolution model
By interpolating TR1, TR2 and TR5, the coefficients of the multi-resolution splines model (DTM MR ) have been estimated, according to the criteria described in Sect. 2.Then, the relevant accuracies and storage sizes have been analyzed.The statistics of the DTM MR s obtained by interpolating T1, T2 and T5 are summarized in Table 4.For each TR, the total number of splines, the number of splines per level, the statistics of the residuals on the used observations and on the checkpoints are reported.Moreover the storage size in Kbytes is given.The interpolations of TR1, TR2 and TR5 activate eight, seven and six levels of splines respectively.However, the final accuracies of DTM MR (TR1) and DTM MR (TR2) are not completely satisfactory: indeed, their RMSEs exceed one and two meters respectively and are worse than those of the respective DTM GRID s.The criteria that we are adopting to automatically activate the splines (f=1 and k=4) is conservative and, in these cases, provide sub optimal results: more analyses are needed and will be discussed in a final paper.Moreover, it should be taken into account that the decimation criterion International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia adopted to create TR1, TR2 and TR5 from the original grid is optimized to build a TIN model and not to apply a multiresolution interpolation.At present, an optimized criterion for the automatic activation of the splines is under study.Moreover, new decimation algorithms will be applied.
To honestly compare the storage size required by different models, they should have the same accuracy.Except for the POL interpolation, all the final DTM GRID 's interpolated from TR1 are more accurate than DTM MR (TR1).In the storage comparison, coarser DTM GRID have been chosen, with accuracies similar to that provided by our approach.Even the final DTM GRID 's interpolated from TR2 have better accuracies than DTM MR (TR2) but, in this case, the coarser grids are worse.Therefore, for DTM MR (TR2) and DTM MR (TR5) the storage requirements are compared with the grids of Table 3, for DTM MR (TR1) with the grids reported in Table 5.The comparisons, in term of storage saving, are reported in Table 6.At the end, our approach has been compared also with the Multilevel B Spline Approximation, that represents a different multi-resolution interpolation approach by bicubic splines (Lee et al., 1997).To make the comparisons (Tab.7), the lower level of MBA whose accuracies are similar to our approach has been chosen: also the storage requirements of MBA are significantly bigger than those of our approach.

CONCLUSIONS
In this paper a new approach has been presented to interpolate and store a DTM, aimed at saving storing size without losing in accuracy with respect to classical models.Multi-resolution bilinear splines are adopted to interpolate the observations and their coefficients are stored, instead of the interpolated heights.The coefficients can then be used to reconstruct the height at any point.The model is defined analytical instead of data based.The classical models have been compared with our approach, considering accuracy and storage requirements.An original grid has been sampled to produce three TINs with tolerances of one, two and five meters respectively.Then, the TINs vertices have been interpolated by different deterministic techniques to produce grids at different spatial resolutions and the grids have been compared with the original data.Synthetically, different interpolation techniques provide similar results and the accuracy of the grids increases with their resolution: in particular, accuracies of one, two and five meters are obtained respectively with one, five and ten meters of spatial resolution.At present, our approach reaches an accuracy slightly worse than the accuracies provided by the finest grids: this problem is probably due to a particular interpolation choice that still needs to be deeply analyzed and optimized.However, for similar accuracies, our approach allows a significant storage saving with respect to the classical models: indeed, its storage size is about 2% of the grids size and 5% of the TINs.The results are quite satisfactory and justify further studies finalized to define a complete scheme for the managing of the data in the server, for their transmission and for their use by the clients.
is based on the well known LS principle.Two innovative aspects characterize our interpolation approach.Given a level, each local spline is individually activated if no spline of some lower level has the same application point.Moreover the spline is activated if at least , 1,f f  observations exist in at least k ( 1, 2,3, 4 k ) quarters of the spline support: , f k are input by the user.They must be choosen according to two criteria.
posterior estimates provided by LS.From a geometrical point of view the situation is depicted in Fig.1.

Figure 1 .
Figure 1.Geometric interpretation of the significance analysis of a new level.If (2) holds,

International
Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia following S() is the storage space required by the model type .To compute it, the following formulas are used.S(DTMα)= S(metadata DTM α) + S(data DTM α) To georeference a grid, the minimum needed metadata are four geographic coordinates and two integers: typically, the X (LL_X) e Y (LL_Y) coordinates of the lower left node of the grid, the spacing in X and Y between the nodes (DX and DY) and the number of the nodes in X and Y directions (N_X and N_Y) are provided.Other ways can be adopted but the number of minimally needed metadata does not change.Moreover, a field should be devoted to the conventional identifier of no-data (ND).So, the following holds S(metadata GRID ) = S(LL_X)+S(LL_Y)+S(DX)+S(DY)+S(N_X)+S(N_Y)+S(ND) = 7×64 bits = 56 bytes S(data GRID ) = N_X × N_Y × S(height) = N × 8 bytes As TINs are concerned, the minimal model, without additional topological information, is discussed.S(metadata TIN ) = S(N_V)+S(N_F) S(data TIN )=S(data NODES )+S(data FACES ) = N_V×3×64bits+N_F×3× Ceil[log 2 (N_V)]bits N_V and N_F are the number of vertices and faces.The vertices are stored as 3D points (X, Y and height).The triangular faces are stored simply by the list of the three relevant vertices.Consider that k bits can address 2 k points: if the number of vertices is N_V, the size in bits needed for each label is Ceil(log 2 (N_V)), where Ceil is the rounding to the greater integer.DTM MR requires the storing of metadata relevant to the coefficients, that are needed to define the position and the resolution of each activated spline.Once defined the global interpolation domain (lower left and upper right corners), the record corresponding to a particular level is stored in the following way:

Figure 2 .
Figure 2. The original DTM Using the three training sets as raw observations, TIN and grid models corresponding to the height accuracies of 1, 2 and 5 m have been built.By construction, the training sets directly provide the DTM TIN at the different accuracy levels: in Table 2 the storage requirements of the DTM TIN are reported.To produce DTM GRID , five deterministic interpolation techniques have been tested: the Inverse Distance Weighting (IDW), the 1° Order Local Polynomial (POL), the Completely Regularized Spline (CRS), the Spline with Tension (SWT) and the Thin Plate Spline (TPS).The interpolations have been computed in ArcGIS, applying the parameters automatically optimized by the software itself.
Different levels of splines, corresponding to different halving steps, are considered.A new level halves the width of the support of the previous level.

Table 1 .
Statistics of the sampled datasets.Values in m.

Table 3 .
Characteristics of the interpolated grids.T: technique used to interpolate; TR: interpolated dataset; R: final resolution; NX and NY: number of nodes in X and Y; S: storage size; ResObs, ResCheck: statistics of the residuals on the used observations and the check points; M: mean; RMSE: root mean square error.

Table 4 .
Statistics of DTM MR .TR: interpolated dataset.NSplines: total number of activated splines.ResObs, ResCheck: statistics of the residuals on the used observations and the check points.M: mean.RMSE: root mean square error.S: storage size.

Table 5 .
Characteristics of the grids interpolated from TR1 with accuracy similar to DTM MR .T: technique used to interpolate; R: spatial resolution; NX and NY: number of nodes in X and Y; S: storage size; ResObs, ResCheck: statistics of the residual on the used observations and the check points; M: mean; RMSE: root mean square error.

Table 6 .
Storage size comparisons between DTM MR and DTM GRID , DTM TIN