A DTM MULTI-RESOLUTION COMPRESSED MODEL FOR EFFICIENT DATA STORAGE AND NETWORK TRANSFER

In recent years the technological evolution of terrestrial, aerial and satellite surveying, has considerably increased the measurement accuracy and, consequently, the quality of the derived information. At the same time, the smaller and smaller limitations on data storage devices, in terms of capacity and cost, has allowed the storage and the elaboration of a bigger number of instrumental observations. A significant example is the terrain height surveyed by LIDAR (LIght Detection And Ranging) technology where several height measurements for each square meter of land can be obtained. The availability of such a large quantity of observations is an essential requisite for an in-depth knowledge of the phenomena under study. But, at the same time, the most common Geographical Information Systems (GISs) show latency in visualizing and analyzing these kind of data. This problem becomes more evident in case of Internet GIS. These systems are based on the very frequent flow of geographical information over the internet and, for this reason, the band-width of the network and the size of the data to be transmitted are two fundamental factors to be considered in order to guarantee the actual usability of these technologies. In this paper we focus our attention on digital terrain models (DTM's) and we briefly analyse the problems about the definition of the minimal necessary information to store and transmit DTM's over network, with a fixed tolerance, starting from a huge number of observations. Then we propose an innovative compression approach for sparse observations by means of multi-resolution spline functions approximation. The method is able to provide metrical accuracy at least comparable to that provided by the most common deterministic interpolation algorithms (inverse distance weighting, local polynomial, radial basis functions). At the same time it dramatically reduces the number of information required for storing or for transmitting and rebuilding a digital terrain model dataset. A brief description of the method is presented and comparisons about the accuracy and data-store compression obtained with respect to other interpolators are shown.


INTRODUCTION
The concept of Digital Terrain Models (DTM, Li Z. et al, 2005) is relatively recent and a first, quite illustrative, definition is given by Miller and LaFlamme (1958): "DTM is simply a statistical representation of the continuous surface of the ground by a large number of selected points with known X, Y, Z coordinates in an arbitrary coordinate field".
Nowadays, the information provided by DTM's represents a fundamental database for Geographical Information Systems (GIS, O' Sullivan and Unwin, 2003) and is used in a large number of scientific and technical applications: geodesy, geomorphology, geology, seismology, hydrology, geophysics, civil and environmental engineering, territorial planning, remote sensing, mapping, etc.
Up to few years ago, DTM's were only used in specific applications of territorial analyses, typically by the scientific community.The coming and the diffusion of the new technologies based on Web GIS and virtual globes have changed the perspective: altimetric analyses and three dimensional representations of the terrain are both object of new researches and praxis.
At the beginning, virtual globes were simply conceived as tools to visualize satellite and aerial images, directly georeferenced on an approximate surface of the Earth globe.With respect to the traditional programs for the 2D representation, they have introduced the third dimension and, consequently, a simpler usage and a greater visual consistence between the digital representation and the real world.
At present, the new acquisition techniques provide information with a never seen accuracy.Virtual globes are no more merely qualitative viewers for low resolution global data, but can become scientific instruments to process and analyze high accuracy geographical information.
Virtual globes and Web GIS cannot be properly compared, but they share a fundamental principle: the geographic information (satellite and aerial images, height data, vectorial objects) is accessed via Web.Particularly, the servers provide data according to specific transmission standards that have been defined mainly by the Open Geospatial Consortium (OGC, 2006(OGC, , 2010a(OGC, , 2010b)).The data can be either images (WMS protocols) or matrices (WCS protocols) or vectorial data (WFS protocols) and are directly accessed, visualized and analyzed by the clients.
In the Web distribution of geographical information, the databases storage size represents a critical point.Given a specific interest area, typically the server needs to perform some preprocessing, the data have to be sent to the client, that applies some additional processing.The efficiency of all these actions is crucial to guarantee a near real time availability of the information.Particularly, virtual globes require the data more frequently than 2D viewers, because they operate a continuous reconstruction of the scene, according to the position and the view angle of the observer during the navigation.
Different techniques can be adopted to interpolate a DTM from the raw observations and different data models can be adopted to store and transmit a DTM.The purpose of this paper is to discuss storage and transmission requirements.Moreover, a complementary approach is studied and compared with the standard models.
Section 2 shortly introduces and compares the different ideas of data based and analytical DTM's.Sect. 3 describes the standard data based models, while Sect. 4 discusses the main problems involved by the implementation of an analytical model in a GIS software.In Sect.5, a new approach to model a surface by multi resolution bilinear splines is presented: particularly, its application to store a DTM is discussed.Sect.6 and 7 analyze the performances of the different models, by comparing their application to a case study.Sect.8 contains the conclusions and the future developments.

Data based and analytical DTM's
The new acquisition techniques, like LiDAR, SAR, etc., provide height information with never seen accuracy and spatial density (El-Sheimy et al., 2005).However, the raw datasets have size that prevent their use and analysis within desktop GIS, even more so in real time Web GIS and virtual globes.Once the required tolerance has been defined, the DTM is obtained from the raw dataset by some sampling or interpolation technique and is stored and distributed as contour lines, Triangular Irregular Network (TIN) or regular grid.These models can be defined data based because, despite their different data structures, describe the terrain surface by a set of representative heights.The heights of other points can be interpolated from the set.
On the contrary, our approach to model the heights can be defined function based, because it implies the storage of a dataset of coefficients that univocally, by a given function, allows the computation of the height everywhere.Particularly, the raw observations are interpolated by a linear combination of splines with compact support, whose resolutions and positions vary in space and are automatically chosen according to the raw data distribution.In the following they will be called multi resolution splines.For each spline, the resolution level, the position and the coefficient are stored by the server and are transmitted to the clients.These data allow the complete reconstruction of the terrain at any point and different detail levels can be provided, according to the required accuracy.
Generally speaking, the accuracy of a DTM depends on its spatial resolution.Therefore, to improve the accuracy, more data need to be stored and transmitted.In the 3D navigation, the visual inspection and the qualitative analysis do not need the highest accuracy.In this case, the transmission and processing of high resolution data is useless and time consuming.On the contrary, the accuracy of the input dataset is fundamental to guarantee good results to numerical processing aimed at quantitative analyses.
In a combined approach, data based models could be used to distribute low resolution DTM's if qualitative analyses are needed.On the contrary, analytical models could be used when more accurate data are needed for specific queries, processing and numerical analyses.The two different models require different architectures, particularly at the client level.A data based model requires a thin client, that simply receives the data and interrogates them, while a function based model requires a thick client, that must be able to execute specific algorithms to extract height information from the coefficients.

Data based DTM's: contour lines, grids and TIN's
In general, a surface is composed by an infinite number of points: a data based DTM is obtained by extracting a finite sample that represents the whole surface at a given accuracy.The sample constitutes the database that the server stores, uses in all the preprocessing and distributes to the clients.
The standard data based models are contour lines, grids (or elevation matrix) and Triangular Irregular Networks (TIN).Contour lines are obtained by connecting with a line all the points with the same height.The lines are drawn at given, equally spaced in height, intervals.Contour lines are useful to visualize heights on maps in 2D applications, but seldom are used for analysis purposes, and are stored and transmitted following the general rules of vector objects.Gridded DTM's (in the following DTM GRID ) are georeferenced as regular grids of nodes, whose heights are stored.The storage of a grid requires a set of metadata that allow its georeferencing: for example, the X (East) and Y (North) coordinates of the lower left node, the number of nodes and the gridding interval (spatial resolution) in both the directions are provided.The metadata are listed in the so called header, where also the value assigned to no data is reported.The heights are stored in an ordered sequence.
DTM GRID is a very simple conceptual model.Gridded data can be easily accessed, visualized and spatially analyzed by map algebra.However, the choice of the grid resolution is a crucial point.Given an interest area, a finer grid guarantees a better description of the terrain details but requires more space than a coarser grid.Particularly, the storage size is inversely proportional to the square of the gridding interval.If rough terrain (for example mountains) alternates to flat terrain (plains), the high resolution needed to accurately describe the first causes a useless redundancy in the second.To continuously describe the heights between the nodes, either a bilinear or a bicubic interpolation is typically applied.
In the TIN model, the DTM (in the following DTM TIN ) is described by a set of planar triangular faces that are obtained by connecting sparse points, whose horizontal coordinates and heights are given.Usually, the Delaunay criterion is applied to triangulate the points.By a TIN model, more points can be stored where the terrain is rough and less points are used in flat areas.Each point of a TIN is represented by its three (X, Y, height) coordinates.Moreover, to reconstruct the topology of the triangles, the labels of the three vertices of each triangle are needed.However, this simple data model requires long computation times for the processing and the analysis of the 3D surface.Therefore, in the practice more complex topological models are applied, like for example the node based, the triangle based or the edge based data structures.These models reduce computation times but require an overhead of information that is stored and transmitted to the clients.When a TIN model is used, the height within each triangle is linearly interpolated from its three vertices.

Interpolation techniques and analytical DTM's
The observations are interpolated to reconstruct surfaces.Several interpolation techniques exist.A first classification can be into exact and approximate interpolations.An exact interpolator passes for all the observations and allows the complete reconstruction of all the discontinuities existing in the dataset.However, the observation errors are not filtered and propagate into the model.A classical example of exact interpolator is given by the Inverse Distance Weighting (IDW).Approximate interpolators apply statistical methods to estimate a smooth function from the observations.In this way, the errors can be filtered and both the observations accuracy and the function correctness can be assessed.However, actual details and discontinuities can be lost in the smoothing.Local Polynomial (POL) is an exact interpolation if N polynomial coefficients are estimated from N observations.It is an approximate interpolator when the coefficients are fewer than the observations and are estimated by least squares.
In the deterministic interpolation, either exact or approximate, the analytical model of the surface is a priori chosen and the observations are used to estimate it: IDW and POL are examples of deterministic interpolators.In the stochastic interpolation (Christakos, 1992), the observations are considered as a sample of a random field (the surface) that is completely described by spatial stochastic properties like, for example, the covariance function.The stochastic properties are estimated analyzing the observations and then applied to interpolate the surface.Collocation and kriging provide examples of stochastic interpolators.
Despite their differences, all the interpolation techniques share the same principle: to interpolate the height in a point, the available observations are weighted by coefficients that are computed according to the specific technique rule.This is the principle both to interpolate DTM's from raw observations and, in grids and TIN's, to reconstruct heights between the DTM points.Our approach is complementary and is based on the idea that the coefficients of the interpolation functions can be directly stored and distributed to represent a DTM.This model can be called analytical.
Note that the most popular interpolation techniques, as reported in scientific and technical literature, cannot be easily and efficiently used to implement an analytical model because the functions cannot be described by a small number of parameters or coefficients.
In IDW and POL, the interpolation coefficients and domain are a function of the positions of both the interpolation point and the observations: to reply the model, all the observations must be stored and distributed.
Radial Basis technique uses a linear combination of radial functions that interpolate exactly the observations and are characterized by the minimum curvature.The different methods (Regularized Spline, Spline with Tension, Thin Plate Spline) differ in the function choice.These functions could be analytically described by a finite set of coefficients but the needed coefficients are at least as many as the raw observations.Let consider a stochastic interpolator, for example the collocation.The height in a point is provided by the h(t) = c T ξ, where ξ is the vector of the observations multiplied by the inverse of its covariance matrix, c is the cross-covariance vector between the point and the observations.c can be built by knowing the covariance function of the surface and the positions of the observations, while ξ needs to be stored.Also in this case, an analytical model would require as many data as the original observations.The classical bilinear splines estimated by least squares provide a twofold interpretation, because they can be thought as both data based and analytical models.Given the required spatial resolution, the raw observations are interpolated to estimate the coefficients of the splines.These coefficients are then used to predict the heights on a regular grid, that represents the data based model.If the splines and the grid have the same spatial resolution, the coefficients of the splines and the heights of the relevant grid nodes are equal.Moreover, the coefficients of the bilinear splines used to interpolate from the four neighboring nodes of a regular grid are exactly the relevant four heights.Indeed each local bilinear spline assumes the maximum in its node, while annihilates on all the other nodes.In this case, the analytical model has exactly the same complexity of the data based model.
The adoption of a new multi resolution splines interpolation has been studied, that represents a true analytical model and provides actual storage and distribution saving with respect to a data based model.It will be discussed in the following sections.

THE MULTI RESOLUTION SPLINES APPROACH
The interpolator here presented has been already discussed in (Brovelli andZamboni, 2004, 2009).It is an approximate deterministic method based on a least squares approach.It is a global method, i.e. each observation contributes to the whole interpolating surface, but at the same time it shows a relatively short range of diffusion of the local information.
We suppose that the h field has been sampled at n locations t 1 , t 2 , …, t n and we model these observations h o (t) by means of a suitable combination of bilinear spline functions (deterministic model) and residuals ν i seen as noise (stochastic model).
The main idea is to combine splines with different width in order to guarantee in every region of the field the resolution adequate to the data density, exploiting all available information implicitly stored in the sample.
Different levels of splines, corresponding to different halving steps, are considered.A new level halves the width of the support of the previous level.
In the mono-dimensional case, each observation can be described as a linear combination of spline functions of decreasing (halving) Δ width: ( ) where M is the number of levels, N h is the number of splines at level h (N h =2 h+1 +1), Δ = (t max -t min )/2.(t) ϕ is defined as follows: Constraints must be introduced on λ h,k coefficients in order to avoid local rank deficiency.A general solution of this problem is till now under study and then, to be cautious, at present we are applying the following criterion: a generic k-th spline function at h level ( ) where 2 is active (i.e.λ h,k ≠0) if: at least f (f>0) observations exist for each Δ h halfsupport of the spline; it does not exist a spline at lower level having the same application point.
The bi-dimensional formulation can be directly obtained generalizing the mono dimensional case.We suppose that h(t) = h(t 1 ,t 2 ) can be modelled as: ( ) where 1h Δ and h 2 Δ are the X and Y grid resolution; M is the number of different resolutions used in the model; τ lk = [l k] T are the nodes indexes (l,k) of the bi-dimensional grid; λ h,l,k is the coefficient of the spline at the grid node τ lk ; N 1h and N 2h are the number of X and Y grid nodes at the h resolution.
To avoid local rank deficiency, we generalize the same criterion seen in 1D (at least f observations for each quarter of spline support are needed).
With our model, the stored data are the number M of levels, the structure of the 0 level grid (corners and interval) and the coefficients λ h,l,k of the splines.

THE STORAGE REQUIREMENTS: A CASE STUDY
In order to evaluate the DTM storage requirement of a data based model, we sample data from a LiDAR survey; it is a promontory overlooking the lake of Como in Northern Italy.The horizontal spacing of the pre-processed grid is 2 m x 2 m and its vertical accuracy (Rood, 2004) is of about 20 cm.
The first step is to extract three different samples in order to simulate three dataset of sparse observations with different accuracy from which extract the relevant DTM's.
For this reason, four TIN's have been extracted from the grid, with different sampling tolerances.By fixing the tolerance equal to 5m, 2m and 1m, we have created respectively the training datasets TR5, TR2 and TR1 containing scattered data (i.e. the nodes of the TIN's) .By fixing the tolerance equal to 20 cm (and removing TR5, TR2 and TR1), we have finally created the test dataset TE to use for cross-validate the results.The original dataset is shown in Figure 1.In Table 1 the statistics of the datasets are reported.197.44 197.44 197.44 197.44 197 The following analysis is aimed to numerically quantify the space needed to store grid and TIN data based models.
Particularly, an occupation of 64 bits (8 bytes) is hypothesized for the horizontal coordinates and the height of a point.In the following S(α) is the storage space required by α.To compute it, the following formulas are used.

S(DTM)= S(metadata DTM ) + S(data DTM )
For grids, we have: Consider that k bits can address 2 k points: if the number of vertices is N_V, the size in bits needed for each label is Ceil(log 2 (N_V)), where Ceil is the rounding to the greater integer.
Using the three training sets as raw observations, TIN and grid models corresponding to the accuracies of 1, 2 and 5 m in height have been built.By construction, the training sets directly provide the DTM TIN at the different accuracy levels.In Table 3 the dimensions of the DTM TIN are reported.
To produce DTM GRID , five deterministic interpolation techniques have been tested: the Inverse Distance Weighting (IDW), the 1° Order Local Polynomial (POL), the Completely Regularized Spline (CRS), the Spline with Tension (SWT) and the Thin Plate Spline (TPS).The interpolations have been computed in ArcGIS, applying the parameters automatically optimized by the software itself.
If both the accuracy and the storage size of a grid have to be considered, the optimal compromise is given by the coarser grid that guarantees the desired accuracy.Therefore, different spatial resolutions have been tested in the interpolation, starting from 10 m and decreasing it by steps of one meter.The residuals of each resulting DTM GRID have been computed with respect to the starting TR and to the set of check points (TE).
For each interpolation function and for each starting TR, the coarser DTM GRID whose accuracy is consistent with the starting data set has been selected.In Table 2, the statistics and the characteristics of the final DTM GRID are reported.
In the interpolation of TR1 and TR2, POL function does not provide consistent accuracies, even pushing the resolution of DTM GRID to one meter.The other algorithms provide satisfactory results, the best being CRS that provides one meter accuracy already at the resolution of three meters.As expected, the TIN model requires less storage space than the grid model.The saving ranges from 72% to 97% for TR1, from 68% to 80% for TR2, from 59% to 67% for TR5.
Lev.: level; N_Nodes: number of nodes at the level; bits N : bits needed to store the number of active splines; bits IX and bits IY : bits needed to store the row and column indexes I X and I Y ; bits Cf : bits needed to store each coefficient; N_S k : number of active splines at the level.
Lev N_Nodes bits N bits IX + bits IY bits Cf . Space requirements for the multi resolution DTM.

Interpolation and storage size of the multi resolution model
By interpolating TR1, TR2 and TR5, the coefficients of the multi resolution splines model (DTM MR ) have been estimated.Then, the relevant accuracies and storage sizes have been analyzed.
DTM MR requires the storing of metadata besides the coefficients, that are needed to define the position and the resolution of each activated spline.Once defined the global interpolation domain (lower left and upper right corners), the record corresponding to a particular level is stored in the following way: Where N is the number of splines belonging to the level; I xi , I yi are the row and column indexes of the node occupied by the ith spline; c i is the coefficient of the i-th spline.At level M, the maximum number of rows (or columns) is I MAX =2M+1, while the maximum number of active splines is N MAX =(2 M +1)×(2 M +1).It is easy to show that Ceil(log 2 (2 M +1))=(M+1): the resulting storage requirements are reported in Table 4.
The statistics of the DTM MR 's obtained by interpolating T1, T2 and T5 are summarized in Table 5.For each TR, the total number of splines, the number of splines per level, the statistics of the residuals on the used observations and on the checkpoints are reported.Moreover the storage size in Kbytes is given.
The interpolations of TR1, TR2 and TR5 activate eight, seven and six levels of splines respectively.However, the final accuracies of DTM MR (TR1) and DTM MR (TR2) are not completely satisfactory: indeed, the RMSE's exceed one and two meters respectively and are worse than those of the respective DTM GRID 's.As previously stated, the adopted criterion (f=1 or f=2) to automatically activate the splines is conservative and, in these cases, provide sub optimal results.Experimentally, the manual activation of more splines allows a more accurate interpolation and improves the residuals.Moreover, it should be taken into account that the decimation criterion adopted to sample TR1, TR2 and TR5 from the original grid is optimized to build a TIN model and not to apply a multi resolution interpolation.At present, an optimized criterion for the automatic activation of the splines is under study.Moreover, new decimation algorithms will be applied.
To honestly compare the storage size required by different models, they should have the same accuracy.Except for the POL interpolation, all the final DTM GRID 's interpolated from TR1 are more accurate than DTM MR (TR1).In the storage comparison, coarser DTM GRID have been chosen, with accuracies similar to that provided by our approach.Even the final DTM GRID 's interpolated from TR2 have better accuracies than DTM MR (TR2) but, in this case, the coarser grids are worse.Therefore, for DTM MR (TR2) and DTM MR (TR5) the storage requirements are compared with the grids of

CONCLUSIONS
DTM's are obtained by interpolating raw height observations and can be stored in different formats.Typically, TIN's and grids are adopted, in which the interpolated heights are stored for a particular set of representative points.These models can be defined data based DTM's.The accuracy of a DTM is a crucial point for many applications.However, in Web GIS and virtual globes the DTM's are distributed by servers to clients: for these applications the saving of storage size allows a faster transmission and represents a strategic task.
In this paper a new approach has been presented to interpolate and store a DTM, aimed at saving storing size without losing in accuracy.Multi resolution bilinear splines are adopted to interpolate the observations and their coefficients are stored, instead of the interpolated heights.The coefficients can then be used to reconstruct the height at any point.The model is defined analytical instead of data based.
Data based models have been compared with our approach, considering accuracy and storage requirements.An original grid has been sampled to produce three TIN's with tolerances of one, two and five meters respectively.Then, the TIN's nodes have been interpolated by different deterministic techniques to produce grids at different spatial resolutions and the grids have been compared with the original data.Synthetically, different interpolation techniques provide similar results and the accuracy of the grids increases with their resolution: in particular, accuracies of one, two and five meters are obtained respectively with one, five and ten meters of spatial resolution.At present, our approach reaches an accuracy slightly worse than the accuracies provided by the finest grids.This problem is probably due to a particular interpolation choice that still needs to be deeply analyzed and optimized.However, for similar accuracies, our approach allows a significant storage saving with respect to the other models: indeed, its storage size is about 2% of the grids size and 5% of the TIN's.
These results are quite satisfactory and justify further studies finalized to define a complete scheme for the managing of the data in the server, for their transmission and for their use by the clients.

Figure
Figure 1.The original DTM LL_X)+S(LL_Y)+S(DX)+S(DY)+S(N_X)+S(N_Y)+S(ND) = 7×64 bits = 56 bytes S(data GRID ) = N_X×N_Y×S(height) = N×8 bytes LL_X, LL_Y are the X e Y coordinates of the lower left node of the grid; DX and DY are the grid intervals in X and Y; N_X, N_Y are the numbers of nodes in X and Y; N = N_X × N_Y; ND is the value used for no data.As TIN's are concerned, the minimal model, without additional topological information, is discussed.S(metadata TIN ) = S(N_V)+S(N_F) S(data TIN )=S(data NODES )+S(data FACES )) = N_V×3×64bits+N_F×3× Ceil[log 2 (N_V)]bits N_V and N_F are the number of vertices and faces.The vertices are stored as 3D points (X, Y and height).The triangular faces are stored simply by the list of the three relevant vertices.

Table 1 .
Statistics of the sampled datasets.Values in m.

Table 3 ,
for DTM MR (TR1) with the grids reported in Table6.The comparisons, in term of storage saving, are reported in Table7.The relevant considerations are reported in the conclusions.

Table 3 .
Characteristics of the interpolated grids.
T: technique used to interpolate; TR: interpolated dataset; R: final resolution; NX and NY: number of nodes in X and Y; S: storage size; ResObs, ResCheck: statistics of the residuals on the used observations and the check points; M: mean; RMSE: root mean square error.

Table 5 .
Statistics of DTM MR .TR: interpolated TIN.NSplines: total number of activated splines.ResObs, ResCheck: statistics of the residuals on the used observations and the check points.M: mean.RMSE: root mean square error.S: storage size.

Table 6 .
Characteristics of the grids interpolated from TR1 with accuracy similar to DTM MR .
T: technique used to interpolate; R: spatial resolution; NX and NY: number of nodes in X and Y; S: storage size; ResObs, ResCheck: statistics of the residual on the used observations and the check points; M: mean; RMSE: root mean square error.

Table 7 .
Storage size comparisons between DTM MR and DTM GRID , DTM TIN