TESTING THE IMPACT OF 2D GENERALISATION ON 3D MODELS – EXPLORING ANALYSIS OPTIONS WITH AN OFF-THE-SHELF SOFTWARE PACKAGE

The popularity and range of uses of 3D city models have grown rapidly in the past few years, as they provide a more realistic impression and understanding of cities. Often, 3D city models are created by elevating the buildings from a detailed 2D topographic base map and are subsequently used in studies such as solar panel allocation, infrastructure remodelling, antenna installation or even tourist guide applications. However, the large amount of resulting data slows down rendering and visualisation of the 3D models, and can also impact the performance of any analysis. Generalisation enables a reduction in the amount of data; however, the addition of the third dimension makes this process more complex, and the loss of detail resulting from the process will inevitably have an impact on the result of any subsequent analysis. While a few 3D generalisation algorithms do exist in a research context, these are not available commercially. However, GIS users can create generalised 3D models by simplifying and aggregating the 2D dataset first and then extruding it to the third dimension. This approach offers a rapid generalisation process for creating a dataset with which to investigate the impact of using generalised data for analysis. Specifically, in this study, the line of sight from a tall building and the sun shadow that it creates are calculated and compared in both the original and generalised datasets. The results obtained after the generalisation process are significant: both the number of polygons and the number of nodes are reduced by around 83%, and the volume of the 3D buildings is reduced by 14.87%. As expected, the processing times of the spatial analyses are also reduced. The study demonstrates the impact of generalisation on analytical results, which is particularly relevant in situations where detailed data is not available, and will help to guide the development of future 3D generalisation algorithms. It also highlights some issues with the overall maturity of 3D analysis tools, which could be one factor limiting uptake of 3D GIS.


INTRODUCTION
The popularity of three-dimensional (3D) city models has increased massively in recent decades. While the early uses of 3D city models were mainly dominated by visualisation (Christen and Nebiker, 2015; Glander and Döllner, 2009), e.g. 3D urban visualisations for disaster management (Kemec et al., 2010), as the technology developed they have become useful for many purposes beyond visualisation (Biljecki et al., 2015), for example solar radiation distribution calculations (Hofierka and Zlocha, 2012), noise impact studies (Stoter et al., 2008), 3D Cadastre (Shojaei et al., 2017; Stoter and Ploeger, 2003), urban infrastructure planning (Herbert and Chen, 2015) and so on. 3D City Models, showing details of buildings (indoors and outdoors), roads, parks and street furniture, are also fundamental to enabling Smart Cities. They link information such as traffic flows, pollution, tourism, utilities, infrastructure and public transport to address real world problems in a wide variety of disciplines.
While in theory the same 3D model could be used for these multiple applications, in reality a high level of detail is required for some applications whereas for others this is not the case. Different details are also required for different applications: a tourism application might focus on details about landmarks, whereas a solar panel application might require detailed roof structures. In order to make 3D city modelling efficient and reusable to address different user needs, all the information should be derived from one detailed source; this is known as generalisation.
A number of attempts have been made in the past to develop algorithms for the automatic generalisation of 3D buildings (see Section 2.2). However, transitioning from a detailed three-dimensional representation to a coarser one while maintaining both geometric and semantic characteristics is still challenging and time consuming. The process of developing such an algorithm would be assisted by having greater clarity about the end goals, i.e. what the resulting generalised dataset should look like. As part of this, it is important to understand the impact of using generalised data with 3D applications.
To further this understanding, this paper presents an introductory study of the impact of generalisation on the results of two algorithms: line of sight and shadow casting from the buildings. In the absence of commercially available 3D generalisation algorithms, traditional 2D cartographic generalisation operators are used, in combination with extrusion, to answer the following question: What is the impact of generalisation on the results of 3D line-of-sight and shadow-casting algorithms? We consider this question from the point of view of the variation in analytical results obtained as well as by examining the impact on algorithm performance, and the results will:
- Provide insight into the impacts of using a generalised dataset for these operations, which is particularly relevant for situations where more detailed data is not available.
- Guide further work into 3D generalisation by providing a preliminary insight into the consequences of loss of detail on the algorithms tested. This in turn will inform decisions as to which elements of 3D data can be generalised without consequence and which are important to retain.
- Provide an initial understanding of the maturity of 3D algorithms, as off-the-shelf tools are used for this task.

Complexity of 3D Data
Several techniques and methodologies are available to create 3D models, including processing data from photogrammetry, remote sensing or laser scanning (Lafarge and Mallet, 2012). When the requirements for detail are not very high, elevating the 2D footprint data to a given height, known as extrusion, is an efficient process to model the buildings of a city in 3D (Ledoux and Meijers, 2011). However, even using this simple approach, 3D models can be complex; e.g. Wong and Ellul (2016) note that an LOD1 model of Toronto contains 397,602 buildings and over 10 million vertices (covering 709 km²) and the 3D model of the city of Berlin (LOD2) contains 537,208 buildings and over 10.5 million vertices (for 890 km²). They also note that the method of capture is highly correlated with the overall geometric complexity and size of the model (ibid).

Generalisation
Generalisation is a process that categorises features and excludes unnecessary detail in order to reduce visual complexity (Robinson et al., 1995). In traditional cartography, generalisation is necessary because reality must be reduced to a given map scale while the most important features are emphasised: distances, lengths and widths are shortened and adjacent objects are merged.
Generalisation can be said to have multiple purposes: firstly, to ensure that map information is presented in an understandable way for a map user, i.e. that the correct, relevant details are shown but that the map is not overcrowded as the scale changes; secondly, to create a dataset that is suitable for various analytical tasks, balancing the computational complexity required to analyse very detailed datasets against the loss of information (and hence accuracy of results) if the data is over-generalised (ibid). In all cases, generalisation also permits a 'create once, use many times' approach to data, which maximises the return on investment in data capture.

Map Generalisation Steps
The first operation to be performed is the selection of necessary features and attributes, which will depend heavily on the purpose for which the new map is being created. Once this process is completed, the objects must be drawn at the given map scale. This is carried out by generalisation (Robinson et al., 1995) using a combination of the following steps:
- Classification: ordering and grouping features by their type.
- Aggregation: replacing multiple features with a single one.
- Exaggeration: enhancing the important characteristics.
- Induction: deduction of the relationships among features.
Generalisation must preserve harmony and balance between the retained and omitted data, always dependent on the scale of the outcome (ibid).

Conceptual Approaches - CityGML and LoD
Many three-dimensional city models are represented and exchanged in the open data model CityGML, based on the international standard for spatial data agreed by the Open Geospatial Consortium (OGC) (Gröger et al., 2012). The standard permits city model representation not only through graphical means but also in relation to semantic characteristics, which are also considered for thematic applications. City models are represented using five different levels of detail (LoD0-LoD4).
Buildings and building parts are represented from LoD1 (a block model with a flat roof) to LoD4 (a building with detailed façades and interiors with windows and doors). Generalisation operators are important when converting the model from a given LoD to a lower one.

Previous Implementations of 3D
Table 1. Objectives of existing approaches.
Kada (2002) developed an algorithm that first creates a constraint building model in which the faces of the model are grouped hierarchically by coplanarity, parallelism and rectangularity constraints. Afterwards, features such as extrusions are detected and their significance for the overall appearance of the model is evaluated. The features of least significance are then eliminated and the constraint building model altered. Finally, a new location for the vertices of the constraint building model is calculated by least squares adjustment. In many cases overall complexity fell by between 30% and 50% (ibid).
In order to obtain a continuous generalisation for use on a mobile device with a small display, Sester and Brenner (2005) proposed elementary generalisation operations (EGOs) as the key tools for the generalisation of building ground plans and a typification of buildings. The sets of EGOs presented are applicable from a detailed level to a coarser one and also in the inverse direction. Further reduction of detail is achieved with amalgamation and with operations to remove offsets, extrusions and corners. This is followed by a discrete process known as typification, which replaces objects with more 'typical' representations. Using this approach an object is gradually modified rather than being replaced as it moves from one level of detail to another.
An experiment on 3D generalisation based on scale-space theory from image analysis is performed by Forberg (2007). Scale-space theory derives representations of an image at different scales, and is thus suitable for generating different levels of detail for 3D models. Forberg (2007) presents results relating to the simplification of orthogonal structures and suggests squaring as appropriate for roofs and walls having non-orthogonal structures.
The algorithm presented by Fan et al. (2009) extends Sester and Brenner (2005), taking semantic information into account in order to avoid deleting important features and aggregating polygons which belong to different objects, as well as applying a typification process. As a result, a notable 90% reduction in storage space was obtained from the original 3D model to the extracted shell, without losing the overall appearance and semantic information of the buildings.
Fan et al. (2009) do not include edges and vertices in their process. This omission is addressed by Fan and Meng (2012), who describe a three-step process which starts with the exterior shell of the 3D building extracted using the methods proposed by Fan et al. (2009). Roof structures are generalised separately, and the distortions of edges and angles that arise when roof polygons are projected onto the ground are taken into account by simplifying the polygons on a plane rather than at ground level. However, this has not yet been developed to allow the algorithm to handle further extrusions beyond the initial extrusion.
More recently, Baig and Rahman (2013) extend the work of Sester and Brenner (2004) and Fan et al. (2009). Semantic information, height and positional accuracy of the 3D objects are considered in order to derive multiple LoDs. The simplification methods used (removal of intrusions, extrusions, offsets and corners, and aggregation of footprints) are based on neighbouring edges, with semantic rules imposed to avoid deleting important objects. The authors highlight the semantic-based removal of building parts as the strength of their algorithm for applications in which maintenance of important parts is the main priority.

Testing Performance Impact of 3D Generalisation
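As noted above, one of the main outcomes of a generalisation process is to reduce the quantity of data required for analytical tasks. Ellul and Joubran (2012) [...] in spatial analyses. They create a range of different building models in LoD1 and LoD2 based on different geometric references and investigate three different spatial analyses with each model: area of the building envelope, volume of the building and shadow cast by a building. They conclude that there is no optimal geometric structure suitable for multiple analysis processes and that errors and their distribution are caused by alterations in the configuration of buildings, which makes them unique for each geometric reference (ibid).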
The above review highlights the fact that, to date, relatively little work has been carried out to determine not only the performance impact of generalisation but the overall impact on algorithm results.In other words, while performance may be greatly improved in terms of time to execute an algorithm, is this at the cost of the 'fitness for purpose' of the results?The remainder of this paper describes first tests to assess this issue, for 3D analytical algorithms of sun shadow volume calculation and line of sight.

DATA
The dataset used for the experiment is the 'Ordnance Survey MasterMap Topography Layer, including Building Height Attribute', supplied by EDINA, the centre for digital expertise of the University of Edinburgh, through their web mapping portal Digimap (© Crown Copyright and Database Right 2018. Ordnance Survey - Digimap Licence). It provides height attributes for the buildings within the OS MasterMap Topography Layer, obtained from the OS Digital Terrain Model, using Ordnance Datum Newlyn as the national height datum within the OSGM15 National Geoid Model and the British National Grid spatial reference system. The height used in this experiment is the difference between the absolute ground level and the highest part of the roof of a building.
An absolute accuracy of 0.9 m and a relative accuracy of +/-1.1 m apply to the horizontal accuracy of the dataset, as it is based on the 1:1250 OS MasterMap. The vertical accuracy is again dependent on the source of the data used to obtain the vertical values, which in this case are the DSM and DTM. Because this dataset is still in a BETA state and subject to update, confidence levels are provided instead of accuracy values.
The study area is a 5 km by 5 km dataset of buildings from Greenwich and Canary Wharf in South East London, selected for its mix of low-rise suburban housing and taller buildings (Figure 1).

Generalisation and extrusion - Full Dataset
The generalisation process applied to the original data (5 km by 5 km) consists of an aggregation of buildings followed by a simplification process. ESRI's ArcGIS software was used for both operators. Extrusion of the results in ArcScene is carried out prior to the spatial analyses.

Aggregation
The first step is the aggregation of the raw data with a tolerance of 1 m, a minimum size of 25 m² for every building and an internal minimum hole size of 25 m². When inspecting the resulting dataset, a flaw with this tool was detected: while the buildings are correctly aggregated, holes smaller than 25 m² are retained. This issue is resolved by repeating the aggregation process, as shown in Figure 2. For each aggregated polygon, the average height is calculated by summing the height values of its buildings and dividing by the number of buildings. To calculate the average height of the resulting building blocks, both original and aggregated buildings are translated using SafeSoftware's 'FME Data Inspector' to a PostgreSQL database with the PostGIS extension.
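The averaging step can be illustrated with a minimal Python sketch. The record layout (pairs of block identifier and building height), the sample values and the function name `average_block_heights` are assumptions made for illustration only; the study itself performed this calculation in PostGIS.

```python
from collections import defaultdict


def average_block_heights(buildings):
    """Return the mean height of the original buildings in each aggregated block.

    `buildings` is an iterable of (block_id, height_m) pairs, where block_id
    identifies the aggregated polygon that an original building fell into
    (a hypothetical layout, not the schema used in the study).
    """
    totals = defaultdict(lambda: [0.0, 0])  # block_id -> [height sum, count]
    for block_id, height in buildings:
        totals[block_id][0] += height
        totals[block_id][1] += 1
    return {block: total / count for block, (total, count) in totals.items()}


# Hypothetical heights: two low houses and a tower merged into block "A".
sample = [("A", 8.0), ("A", 9.0), ("A", 40.0), ("B", 12.0)]
heights = average_block_heights(sample)
```

In this made-up example block A would be extruded to 19 m, which understates the tower by 21 m and overstates the two houses by 10 to 11 m.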

Simplification
The building polygons obtained from the second round of aggregation are simplified with ESRI's 'simplification of buildings' tool, using a minimum area of 25 m² and tolerances of 1 m and 5 m (Figure 3).

Extrusion and visualisation
Simplification completes the 2D generalisation process. 3D visualisation is obtained in ESRI's ArcScene software by extruding the building polygons of the original data to their height attribute, and those of the generalised datasets to the average height attribute calculated in the steps above (Section 4.1.1).
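Because extrusion produces flat-roofed blocks, the volume of each 3D building is simply its footprint area multiplied by its height. The sketch below illustrates this with the shoelace formula; the function names and the sample footprint are hypothetical, and this is not the ArcScene implementation.

```python
def polygon_area(ring):
    """Planar area of a simple polygon given as a list of (x, y) vertices.

    Uses the shoelace formula; the ring is treated as closed, so the first
    vertex should not be repeated at the end.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def extruded_volume(ring, height):
    """Volume of the flat-roofed block obtained by extruding a footprint."""
    return polygon_area(ring) * height


# Hypothetical 10 m by 5 m footprint extruded to 12 m.
footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
volume = extruded_volume(footprint, 12.0)
```

Under this model any change in footprint area or in assigned height carries directly into the volume, which is why averaging heights during aggregation affects the total volume of the generalised model.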

Reducing the Dataset Size
Once three-dimensional buildings are obtained by extrusion, the impact of the generalisation can be investigated in the selected use cases. However, in the first trial of the spatial analysis, the chosen ESRI tool could not deliver any result due to the large amount of data (57999 polygons) and the characteristics of the computer used. As a solution, a reduced 1 km by 1 km area is selected that maintains the essential character of the original dataset: a mix of buildings with different heights.
Figure 4 shows 2D maps of the buildings in the reduced 1 km by 1 km dataset: the original data (a), data aggregated using a 1 m tolerance (b), simplified using a 1 m tolerance (c) and simplified using a 5 m tolerance (d). The three-dimensional visualisation of the original data is carried out as above (Section 4.1.3).

Accuracy measurements
In order to determine the accuracy of the information derived from the generalised dataset when compared to the original model, four measurements are proposed as indicators of change: centroid shift, area, perimeter and volume change.

Centroid shift
The centroid of every polygon is calculated in ArcGIS for both the original and the generalised dataset and translated into a PostgreSQL database using FME. In PostGIS, the function 'ST_Intersects' is used to evaluate which original polygons have been aggregated into each generalised polygon. The outcomes are translated using FME into a shapefile in order to calculate the difference in X and Y coordinates between the calculated geometric centre (XC, YC) of the original polygons' centroids and the centroid (XG, YG) of the resulting aggregated polygon (Figure 5b), and therefore the distance between them.
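The centroid shift measure can be sketched in a few lines of Python. The function names and sample coordinates are hypothetical, and the sign convention (generalised minus calculated) is an assumption chosen to match the paper's other difference measures.

```python
import math


def mean_centre(centroids):
    """Geometric centre (XC, YC) of a list of (x, y) centroids."""
    n = len(centroids)
    xc = sum(x for x, _ in centroids) / n
    yc = sum(y for _, y in centroids) / n
    return xc, yc


def centroid_shift(original_centroids, generalised_centroid):
    """Return (dX, dY, distance) between the mean centre of the original
    centroids and the centroid of the generalised polygon.

    Assumed convention: differences are generalised minus calculated.
    """
    xc, yc = mean_centre(original_centroids)
    xg, yg = generalised_centroid
    dx, dy = xg - xc, yg - yc
    return dx, dy, math.hypot(dx, dy)


# Hypothetical example: three original centroids merged into one polygon.
dx, dy, shift = centroid_shift([(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)], (5.0, 6.0))
```

Here the mean centre of the originals is (2, 2), so the generalised centroid at (5, 6) is shifted by 3 m in X, 4 m in Y and 5 m overall.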

Area, perimeter and node count change
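A similar procedure is carried out to evaluate changes in area, perimeter and number of nodes between the real buildings and the generalised buildings. The area, perimeter and node counts of the original polygons contributing to each generalised polygon are summed, and the differences from the values of the generalised polygon are then obtained.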
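As a rough illustration of these three measures, the sketch below sums area, perimeter and node count over the original polygons and subtracts the totals from the values of the generalised polygon. All names and the sample geometry are hypothetical, and summing the per-polygon values is inferred from the paper's variable definitions rather than taken from its code.

```python
import math


def area(ring):
    """Shoelace area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def perimeter(ring):
    """Sum of edge lengths of the closed ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))


def change_measures(originals, generalised):
    """Differences (generalised minus summed originals) in area, perimeter
    and number of nodes."""
    a_c = sum(area(r) for r in originals)
    p_c = sum(perimeter(r) for r in originals)
    n_c = sum(len(r) for r in originals)
    return {
        "dA": area(generalised) - a_c,
        "dP": perimeter(generalised) - p_c,
        "dN": len(generalised) - n_c,
    }


# Two adjoining 10 m squares aggregated into one 20 m by 10 m rectangle.
left = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
right = [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)]
merged = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]
changes = change_measures([left, right], merged)
```

In this example the area is unchanged, while the shared wall disappears from the perimeter (20 m less) and the node count falls from eight to four.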

Sun Shadow
The first spatial analysis carried out with the finalised 1 km by 1 km 3D models is the shadow casting of each individual building, using the ArcScene tool 'Sun Shadow Volume' 11 on both the raw (Figure 6a) and generalised (Figure 6b) datasets. The date and time are fixed at 2018-05-15, 15:30.
The run time of the analysis is recorded in order to evaluate the impact of the generalisation. In addition, the volume of each building is calculated with the tool 'Add Z information' 12, which gives the total volume of the original and generalised 3D models. The differences between the two scenarios are calculated as the generalised value minus the original value.

Line of Sight
The original and generalised 3D models are used for the second spatial analysis, in which a hypothetical mobile antenna mast is installed on the roof of a prominent building. The position of the antenna is the same in both models: XYZ (537957.000, 177630.000, 67.100). Sight lines from the antenna are first determined using the ArcScene tool 'Construct Sight Lines', and their visibility is then evaluated with 'Line Of Sight' (Figure 7). In addition, a point feature class is created from the locations where each line of sight is first obstructed by the 3D models. The geometry enclosing all those points, in other words a convex hull, is generated, and its area gives the visible surface from the observation point. The time the software needs to run the analysis is recorded in order to assess performance, and the differences in hull area and run time are calculated as the generalised value minus the original value.
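The convex hull step can be illustrated with a self-contained sketch. It uses Andrew's monotone chain algorithm in plain Python, which is a stand-in and not the ArcGIS tool used in the study; the function names and the obstruction points are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of (x, y) tuples, counter-clockwise, via monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area enclosed by the convex hull of the points (shoelace formula)."""
    hull = convex_hull(points)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical first-obstruction points around an observer.
obstructions = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
visible_area = hull_area(obstructions)
```

Note that a convex hull ignores concavities, so an obstruction point close to the observer, such as (2, 1) here, does not reduce the reported area.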

RESULTS

Generalisation results - Full Dataset
The geographic information of the original 5 km by 5 km dataset has been reduced significantly: 57999 buildings were generalised into 9879 buildings. Table 2 summarises the values obtained for each new dataset throughout the generalisation steps.
Table 2. Values obtained throughout the generalisation process for the full original 5 km by 5 km dataset.
The outcome of comparing each generated dataset with the original data is presented as percentages in Table 3 in order to further examine the impact of the generalisation process.
Negative values represent a decrease in information while positive values mean an increase; e.g. comparing the total area of all buildings within the original dataset with the total area of the most generalised dataset results in an increase of 1.52%, due to their replacement with larger polygons after simplification.
Polygon and node counts are reduced by approximately 83%, while the total volume (obtained from the sum of the volumes of every single building in 3D) is reduced by around 15% due to the change in the height of the buildings in the aggregation step.
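The percentage figures can be checked with a one-line calculation. The helper name `percent_change` is hypothetical; the building counts are those reported above for the full dataset.

```python
def percent_change(original, generalised):
    """Relative change in percent; negative values mean a reduction."""
    return (generalised - original) / original * 100.0


# Building counts reported for the full 5 km by 5 km dataset.
polygon_change = percent_change(57999, 9879)
```

The result is a reduction of just under 83%, consistent with the figure quoted for the polygon count.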

Generalisation results - Reduced Dataset
As the 5 km by 5 km dataset was reduced to a 1 km by 1 km area in order to test the impact of generalisation on the spatial analyses, new results were obtained from this smaller version of the raw data for every generalisation operator (see Table 4).
Table 4. Values obtained throughout the generalisation process for the reduced 1 km by 1 km dataset.
Taking an identical approach to that used for the whole dataset, the results obtained in every step are compared with the new reduced dataset and shown as percentages in Table 5. As with the whole 5 km by 5 km dataset, polygons and nodes are reduced by circa 82%, while there is less impact on area and volume.
Table 5. Reduction/addition of data, in percentages, after generalisation relative to the original 1 km by 1 km dataset.

Accuracy measurement results
One of the most informative indicators of how much the geographic information has changed is the distance between the geometric centre of the original centroids and the centroids in the generalised dataset (for the whole 5 km by 5 km study area). The shift of these centroids has been plotted as the frequency of the values obtained for the coordinate differences in X, ΔX (Figure 8a), and in Y, ΔY (Figure 8b), on a log-10 scale. Examining an extreme case, Figure 9 displays the geometric centre calculated from the original polygons (no. 3012, left) and the generalised polygon and its centroid (no. 5912, right).
Visualising both together (right) shows a large centroid shift, the largest of all in this case. In addition to the centroid shift, statistical results for the remaining measures (change in area, perimeter and number of nodes) are presented in Table 6.

Sun Shadow results
At a glance, the generated sun shadow volumes look visually correct. However, navigating through the three-dimensional space reveals a flaw in the tool: shadow casting continues underneath the terrain (Figure 10). Consequently, the volume values obtained are larger than expected, as they include the volume below ground. As this issue is consistent in all tests, it is still relevant to measure the overall volume change due to generalisation, as all the volume measurements will include the extra volume.

Line of Sight results
Using the generalised dataset, the total area where the visibility of sight lines is not interrupted increases by 19.81% (ΔAc = 32641.47 m²) from the original 3D model (AcO = 164792.26 m²) to the generalised one (AcG = 197433.73 m²). The process runs 7.82 times faster (Δtl = -10651.29 s) with the generalised model (tlG = 1560.96 s) than with the original model (tlO = 12212.25 s).
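The reported line-of-sight figures are mutually consistent, as the short check below shows. The variable names are illustrative; the input values are those quoted above, and differences are taken as generalised minus original.

```python
# Convex hull areas (m^2) and run times (s) reported for the two models.
ac_o, ac_g = 164792.26, 197433.73
tl_o, tl_g = 12212.25, 1560.96

delta_ac = ac_g - ac_o               # area difference, generalised minus original
pct_area = delta_ac / ac_o * 100.0   # relative increase in visible area
delta_tl = tl_g - tl_o               # run-time difference (negative = faster)
speed_up = tl_o / tl_g               # how many times faster the analysis runs
```

The speed-up factor is the ratio of the two run times, whereas Δtl is their difference, which is why the former is positive and the latter negative.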

DISCUSSION
This paper addresses the question: What is the impact of generalisation on the results of 3D line-of-sight and shadow-casting algorithms? From the results it can be seen that applying 2D generalisation operators to a dataset prior to extruding it into a 3D model involves a loss of detail in the model (14.87% less volume), while performance of the spatial analysis algorithms is improved significantly: 7.82 times faster for the line of sight and 6.32 times faster when casting shadows. The overall increase in footprint (area) results in a decrease in shadow volume of over 11%, which could be fairly significant when estimating, for example, the potential for solar panel energy across a city. Similarly, generalised data resulted in a nearly 20% increase in visible area for the line of sight from a hypothetical mobile mast. Given the importance of the accuracy of such calculations, in particular with emerging 5G technology, this suggests that mobile phone companies should use detailed data where possible, even though this comes at greater computational cost and potentially at greater purchase cost.
This study reflects previous findings by Ellul and Joubran (2012) and highlights the potential of displaying larger 3D datasets and using them more efficiently for comprehensive 3D spatial analyses. This can be particularly useful in situations where detailed representation is not needed or where more detailed data is not available. However, whether to use generalised or original data for each spatial analysis depends on the use case scenario and its detail requirements, which confirms the findings of Biljecki et al. (2016) that one size does not fit all when it comes to generalisation.
Issues encountered when using off-the-shelf tools for the spatial analyses (specifically problems when running line of sight on a large dataset and the casting of shadows below ground) demonstrate that further work is still required before such tools can be considered fully mature. This may currently be one factor limiting the uptake of 3D GIS, especially when working with larger datasets. The study also highlighted the lack of commercially available 3D generalisation tools, and while 2D generalisation with extrusion was used as a workaround, the results obtained were consequently limited to LOD1 data, i.e. flat roofs. As this approach also involved an average height for a block, local over- and under-estimations in height could be significant.

CONCLUSION AND FURTHER WORK
The work carried out above highlights the importance of 3D generalisation, in particular as newer 3D analysis algorithms come on stream; understanding the impact of detailed versus generalised data, and which components of a model should be kept and which could be generalised in each application case, is important both for performance and for analysis.
As more extensive and more detailed datasets become available (e.g. OS has introduced height information for all buildings in GB), the presented exploration of 3D generalisation is particularly important when up and coming requirements for 3D are considered, specifically smart cities and the need to locate sensors in 3D space and perform analysis based on their location. For instance, the impact of line of sight is of special interest for 5G telephony.
Further work is suggested in order to understand different use case needs in depth, and in their respective application domains. Technically, 3D generalisation is not a solved problem, and there is much room for improvement even within the above approach: for example, it may be possible to take the height of individual buildings into account when creating the block and, if the height is over a certain percentage above the mean (such as a church tower), keep the original building.
Limitations of the software used suggest a need for further work in order to generate robust tools that will in turn encourage uptake of 3D analysis.
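The height-aware refinement suggested above could look something like the following sketch. Everything here is hypothetical: the function name, the 50% default threshold and the choice to re-average the remaining buildings after removing the outliers are illustrative assumptions, not a method proposed or tested in the study.

```python
def split_tall_buildings(heights, threshold=0.5):
    """Separate buildings that stand well above the block mean.

    A building is kept as an individual object when its height exceeds the
    block mean by more than `threshold` (0.5 = 50%, an arbitrary choice).
    Returns (block_height, kept_heights), where block_height is the mean of
    the remaining buildings, or None if no building remains in the block.
    """
    mean = sum(heights) / len(heights)
    limit = mean * (1.0 + threshold)
    kept = [h for h in heights if h > limit]
    merged = [h for h in heights if h <= limit]
    block_height = sum(merged) / len(merged) if merged else None
    return block_height, kept


# Hypothetical block: three houses and a church tower.
block_height, kept = split_tall_buildings([8.0, 9.0, 10.0, 45.0])
```

With these made-up values the tower (45 m) is retained separately and the remaining block is extruded to 9 m, rather than the whole block being extruded to the 18 m overall mean.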

Figure 1. 3D buildings of different heights

Figure 3. Simplified data (tolerances of 1 m left and 5 m right)

Figure 4. Original and generalised datasets at a larger scale

Figure 5. Centroid change control.

The overall coordinates of the geometric centre of the centroids of those original polygons (Figure 5a) are computed with the function 'ST_Centroid' 10, averaging the X and Y coordinates of the vertices involved:

$$(X_C, Y_C) = \left( \frac{X_1 + X_2 + \cdots + X_n}{n},\ \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \right) \quad (1)$$

$$\Delta X = X_G - X_C, \qquad \Delta Y = Y_G - Y_C, \qquad \text{distance between centroids} = \sqrt{\Delta X^2 + \Delta Y^2} \quad (2)$$

where
ΔX = Centroid difference in X axis
ΔY = Centroid difference in Y axis
XC, YC = coordinates of calculated geometric centre
XG, YG = coordinates of the centroid on the generalised dataset

Overall statistics were then calculated for the values obtained, and specific cases showing significantly large shifts were explored in more detail.

$$A_C = \sum_{i=1}^{n} A_{O_i}, \qquad P_C = \sum_{i=1}^{n} P_{O_i}, \qquad N_C = \sum_{i=1}^{n} N_{O_i} \quad (3)$$

where
AC, PC, NC = Calculated area, perimeter and number of nodes from the original polygons involved
AO, PO, NO = Area, perimeter and number of nodes of each polygon on the original dataset
n = number of original polygons involved

The differences are obtained as:

$$\Delta A = A_G - A_C, \qquad \Delta P = P_G - P_C, \qquad \Delta N = N_G - N_C \quad (4)$$

where
AG, PG, NG = Area, perimeter and number of nodes of the polygon on the generalised dataset
ΔA = Area difference
ΔP = Perimeter difference
ΔN = Number of nodes difference

$$\Delta V = V_G - V_O, \qquad \Delta t_s = t_{sG} - t_{sO} \quad (5)$$

where
VO = Total volume of the resulting shadows from the original 3D model
VG = Total volume of the resulting shadows from the generalised 3D model
tsO = Analysis duration for the original 3D model
tsG = Analysis duration for the generalised 3D model
ΔV = Total volume difference
Δts = Time/duration difference

$$\Delta A_c = A_{cG} - A_{cO}, \qquad \Delta t_l = t_{lG} - t_{lO} \quad (6)$$

where
AcO = Convex hull area on the original 3D model
AcG = Convex hull area on the generalised 3D model
tlO = Analysis duration for the original 3D model
tlG = Analysis duration for the generalised 3D model
ΔAc = Total area difference
Δtl = Time/duration difference

10 http://postgis.net/docs/ST_Centroid.html
11 http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analysttoolbox/sun-shadow-volume.htm
12 http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analysttoolbox/add-z-information.htm

Figure 8. Histograms showing the frequency of observations (y axis) for the ΔX (a) and ΔY (b) shifts (x axis), on a log-10 scale. For easier visual interpretation, the values have been split into 35 bins of width 1.8.
in spatial analyses.They create a range of different building models in LoD1 and LoD2 based in different geometric references and they investigate three different spatial analyses with each model: area of the building envelope, volume

Table 6. Statistical analysis: measures of change.