GENERATIVE ADVERSARIAL NETWORKS TO GENERALISE URBAN AREAS IN TOPOGRAPHIC MAPS

ABSTRACT: This article presents how a generative adversarial network (GAN) can be employed to produce a generalised map that combines several cartographic themes in the dense context of urban areas. We use as input detailed buildings, roads, and rivers from topographic datasets produced by the French national mapping agency (IGN), and we expect as output of the GAN a legible map of these elements at a target scale of 1:50,000. This level of detail requires reducing the amount of information while preserving patterns; covering dense inner-city blocks with a single polygon is also necessary because these blocks cannot be represented with enlarged individual buildings. The target map has a style similar to the topographic map produced by IGN. This experiment succeeded in producing image tiles that look like legible maps. It also highlights the impact of data and representation choices on the quality of predicted images, and the challenge of learning geographic relationships.


INTRODUCTION
The representation of geographic data depends on the scale of the map. The adaptation of a detailed dataset for a representation at smaller scales is called map generalisation. In urban areas, the information is dense, and each map object has many relations with its neighbours (proximity, orientation, alignments...). Consequently, the generalisation of these areas is a challenging task (Ruas and Mackaness, 1997). The main elements of those maps are buildings, roads, and rivers. Usually, roads and rivers are generalised first, because they partition and structure the space (Ruas and Mackaness, 1997). Then, the buildings have to be enlarged and simplified to be legible. As the space in each block is limited, the buildings also have to be displaced, typified (density is reduced while preserving patterns), amalgamated... Previous work has demonstrated the potential of deep learning for the generalisation of two important map elements in urban areas: buildings (Feng et al., 2019) and roads (Courtial et al., 2020a). These projects used a segmentation convolutional neural network to determine which pixels of the image belong to a generalised object. However, the amount of object reduction and the relations between objects (e.g., proximity, alignment) were not considered. The challenges of deep learning-based generalisation are manifold, but the first step is to adopt a more global approach that handles multiple layers of the map at the same time. This paper presents some first experiments to test the suitability of generative adversarial networks (GANs) for this task. It is a step towards the generation of complete generalised maps.
In this article, we first present a review of techniques for urban area generalisation and past experiments on geographic information representation using deep neural networks. Then, we present our use case and experiment settings. The results are presented in Section 5 and finally we discuss the benefit of our experiment in Section 6.

RELATED WORK

Building Generalisation in Urban Areas
In this section, we mainly focus on building generalisation. First, to be legible, buildings have to maintain a minimal size and their shape needs to be simplified, so many algorithms have been proposed to simplify the shape of buildings since the seminal building simplification algorithm from Ruas (Ruas, 1988). Some propose to transform the building into a raster grid (Hui-lian et al., 2005) to use morphological operators (dilations and erosions) on pixels. Others use the skeleton of the polygon to generate a minimal, simplified geometry (Meijers, 2016; Lupa et al., 2018). Finally, others adopted strategies based on optimisation (Haunert and Wolff, 2010) or machine learning (Cheng et al., 2013) to find the edges that should be simplified.
Enlarging and simplifying buildings is not enough because these operations cause many overlaps between the buildings themselves, and between buildings and road symbols. The classical generalisation operation to solve this problem is the displacement of buildings. Several algorithms have been proposed, in two main categories: iterative approaches, where buildings are displaced one by one (Ruas, 1998; Aslan et al., 2012; Liu et al., 2014), and global approaches, where the algorithm searches for the optimal position of all the buildings in a block (Gaffuri, 2009; Ai et al., 2015; Li et al., 2020).
The main issue in information reduction is to preserve the initial pattern distribution. A pattern is a set of buildings characterized by a regular, repeated arrangement. In urban areas, the most common patterns are alignments, grid-like patterns, or clusters characterized by a certain proximity, similarity, and continuity of buildings. Consequently, most of the approaches for pattern preservation focus on constructing and analysing an adapted proximity structure, and then determine a class for the building group (Christophe and Ruas, 2002; Zhang et al., 2013; Wei et al., 2018; Wang et al., 2020). Then, when the pattern is characterized, typification algorithms can be applied to simplify the pattern while preserving its structure (Regnauld, 2001; Burghardt and Cecconi, 2007; Wang and Burghardt, 2019). Some other approaches typify buildings without properly defining patterns (Bader et al., 2005; Basaraner and Selcuk, 2008). Deng et al. (2017) carried out a comparative study of several methods for recognizing building groups, some using proximity only, and others based on multiple grouping principles. The study concludes that (1) when only proximity is considered, the buffer analysis approach performs significantly better than other approaches; (2) when multiple grouping principles are considered, the local constraint-based approach usually performs better than other approaches; (3) existing approaches that consider similarity and/or continuity improve the performance of building grouping.
Finally, built-up area polygons can additionally be introduced to cover areas where solving these constraints is not possible (Touya and Dumont, 2017). The amalgamation or aggregation of buildings makes it possible to cover several complex buildings with a simpler one or with a global built-up area. While covering simply uses the geometry of the dense block, some more precise building amalgamation algorithms have been proposed, for instance using Kohonen self-organising maps (Allouche and Moulin, 2005), or using morphological dilations and erosions on buildings and roads (Regnauld and Revell, 2007).
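To make the idea of amalgamation by morphological operators concrete, the following is a minimal sketch of a closing (a dilation followed by an erosion) on a small binary raster. It is only an illustration: the 3x3 square structuring element, the toy grid, and the function names are assumptions, and it is not the algorithm of any of the works cited above.

```python
def dilate(grid):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and grid[rr][cc]:
                        out[r][c] = 1
    return out


def erode(grid):
    """Binary erosion with a 3x3 square; pixels outside the grid count as background."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            keep = all(
                0 <= r + dr < h and 0 <= c + dc < w and grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out[r][c] = 1 if keep else 0
    return out


def closing(grid):
    """Dilation then erosion: fills gaps narrower than the structuring element."""
    return erode(dilate(grid))


# Toy raster (assumption): two 3x3 buildings separated by a one-pixel gap.
buildings = [[0] * 9 for _ in range(7)]
for r in range(2, 5):
    for c in (1, 2, 3, 5, 6, 7):
        buildings[r][c] = 1

amalgamated = closing(buildings)
```

In this toy case the one-pixel gap between the two buildings is filled, so they become a single footprint, while the outer extent of the pair is unchanged.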
Although all these algorithms are necessary to apply atomic transformations to the cartographic data, the main issue of the automated generalisation of urban areas is the orchestration of these algorithms. Many different generalisation models that try to combine these algorithms to generalise an urban area have been proposed in the past years. Optimisation-based models have been proposed to find the optimal sequence of operations for a given map, with varying optimisation methods: finite elements (Hojholt, 2000), least squares (Sester, 2000; Harrie and Sarjakoski, 2002), simulated annealing, or genetic algorithms (Wilson et al., 2003). This orchestration can also be achieved with multi-agent systems where map features are autonomous agents (Barrault et al., 2001; Sabo et al., 2008), but machine learning is also a valid approach (Burghardt and Neun, 2006), as most urban blocks can be generalised with the same sequence of algorithms. Finally, a purely heuristic approach with a workflow of algorithms is sometimes sufficient (Yu et al., 2021). In this paper, our take on orchestration is completely different, as GANs are not supposed to learn the optimal sequence of algorithms, but to learn how to directly generate the generalised output.

Deep Learning and Cartography
Machine learning aims to extract knowledge from examples to learn a data representation, and deep learning is the current prominent machine learning technique. Map generalisation fulfils two main theoretical conditions to make relevant use of deep learning (Touya et al., 2019a): (1) it is possible to model map generalisation as a deep learning problem, (2) the large number of maps at several scales guarantees the availability of training sets. Moreover, deep neural networks have succeeded in the similar task of simplifying the content of an image (Simo-Serra et al., 2017); consequently, we assume that a simplification task like generalisation could be solved.
Several objectives can be achieved using deep neural networks. For example, classification networks can contribute to geographical data enrichment (Touya and Lokhat, 2020), which is usually the first task of a map generalisation process (Mackaness and Edwards, 2002). These classification networks can also be used for the classification of types of maps (Zhou et al., 2018), or for selective omission in a road network (Zhou and Li, 2016). Segmentation networks can localize pixels that belong to a generalised object given the image of the map before generalisation (Courtial et al., 2020a; Feng et al., 2019; Jenny et al., 2020; Du et al., 2021). Generative adversarial networks (GANs) are another deep learning architecture that can be interesting for map generalisation, as they have shown potential for style transfer on maps (Kang et al., 2019), and were employed for building shape generalisation (Kang et al., 2020), although the results are not convincing yet. GANs combine a generator and a discriminator to generate an image in a target domain from an image in another input domain. However, these networks are not specifically designed for geographic data, so adjustments to the default architectures are required: for instance, constraints to preserve the shape of the cartographic symbols can be introduced in the loss function of the network (Fu et al., 2019). Finally, graph convolutional networks can also be employed to learn structures and patterns in a geographic network: for instance to classify the building pattern in a block (Yan et al., 2019), or to encode the shape of a building with a graph made of its vertices (Yan et al., 2020).

Data-set
Our goal is to generate maps of urban areas at a medium-large scale. In particular, the input data are detailed maps at 1:25,000, whereas the target scale is 1:50,000. These maps mainly contain buildings, roads, rivers, and vegetation, and the representation of these elements should maintain an adequate level of detail to be legible and prevent symbol overlap. We use as targets two alternative 1:50,000 scale maps, which are supposed to be one of the three progressive intermediate representations between the 1:25,000 and 1:100,000 scales proposed by Touya and Dumont (2017). The target generalisation is available for an area of 30 × 15 kilometres east of Saint-Jean-de-Luz in the south-west of France. One representation is achieved using an agent-based model (Barrault et al., 2001) and the second using a typification-based method (Burghardt and Cecconi, 2007). A cartographic vector dataset is first used to generate the images.

Target style
The target style is adapted from the Plan IGN map presented in Figure 1. In this map style, the roads are symbolized with a bordered line whose size and color vary according to their importance, and buildings are represented in brown or gray according to their nature. For the sake of simplicity, we decided to keep a single symbol for all roads (the yellow one) and for all buildings (the brown one). The covering of dense blocks (or graying) is done using a light brown in Plan IGN, and the same color is used in our target scale images. Finally, we observed that the two browns used for buildings and built-up areas are very close, and first experiments showed that the GAN sometimes confused both elements. So we decided to enhance the contrast between the buildings and the covered blocks, to facilitate learning. Moreover, the building outlines are not represented in order to reduce overlaps between symbols.

Generalisation constraints
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2021 XXIV ISPRS Congress (2021 edition)
The objective of the generalisation can be described using constraints (Beard, 1991). Usually, linear elements are generalised first as they are structuring elements of the urban area. The generalisation of roads and rivers focuses on density reduction while preserving the road network topology. Then, the buildings are generalised, and have to satisfy the following classical constraints:
• (C1) Buildings should be bigger than a minimum size;
• (C2) The smallest edge of the buildings should be greater than a minimum value (granularity constraint);
• (C3) The buildings should not be too close to the road symbols;
• (C4) The buildings should not be too close to each other;
• (C5) The density of buildings in a block should remain stable;
• (C6) Building patterns, such as alignments, should be preserved;
• (C7) Topological relations have to be preserved, for instance buildings should remain in the same block.
Satisfying all these constraints is often not possible and a good solution is a balance between preservation and legibility. Both target datasets (agent-based and typification-based) represent a certain balance in the resolution of those constraints. With deep learning, these constraints are not specified during the learning process; instead, they are used to assess the quality of the output of the deep learning models. In this paper, the constraints will not be automatically assessed (Courtial et al., 2020b), but we will use them for the visual assessment of the results.
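As an illustration of what constraints such as (C1) and (C4) express, here is a minimal sketch on vector data, with buildings reduced to axis-aligned rectangles. The thresholds, the rectangle representation, and the function names are all assumptions made for the example; the paper itself only uses the constraints for visual assessment and gives no numeric values.

```python
import math

# Hypothetical thresholds on the ground; these values are NOT from the paper.
MIN_AREA_M2 = 250.0
MIN_GAP_M = 10.0


def area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax) in metres."""
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


def gap(a, b):
    """Shortest distance between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def violates_c1(rect, min_area=MIN_AREA_M2):
    """(C1) A building should be bigger than a minimum size."""
    return area(rect) < min_area


def violates_c4(a, b, min_gap=MIN_GAP_M):
    """(C4) Two buildings should not be too close to each other."""
    return gap(a, b) < min_gap


small = (0.0, 0.0, 10.0, 10.0)   # 100 m2 building
large = (13.0, 0.0, 40.0, 20.0)  # 540 m2 building, 3 m east of the small one
```

With these assumed thresholds, the small building violates (C1) and the pair violates (C4), which is the kind of conflict that enlargement and displacement are meant to resolve.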

Tiles Creation
In this section, we present the process we used to create image tiles from vector cartographic databases. The creation of the training datasets is an important step in deep learning because it should efficiently illustrate the target knowledge (Touya et al., 2019b). We created square tiles that represent the input data (roads, buildings, and rivers) from the 1:25,000 scale map, with the style of the target 1:50,000 scale map. The output images cover the same area, with the same style, but we used the roads, buildings, and rivers from the 1:50,000 scale map. The tile size is 512 × 512 pixels, which represents 500 × 500 m on the ground. These dimensions guarantee a legible situation for both input and target data, and the tiles are small enough to build an example set of around 2,700 images from our test area. We randomly extract 100 tiles in order to evaluate the model.
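The tiling described above can be sketched as follows. The tile size (512 pixels for 500 m) and the 100 held-out tiles come from the text; the regular grid, the stride, the random seed, and the function names are assumptions, since the text does not say how tiles were placed over the test area.

```python
import random

TILE_PX = 512    # tile width and height in pixels (from the text)
TILE_M = 500.0   # ground size of a tile in metres (from the text)


def ground_resolution(tile_m=TILE_M, tile_px=TILE_PX):
    """Ground size of one pixel, in metres."""
    return tile_m / tile_px


def tile_origins(width_m, height_m, tile_m=TILE_M, stride_m=TILE_M):
    """Lower-left corners of tiles on a regular grid (assumed layout).

    A stride smaller than the tile size produces overlapping tiles.
    """
    origins = []
    y = 0.0
    while y + tile_m <= height_m:
        x = 0.0
        while x + tile_m <= width_m:
            origins.append((x, y))
            x += stride_m
        y += stride_m
    return origins


def split_tiles(tiles, n_test=100, seed=0):
    """Randomly hold out n_test tiles for evaluation; the seed is arbitrary."""
    rng = random.Random(seed)
    shuffled = list(tiles)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


grid = tile_origins(30_000.0, 15_000.0)  # 30 x 15 km area, non-overlapping tiles
train_tiles, test_tiles = split_tiles(grid)
```

Each pixel covers a little under one metre on the ground. Note that a non-overlapping grid over a 30 × 15 km rectangle yields 1,800 tiles, not the roughly 2,700 reported, so the actual tiling presumably differed (for example through overlap); the sketch only shows the mechanics.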
First experiments showed that learning the selection of important roads and rivers was too hard a task for now, so we decided to change the input images, and only include the roads and rivers from the 1:50,000 data.

Deep Neural Network
Generative adversarial networks seek to learn to predict an image that "looks like" a target domain. They combine a generator that predicts a new image and a discriminator that determines if the prediction is realistic enough. Both of them work in an adversarial way to make the prediction more realistic.
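The adversarial game between the two networks can be summarised by their loss functions. The sketch below uses the standard binary cross-entropy formulation with a non-saturating generator loss; this is a common textbook form and an assumption here, as the text does not detail the losses actually optimised. The inputs are discriminator outputs, i.e. probabilities that an image is real.

```python
import math


def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) close to 1 and D(generated) close to 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """The generator wants the discriminator to score its output close to 1."""
    return -math.log(d_fake)


# A discriminator that cannot tell real from generated tiles outputs 0.5 for both.
undecided = discriminator_loss(0.5, 0.5)
```

A generated tile that fools the discriminator (score near 1) gives a low generator loss and a high discriminator loss, which is what drives both networks to improve against each other.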
GANs can be supervised or unsupervised. In supervised learning, the training dataset is made of input/output pairs, while in an unsupervised approach, we only need a set of inputs and a set of outputs, and the network learns what each domain looks like in order to perform the translation. Supervised networks are able to learn more precise relations between input and output when the domains are similar or very close, but when the domains are too different they are unable to produce an optimal result. In our case, we believe that information preservation should be better learned by a supervised architecture, while block covering could imply changes that are too large for the supervised network. Finally, we assume that the legibility increase can be learned by both methods, as it only requires learning what each domain looks like.
We tested one network for each approach to assess their suitability for urban area generalisation. First, pix2pix is a GAN designed for generic image-to-image translation with paired data. Then, CycleGAN is an unsupervised network that learns the transformation from domain B to A, together with the transformation from domain A to B. We have seen in Section 2 that both architectures have already been used with images of maps.
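The difference between the paired and unpaired settings shows up in the extra loss term each network adds to the adversarial loss. The following sketch is a simplification under stated assumptions: images are flattened lists of values in [0, 1], the two "generators" are toy functions, and the weight of 100 for the paired term is the value we recall from the original pix2pix publication, to be treated as an assumption rather than a setting confirmed by this paper.

```python
def l1(a, b):
    """Mean absolute difference between two images given as flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def paired_generator_objective(adv_loss, predicted, target, lam=100.0):
    """pix2pix-style objective: adversarial term plus a weighted L1 distance
    to the paired target image (requires aligned input/target pairs)."""
    return adv_loss + lam * l1(predicted, target)


def cycle_consistency(x, g_ab, g_ba):
    """CycleGAN-style term: translating A -> B -> A should give back x,
    so no paired target is needed."""
    return l1(x, g_ba(g_ab(x)))


# Toy generators (assumption): inverting pixel values is its own inverse.
invert = lambda img: [1.0 - v for v in img]
identity = lambda img: list(img)

tile = [0.0, 1.0]
consistent = cycle_consistency(tile, invert, invert)      # round trip recovers the tile
inconsistent = cycle_consistency(tile, invert, identity)  # round trip does not
```

The paired term compares each prediction to its own target, which favours precise preservation; the cycle term only asks that the translation be reversible, which leaves the network freer to apply large changes.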

RESULTS
In this section, we present separately the results for most of the images, and the results for the images in dense areas where block covering is necessary, because the expectations and generalisation mechanisms are really different. We used the default PyTorch implementation of pix2pix and CycleGAN, and trained them with default parameters.

General Case
Some result images for both supervised (pix2pix) and unsupervised (CycleGAN) experiments are presented in Figure 2. The first column presents the input images, the second and third columns present the predicted image for each approach, and the last column presents the target generalisation. For the experiments presented in this figure, we use the agent-based generalisation as the target generalisation to reproduce. In general, the predicted images do look like the target map, the style is consistent with our expectations, and the predictions are credible images of maps. For those situations, without block covering, supervised (pix2pix) and unsupervised (CycleGAN) results are really similar in terms of quality.
We can visually evaluate these results more precisely by considering the constraints defined in section 3.3. First, at the individual building (or micro (Barrault et al., 2001)) level, we observe that the building size constraint (C1) is always satisfied: buildings are large enough, even a bit larger than in the reference. But the granularity constraint (C2) is a little less satisfied: the shape is simplified, but it is sometimes blurred, which is consistent with past attempts to generalise buildings with GANs (Feng et al., 2019; Kang et al., 2020). We can also observe that some rectangles can be distorted and have an unrealistic shape. For example, in line 5 of Figure 2, the first image predicted by pix2pix contains an unrealistic triangular building at the bottom left, and an inconsistent L-shaped building at the top. In the second image predicted by CycleGAN, unexpected courtyards appear in the large buildings at the top. Moreover, some buildings are too simplified and lose their distinctive shape: e.g., a T-shaped building transformed into a rectangle in the prediction of line 3, and an L-shaped building transformed into a rectangle in the prediction of line 1.
The global legibility of images is satisfying: most of the buildings are separated enough from the roads (C3) and do not overlap each other (C4). Building density is well preserved (C5), as there are fewer buildings where enlargement would have caused a density problem. Remaining overlap problems may be induced by similar errors in the target images (see for instance the reference image on the first line of Figure 2). Then, the most important challenge is the preservation of relations between different geographic objects (C6 and C7). For the road-building and river-building relations, the relative orientation, the proximity and, most of the time, the topology are respected. However, we do observe some building-road overlaps (e.g., at the bottom of the predictions on line 2 and at the top right of the predictions on line 3), and some inclusions in a small block disappear (at the top of the prediction image on line 1, in the middle of the pix2pix prediction on line 4).
However, the preservation of building patterns is not correctly achieved. For instance, the alignment on top of the image predicted by pix2pix on line 5 disappears. Indeed, structure preservation is not a priority in this agent-based generalisation (Dumont, 2018), so a GAN trained on the tiles produced using this method is not able to learn the structure preservation correctly. We verified that a GAN can learn this kind of relation by repeating the experiment on images produced using a typification algorithm in addition to the individual building generalisation provided by the AGENT model. This process focuses on reducing density while preserving structure.
Figure 3 presents predictions and targets of the model trained with the agent-based and the typification-based generalisation methods. We observe that alignments and other building structures are better preserved by the typification-based method; however, the shapes are less regular and the minimum separation between buildings is more frequently not respected. In map generalisation, there is often no unique good solution, as most of the time it is not possible to satisfy all constraints at the same time, and compromises are necessary. Different algorithms will focus on resolving different constraints and give different results. For now, the network learns to reproduce these strategies and privileges the resolution of the same constraints as the generalisation used for the target data.

Cases with Block Covering
The task of transforming dense urban blocks into built-up areas is an important transformation, and unsupervised learning seems to be better suited to this problem (Figure 4). It produces clear grayed areas, while the supervised network tends to erase most of the buildings but does not apply the gray covering to the whole area. Moreover, for some images, the predicted covering corresponds exactly to the target (CycleGAN prediction in line 1 of Figure 4), while for some other examples, unexpected parts of built-up areas are covered, and some information is lost (predictions at lines 2 and 3 of Figure 4).
We believe that there are three possible causes for this problem:
1. The image is not sufficient to learn block covering. The density and shape of the buildings in a block are important clues for a block covering decision, and both are visible on the images. However, they may not be the only factors. For example, on line 3 the central block needs to be covered while the block above it has a similar appearance and does not. The covering decision might be due to adjacent blocks outside the image, which are also covered.
2. Image tiles might not show the complete block. The covering is applied on a complete block but nothing guarantees that the image shows the complete block. For example, in line 4 the graying is missed because only a small part of the expected covered area is visible and this small part is not really dense.

The evaluation of GAN results is an important problem in the deep learning research community, as the pixel accuracy is less important than the global aspect of images. Moreover, GANs often deal with problems where a good solution does not exist or where the good solution is not unique. This is the case for map generalisation: several different generalisations can be acceptable. There is no global measure for generalisation quality (Touya, 2012), and there is no way to measure how much an image looks like a real map. In this article, we employ constraints to guide the visual evaluation of the generated images. The definition of constraints is common in map generalisation (Beard, 1991), but most of the measures of constraint violation are adapted to vector data (Mackaness and Ruas, 2007), and not to raster data. Courtial et al. (2020a) proposed a set of adapted constraints for road map generalisation using images, but these methods are only adapted for one theme of the map. Consequently, a quantitative evaluation of the generalisation images produced with deep learning is still an issue. As stated in this recent paper, the usual set of generalisation constraints is not enough to assess the images generated by deep neural networks, and constraints on the realism and credibility of the images have to be added.
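A small numerical illustration of why pixel accuracy is a poor proxy for generalisation quality: in the sketch below, a predicted building is displaced by a few pixels relative to the target, which is a legitimate generalisation outcome. The binary masks, their size, and the two measures are assumptions chosen for the example, not an evaluation protocol from the paper.

```python
def pixel_accuracy(pred, target):
    """Share of pixels with the same class in both masks."""
    total = sum(len(row) for row in target)
    same = sum(p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return same / total


def building_share(mask):
    """Share of pixels labelled as building (a crude raster density measure)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def mask_with_building(first_col, rows=4, cols=8):
    """Toy 4x8 mask with one 2x2 building whose left edge is at first_col."""
    mask = [[0] * cols for _ in range(rows)]
    for r in (1, 2):
        for c in (first_col, first_col + 1):
            mask[r][c] = 1
    return mask


target = mask_with_building(1)
predicted = mask_with_building(4)  # same building, displaced by three pixels
```

Here a quarter of the pixels disagree even though the building density is identical in both masks, so a pixel-wise score penalises a prediction that a cartographer might consider acceptable.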

How to Learn the Preservation of Spatial Relations?
The main limit of our results is the preservation of spatial relations. Our network fails to select roads while preserving road network connectivity and to avoid coalescence between roads and buildings. However, it was able to maintain the relative orientation between those elements and to preserve most of the building patterns (when they were presented in the target data).
The main difference between road density reduction and building density reduction seems to be the scale of the change: erasing a road would impact several tiles, while erasing a building may only impact a part of a tile. A possible explanation for the failure of the first task is that the scale of the tile is not adapted to represent the context necessary for road selection, together with the absence of attribute information, which is more important for roads than for buildings.
For the road/building proximity problem, a context problem is also possible, as the tiles often do not represent the complete partition formed by the roads and rivers, which seems to be the relevant level of context for studying these relations. However, a more plausible hypothesis is a quality problem in the training data, which makes it possible to generate a building under a road.
Finally, the failure of the generation of images with covered dense blocks can also be interpreted as a failure to understand a spatial relation (very high density of buildings inside a small block), and it is also induced by a lack of context and of external information such as building attributes, function, and administrative limits.
Is It Possible to Generate a Complete Generalised Map?
The last and most important question raised by this experiment is the possibility to generalise a complete map using these networks. We can derive the following guidelines for the use of GANs for map generalisation:
• Input and target domains must not be too different with a supervised architecture. Consequently, we believe that only a small scale gap is possible.
• Input and target tiles have to be legible, thus the presented information has to be limited.
• A sufficient context has to be visible in the tile, so the image scale has to be adapted.
• Different elements on the map have to be represented in a distinct manner.
These four constraints on task definition and tile creation reduce the possibilities offered by GANs. Moreover, some other techniques like attention-based architectures (Vaswani et al., 2017) may resolve some context-related issues. Finally, it is currently more reasonable to design a process that learns different independent elements of the map separately, similarly to traditional generalisation methods that treat roads first and then buildings in the fixed generalised road network.

CONCLUSION
In conclusion, GANs succeed in generating a topographic map of urban areas that follows most constraints and preserves structures, orientation and relative density. We believe that the blurred and distorted outlines that sometimes occur could be avoided by changing the parameters of the neural network, and by using post-processing. However, the topological errors and the covering operation remain challenging, and are the next issues to deal with. Graph convolutional networks seem to be able to encode some spatial relations between geographic objects, and could therefore help to solve these remaining problems.