AI4GEO: A DATA INTELLIGENCE PLATFORM FOR 3D GEOSPATIAL MAPPING

The availability of 3D geospatial information is a key issue for many expanding sectors such as autonomous vehicles, business intelligence and urban planning. Its production is now possible thanks to the abundance of available data (Earth observation satellite constellations, in-situ data, ...), but manual interventions are still needed to guarantee a high level of quality, which prevents mass production. New artificial intelligence and big data technologies adapted to 3D imagery can help to remove these obstacles. The AI4GEO project aims at developing an automatic solution for producing 3D geospatial information and new added-value services. This paper first introduces the AI4GEO initiative, its context and its overall objectives. It then presents the current status of the project, focusing on the innovative platform put in place to handle big 3D datasets for analytics needs, and reports the first results of 3D semantic segmentation and the associated perspectives.


INTRODUCTION
The availability of huge volumes of satellite, airborne and in-situ data now makes the production of 3D geospatial information feasible at large scale. Nonetheless, a certain amount of manual intervention is still needed to secure the level of quality, which prevents mass production.

AI4GEO INITIATIVE
AI4GEO is a French scientific and industrial program aimed at lifting the technological barriers to the automatic production of 2D and 3D geographic data. By applying innovative artificial intelligence and big data technologies to the processing of varied and precise geospatial data sources, it aims to automatically produce a 3D smart map. The AI4GEO consortium is composed of institutional partners (CNES, IGN, ONERA) and industrial groups (CS Group, AIRBUS Defense and Space, CLS, GEOSAT, QWANT, QUANTCUBE) covering the whole value chain of geospatial information. Started at the end of 2019, the project is funded for 4 years by the French future investment program led by the Secretariat General for Investment and operated by the public investment bank Bpifrance.
The project is structured around 2 axes that progress simultaneously. The first axis consists in developing a set of building blocks allowing the automated production of qualified 3D maps and their additional layers of information (3D objects and related semantics), as illustrated in Fig. 1. This collaborative work will benefit from the latest research from all the partners (imagery, AI and big data technologies) as well as from an unprecedented database: satellite and airborne data (optical, radar, lidar) combined with cartographic and in-situ data.

Fig. 1. 3D maps and semantics
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2021, XXIV ISPRS Congress (2021 edition)
The second axis consists in deriving from these technological bricks a variety of new services, illustrated in Fig. 2, for fields targeted by the industrial partners.

Fig. 2. Industrial services
The project could also contribute to other fields and services (e.g. precision agriculture or robotics) thanks to the IT platform that will be accessible via CS Group, and this list could be extended in the future by early adopters.
It is also important to highlight that the project will benefit from the arrival of satellite constellations such as the AIRBUS Pleiades-Neo program (in 2021) and, above all, the CNES CO3D program [1], which will greatly increase the provision of accurate 3D Very High Resolution data.

AI4GEO DATA ANALYTICS PLATFORM
Considering the dramatic increase of data and processing needs for new Earth Observation programs, CNES has been developing since 2017, in collaboration with CS Group, a new Data Analytics platform based on big data and cloud technologies, aimed at fostering user interaction and developing data analytics on large datasets [2] [3].
The main idea is to rely on a software ecosystem widely adopted by the scientific community and to build on top of it a complete and consistent thematic data analytics lab, called VRE (Virtual Research Environment), to access, manipulate, process and visualize at scale any kind of geospatial products (optical, radar, etc.).

Application layer
From the user's point of view, the VRE consists mostly of a JupyterLab interface, as shown in Fig. 3, enhanced by powerful plugins, among which:
• A Virtual Desktop that makes it possible to run desktop-bound applications such as QGIS in a web browser environment.
• A complete Integrated Development Environment (IDE) based on Visual Studio Code. This is complementary to the notebook approach, which is mostly prototyping oriented.
• EODAG [4], a Python framework with a JupyterLab plugin that acts as a catalog proxy. It simplifies data discovery by providing a standard API through a single access point.

Fig. 3. VRE interface
In addition to these plugins, the VRE includes a large number of EO libraries (e.g. OTB, PANGEO) and an AI ecosystem (TensorFlow, Keras or PyTorch, as well as tools such as TensorBoard). The challenge is to provide a complete set of tools whose versions are compatible with each other.
Two kinds of visualizers are provided. The first one is fully integrated into the VRE environment and lets users immediately look at their results. Its 2D visualizer is based on a light and efficient tile manager that can smoothly handle large volumes of images. In addition, a 3D visualizer based on WebGL displays the computed 3D models in a simple web browser.
The second tool, QUB, has been designed to fully exploit 3D geospatial data. Unlike traditional approaches, the 3D information is not just a 2D picture projected onto an underlying mesh: QUB stores the information in voxels, and the whole model is a 3D volume. The rendering is then computed in real time to display frames on screen. Each voxel can contain temporal geospatial information and physical measures.
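The voxel idea described above can be illustrated with a minimal sketch. The paper does not detail QUB's actual data model, so the layout below (a sparse dictionary of voxels, each holding time-stamped measures) and the names `VoxelGrid`, `add_measure` and `select` are purely hypothetical.

```python
from collections import defaultdict
from math import floor


class VoxelGrid:
    """Sparse voxel volume; each voxel holds time-stamped measures (hypothetical layout)."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        # (i, j, k) -> {date: {measure_name: value}}
        self._cells = defaultdict(dict)

    def index(self, x, y, z):
        """Map metric coordinates to integer voxel indices."""
        s = self.voxel_size
        return (floor(x / s), floor(y / s), floor(z / s))

    def add_measure(self, x, y, z, date, name, value):
        """Attach a named measure, valid at `date`, to the voxel containing (x, y, z)."""
        cell = self._cells[self.index(x, y, z)]
        cell.setdefault(date, {})[name] = value

    def select(self, date, name, predicate):
        """Indices of voxels whose measure `name` at `date` satisfies `predicate`."""
        return sorted(
            idx
            for idx, by_date in self._cells.items()
            if name in by_date.get(date, {}) and predicate(by_date[date][name])
        )


grid = VoxelGrid(voxel_size=2.0)
grid.add_measure(1.0, 1.0, 0.5, "2019", "land_cover", "vegetation")
grid.add_measure(1.0, 1.0, 0.5, "2020", "land_cover", "building")
grid.add_measure(5.0, 1.0, 0.5, "2020", "land_cover", "water")

# Show only one land cover class at one date, as a viewer filter might do.
buildings_2020 = grid.select("2020", "land_cover", lambda c: c == "building")
```

Keeping the date as a key inside each voxel is one simple way to let a viewer switch between classes or dates without rebuilding the volume.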
Technically, QUB is based on a 3D engine, so users can move the point of view as if they were flying over the scene, but they can also change in real time the kind of information displayed, for instance to show only a specific class of a land cover map or the same area at two different points in time, as shown in Fig. 4. First results are promising.

Platform layer
At system level, the VRE relies on the Docker container technology. In addition to substantially simplifying the deployment process, the Docker image inheritance capability makes it possible to provide each community with the right environment. The kernel VRE includes common data science packages and tools whose versions are compatible with each other, on top of which multiple thematic overlays (Imagery, Hydrometry, etc.) have been created. Finally, a specific project can also add its own overlay to embed dedicated libraries, specialized tools and so on.
Finally, at infrastructure level, the VRE is based on the Safescale platform developed by CS Group, which manages security and interoperability. The VRE is able to work seamlessly on an academic HPC cluster or on a public cloud.

Fig. 5. Framework architecture
Within the AI4GEO project, this environment has been deployed at the CNES Computing Center, taking advantage of its powerful data processing platform and giving access to a wide range of satellite images and analysis-ready data [5] (PEPS, THEIA, SWH, etc.) as well as datasets such as SPOT6/7 or Pleiades-HR from AIRBUS OneAtlas, aerial data from ONERA or cartographic references from IGN.

Scalability and automation
Deep data analytics work involves multiple steps that are mostly performed manually and sequentially, which prevents mass production. We are working on a two-level parallel software architecture. The higher level relies on an orchestrator that manages the automation of the full workflow as well as the coarse-grain parallelism. This data parallelism is based on inputs that are consistent and independent. Thanks to the workflow modeler illustrated in Fig. 6 and to building blocks offering a standard API, scientists can seamlessly design their own automated pipelines.

Fig. 6. Zeebe workflow modeler
The orchestrator is able to launch a full pipeline taking into account specialized hardware placement when needed by building blocks (for instance GPU for training and inference steps). It also takes care of failure resiliency and provides processing monitoring information.
At the lower level, each building block can, if needed, implement fine-grain parallelism. To do so, we rely on the Dask processing framework. We chose Dask for its gentle learning curve, its native Python integration and the multilevel parallelism it offers: one can choose multi-core or multi-node parallelism depending on how compute-intensive the algorithm is.
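The two levels of parallelism can be sketched as follows. This is only a structural illustration, assuming the standard library `concurrent.futures` as a stand-in for both the orchestrator and Dask; the names `run_pipeline`, `building_block` and `process_chunk` are hypothetical, and Python threads show the structure rather than a real CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Fine-grain unit of work: here, the sum of squares of a chunk of values."""
    return sum(v * v for v in chunk)


def building_block(tile, n_workers=2, chunk_size=3):
    """Lower level: one tile is split into chunks that are processed in parallel."""
    chunks = [tile[i:i + chunk_size] for i in range(0, len(tile), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(process_chunk, chunks))


def run_pipeline(tiles, n_workers=2):
    """Higher level: independent tiles are dispatched in parallel (coarse grain)."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(building_block, tiles))


# Three independent inputs of different sizes.
tiles = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10, 11]]
results = run_pipeline(tiles)
```

The coarse level only requires the inputs to be independent, while each block remains free to decide whether and how to parallelise its own work.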
High-level orchestration studies are still underway, but we have already obtained promising first results on automated production.

FIRST RESULTS AND OUTLOOK
The 4-year project aspires to automatically produce 3D semantic maps at very high resolution and global scale. Two pipelines have been deployed on the platform during the first year.

Fig. 7. 3D pipeline
The first one consists in extracting building footprints as well as first 3D reconstructions, using both a Digital Surface Model (DSM) and images from Pleiades-HR satellites.
The first step of the pipeline described in Fig. 7 relies on [7], an open source stereo pipeline designed for scalability and robustness. Each stereoscopic pair of images is converted to epipolar geometry, and a disparity map is computed with the Semi-Global Matching algorithm [8]. From this map, homologous points are found in sensor geometry. Then, using the forward sensor model (RPC), the lines of sight related to these points are computed. 3D points are obtained at the intersection of these lines (barycenter). Finally, a rasterization process is applied to get a georeferenced DSM raster.
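Because two lines of sight rarely intersect exactly, the "intersection (barycenter)" step can be read as taking the midpoint of the shortest segment joining them. The sketch below illustrates this geometric reading for a single pair of lines; it is not the code of the stereo pipeline [7], and the function name `closest_point_between_lines` is hypothetical.

```python
def closest_point_between_lines(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining lines p1 + s*d1 and p2 + t*d2."""

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines of sight are parallel")
    # Parameters minimising the distance between the two lines.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]
    q2 = [p + t * k for p, k in zip(p2, d2)]
    return [(u + v) / 2 for u, v in zip(q1, q2)]


# Two lines of sight that meet on the ground at (1, 0, 0).
ground_point = closest_point_between_lines(
    (0.0, 0.0, 10.0), (1.0, 0.0, -10.0),
    (2.0, 0.0, 10.0), (-1.0, 0.0, -10.0),
)

# Two skew lines: the result lies halfway along their common perpendicular.
skew_point = closest_point_between_lines(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 1.0, 0.0),
)
```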
The second step of the pipeline aims at extracting the Digital Terrain Model (DTM) from the DSM at large scale using a specific module called Bulldozer. This algorithmic module is an improvement of the original method based on Lidar data [6], adapted to handle photogrammetric DSMs and scalability. The idea of the algorithm is to drop a rigid cloth onto the inverted surface. The high-frequency distortions of the DSM are thereby filtered out, whereas the low-frequency variations of ground altitude are captured. Finally, the Digital Height Model (DHM) is obtained.
5 https://www.ogc.org/standards/citygml
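To illustrate how a height model follows from a DSM and a DTM, here is a minimal sketch on a 1D elevation profile. It assumes the usual definition DHM = DSM - DTM, and it replaces the cloth simulation with a crude morphological opening (a sliding minimum followed by a sliding maximum); this is a deliberately simple stand-in, not the Bulldozer algorithm, and all function names are hypothetical.

```python
def sliding(values, radius, op):
    """Apply `op` (min or max) over a window of half-width `radius` at each sample."""
    n = len(values)
    return [op(values[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def rough_dtm(dsm_profile, radius):
    """Stand-in terrain estimate: morphological opening of the DSM profile."""
    eroded = sliding(dsm_profile, radius, min)   # removes objects narrower than the window
    return sliding(eroded, radius, max)          # restores the extent of the ground


def height_model(dsm_profile, dtm_profile):
    """Height above ground, sample by sample (DHM = DSM - DTM)."""
    return [s - t for s, t in zip(dsm_profile, dtm_profile)]


# Flat ground at 10 m with an 8 m high, 2-sample wide building.
dsm = [10.0, 10.0, 10.0, 18.0, 18.0, 10.0, 10.0, 10.0]
dtm = rough_dtm(dsm, radius=2)
dhm = height_model(dsm, dtm)
```

The window must be wider than the largest building for the opening to remove it, which is one reason why a simple filter like this one does not scale to real scenes.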
Then the Pleiades spectral bands (Red, Green, Blue, Near Infrared), the DHM, and two additional channels, NDVI and C3 (shadow), are combined into a tensor. The building semantic segmentation is performed using a U-Net architecture with an EfficientNet encoder and a RefineNet decoder. The neural network is trained using labels that come from the freely available OpenStreetMap and IGN databases.
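The construction of the 7-channel input can be sketched as below. NDVI uses its standard definition, (NIR - Red) / (NIR + Red). The exact C3 formula is not given in the text; the sketch assumes the common c1c2c3 colour-space definition, c3 = arctan(B / max(R, G)), and the function names and the nested-list layout are hypothetical.

```python
from math import atan2


def ndvi(red, nir):
    """Normalised Difference Vegetation Index; 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def c3(red, green, blue):
    """Assumed shadow channel: arctan(B / max(R, G)); atan2 avoids division by zero."""
    return atan2(blue, max(red, green))


def stack_channels(red, green, blue, nir, dhm):
    """Build an H x W x 7 nested list: R, G, B, NIR, DHM, NDVI, C3."""
    tensor = []
    for r_row, g_row, b_row, n_row, h_row in zip(red, green, blue, nir, dhm):
        tensor.append([
            [r, g, b, n, h, ndvi(r, n), c3(r, g, b)]
            for r, g, b, n, h in zip(r_row, g_row, b_row, n_row, h_row)
        ])
    return tensor


# A 1 x 2 image: one vegetated pixel on the ground, one 5 m high non-vegetated pixel.
red = [[1.0, 2.0]]
green = [[1.0, 2.0]]
blue = [[1.0, 0.0]]
nir = [[3.0, 2.0]]
dhm = [[0.0, 5.0]]
tensor = stack_channels(red, green, blue, nir, dhm)
```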
Once the semantic building regions are extracted, the following step is a morphological post-processing that removes buildings that are too small (those whose area in pixels is lower than a user-given threshold) as well as inner courtyards that are too small. Then, a connected component segmentation is carried out in order to uniquely identify each building.
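The post-processing described above can be sketched with a plain breadth-first connected-component labelling. This is a minimal illustration on a binary mask, assuming 4-connectivity and a single area threshold for both buildings and courtyards; neither choice is stated in the text, and the function names are hypothetical.

```python
from collections import deque


def label_components(mask, value=1):
    """4-connected labelling of pixels equal to `value`; returns (labels, areas)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != value or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == value and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


def flip_small(mask, value, min_area):
    """Copy of `mask` where components of `value` smaller than `min_area` are flipped."""
    labels, areas = label_components(mask, value)
    return [
        [(1 - value) if lab and areas[lab] < min_area else pix
         for pix, lab in zip(mask_row, label_row)]
        for mask_row, label_row in zip(mask, labels)
    ]


mask = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],   # one-pixel courtyard at (1, 1), one-pixel building at (1, 4)
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
without_small_buildings = flip_small(mask, value=1, min_area=2)
cleaned = flip_small(without_small_buildings, value=0, min_area=2)  # fill small courtyards
building_labels, building_areas = label_components(cleaned)
```

After cleaning, each remaining label identifies one building, which is the input expected by the per-building steps that follow.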
Finally, a RANSAC regularization method, improved with geometric constraints specific to building shapes (in particular their parallelepiped shape), is applied to obtain the LOD0 (Level Of Detail 0) shape of each building based on the CityGML standard (see footnote 5).
The next step consists of computing the LOD1 (flat roof) 3D reconstruction of buildings. Thanks to the DHM, statistical height measures such as the minimum, maximum, mean or median are computed for each building, and a 3D mesh is then constructed for each one.
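The LOD1 step can be sketched as follows: height statistics are gathered per labelled building from the DHM, and the footprint is extruded into a flat-roof prism. The choice of the median as roof height, the mesh layout and the function names are assumptions made for the example only.

```python
from statistics import mean, median


def building_heights(labels, dhm):
    """Per-building height statistics from a label raster and a DHM raster."""
    samples = {}
    for label_row, height_row in zip(labels, dhm):
        for lab, h in zip(label_row, height_row):
            if lab:  # label 0 is background
                samples.setdefault(lab, []).append(h)
    return {
        lab: {"min": min(v), "max": max(v), "mean": mean(v), "median": median(v)}
        for lab, v in samples.items()
    }


def extrude(footprint, height, ground=0.0):
    """Flat-roof prism from a 2D footprint; returns (vertices, faces as index tuples)."""
    n = len(footprint)
    base = [(x, y, ground) for x, y in footprint]
    top = [(x, y, ground + height) for x, y in footprint]
    vertices = base + top
    floor_face = tuple(range(n - 1, -1, -1))  # reversed so that it faces downwards
    roof_face = tuple(range(n, 2 * n))
    walls = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    return vertices, [floor_face, roof_face] + walls


labels = [[1, 1, 0], [1, 1, 2]]
dhm = [[6.0, 8.0, 0.0], [7.0, 9.0, 3.0]]
stats = building_heights(labels, dhm)

# Extrude a square footprint (hypothetical coordinates) to the median height of building 1.
footprint = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
vertices, faces = extrude(footprint, stats[1]["median"])
```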
Ongoing work focuses on algorithms that determine the roof shape of each building in order to compute a LOD2 3D reconstruction. To do so, roof faces need to be identified. Two methods have been applied: an unsupervised segmentation (Mean-Shift) and a supervised approach based on a Mask R-CNN deep learning model. The spatial arrangement of the roof faces of a building is then compared to a bank of standard roof types, and the closest one is selected as the best approximation. Current results are not convincing. This is mainly explained by the spatial resolution of Pleiades-HR satellite images, in which building contours and ridges are often blurred. The future very high resolution optical satellite images from Pleiades Neo will be a perfect opportunity to further assess our LOD2 reconstruction methods.
The training step uses 90,000 buildings over the city of Toulouse. The full dataset has been split into a training dataset (80%), a validation dataset (10%) and a test dataset (10%). The performance of the pipeline has been measured on the test dataset. The performance of the U-Net is summarized in Table 1.
Table 2 gives the performance of the regularization method, computed using 3 metrics:
• IoU, to evaluate the location and the global shape accuracy.
• The ratio of the number of segments between the resulting shape and the reference shape, to evaluate the simplicity of the resulting building shape.
• The difference in orientation of the minimum oriented bounding box between the resulting shape and the reference one.
Fig. 9 shows other areas that have been produced with equivalent results, such as the cities of Montpellier and Barcelona.
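The 80/10/10 split and the first two regularization metrics can be sketched as below. The shuffling seed, the representation of shapes as sets of pixels or lists of vertices, and the function names are assumptions made for the example; the orientation metric is left out because it requires a minimum oriented bounding box computation.

```python
import random


def split_dataset(items, percents=(80, 10, 10), seed=0):
    """Shuffle and split into training, validation and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = len(items) * percents[0] // 100
    n_val = len(items) * percents[1] // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def iou(pixels_a, pixels_b):
    """Intersection over Union of two shapes given as collections of pixels."""
    a, b = set(pixels_a), set(pixels_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def segment_ratio(result_polygon, reference_polygon):
    """Ratio of segment counts; a closed polygon has as many segments as vertices."""
    return len(result_polygon) / len(reference_polygon)


train, val, test = split_dataset(range(90000))

# Two 2 x 2 squares that overlap on one column.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (0, 2), (1, 2)}
score = iou(predicted, reference)

# A 4-vertex result compared with a 6-vertex reference shape.
ratio = segment_ratio([(0, 0), (4, 0), (4, 3), (0, 3)],
                      [(0, 0), (4, 0), (4, 3), (2, 3), (2, 2), (0, 2)])
```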

Land cover and change detection pipeline
The second pipeline aims at producing land cover maps, with first attempts at change detection. Two different scales are targeted: VHR on urban tiles (50 cm GSD, based on Pleiades images) and HR on regional ones (10 m GSD, based on Sentinel-2 images).
The urban land cover map contains 4 classes (building, vegetation, water and roads). Semantic segmentation is based on a U-Net neural network trained on a mix of OSM data (95%) and manual annotations (5%). Fig. 10 presents the Toulouse land cover produced in 2020.

Fig. 10. City scale (Toulouse) land cover 2020
Analysis of the preliminary results quickly pointed out a performance limitation due to multiple biases in the ground truth (GT), especially regarding the alignment of object edges (highlighted by the Intersection over Union, IoU, metric), even though the classification itself is good (overall accuracy above 0.85). Therefore, an activity on labelling tools has been started along two main axes. On the one hand, the labelcooker tool aims to merge existing GT by combining the best characteristics of each (through a statistical approach). On the other hand, a semi-automatic method is being developed in the autolabel tool, which is intended to produce new GT using an active learning approach in order to increase labelling productivity.
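The gap between a good overall accuracy and a weaker IoU can be made concrete with a small confusion matrix. The sketch below uses invented numbers for two classes, chosen only to show how the two metrics diverge; the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of correctly classified pixels (rows: reference, columns: prediction)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_iou(cm):
    """IoU of each class: TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference pixels of class c that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious


# Invented example: class 0 is dominant, class 1 is small and poorly delineated.
cm = [[880, 20],
      [60, 40]]
oa = overall_accuracy(cm)
ious = per_class_iou(cm)
```

Here the overall accuracy is dominated by the large class, while the IoU of the small class exposes errors that concentrate along its boundaries.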
The global land cover map is made up of 14 classes, a subset of Corine Land Cover. It is based on the Sentinel-2 sensor. The best results are obtained with a ResNet network on Sentinel-2 L1C time series, trained with the IGN OSC-GE dataset. Fig. 11 shows the PACA land cover map produced at the end of 2020 with a 0.89 overall accuracy score. Recent research is now focusing on multi-resolution inputs, with the objective of benefiting both from the resolution of a VHR sensor (SPOT6/7 or Pleiades-HR) and from the revisit frequency and multispectral information of an HR sensor (Sentinel-2).

Fig. 11. Region scale (PACA) land cover
As regards change detection, the ONERA REACTIV tool (Rapid and Easy Change Detection in RADAR Timeseries by Variation coefficient) was integrated into the AI4GEO platform. As detailed in [10], changes are highlighted in SAR time series by analyzing the coefficient of variation. The tool provides a change visualization capability by colorizing pixels depending on the change detection date. Furthermore, it is possible to discriminate between one-time and persistent changes. Fig. 12 shows the Toulouse change detection map produced using a Sentinel-1 time series from 2017 to 2020.
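The principle can be sketched for a single pixel: the coefficient of variation of its amplitude time series measures how strongly it changed, and the date of the strongest return selects the colour. This is a simplified illustration, not the REACTIV implementation of [10]; the mapping to HSV (hue from the date, saturation from the coefficient of variation, full brightness) and the `cv_max` normalisation are assumptions.

```python
import colorsys
from statistics import mean, pstdev


def coefficient_of_variation(series):
    """Standard deviation divided by the mean of the amplitude time series."""
    m = mean(series)
    return pstdev(series) / m if m else 0.0


def change_pixel(series, cv_max=1.0):
    """Return (cv, index of the maximum amplitude, RGB colour) for one pixel."""
    cv = coefficient_of_variation(series)
    t_max = max(range(len(series)), key=series.__getitem__)
    hue = t_max / len(series)            # date of the strongest return drives the hue
    saturation = min(cv / cv_max, 1.0)   # stable pixels stay colourless
    return cv, t_max, colorsys.hsv_to_rgb(hue, saturation, 1.0)


stable = [1.0, 1.0, 1.0, 1.0]
changed = [1.0, 1.0, 5.0, 1.0]   # a strong event at the third acquisition

cv_stable, t_stable, rgb_stable = change_pixel(stable)
cv_changed, t_changed, rgb_changed = change_pixel(changed)
```

A stable pixel gets zero saturation and therefore no colour, while a pixel with a single strong event is coloured according to the acquisition at which it occurred.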

CONCLUSION
In this paper, the AI4GEO project has been presented and some of its first achievements reviewed. Many other research activities have been carried out within partner labs so as to launch the second year of the project. Our goal will be to tackle new challenges towards the Global Smart Map, such as performance optimization and generalization over a wide range of cities and regions (Asia, America, etc.), multi-resolution classification, change detection, 3D mesh reconstruction and, from a platform point of view, workflow orchestration at large scale to cope with the global scale objective.