MODELING TREES FOR VIRTUAL SINGAPORE: FROM DATA ACQUISITION TO CITYGML MODELS



INTRODUCTION
Singapore is world-renowned as a "City in a Garden" (National Parks Board, 2016, Neo et al., 2012) owing to decades of dedication to greening the city. However, challenges remain in capturing this unique quality in the virtual city of Singapore, i.e. Virtual Singapore (National Research Foundation, 2018), in the form of representative 3D models in a computationally tractable way. Interwoven between buildings and other urban structures are an estimated 1.5 million trees spread over more than 1000 species in public parks, on state lands, and along the roads (Chew, 2015, Toh, 2018), which are often oversimplified, under-represented, or completely left out in Virtual Singapore.
Trees grow and change in a less predictable manner than non-organic city objects and are governed by a multitude of factors, from their species' biological characteristics to their surrounding environment. The diversity and density of vegetation in Singapore also make it extremely challenging to identify and represent all trees in the country accurately.
Yet it is exactly for these differences in species and density that trees are planted, creating a variety of experiences within the city. Modeling these differences will enable various environmental and urban analyses and studies to factor in vegetation, but the efficient and accurate creation and maintenance of these models in any virtual city are significant hurdles to overcome.
Modeling individual trees at a high level of detail is a tedious and labour-intensive process. Tree models derived directly from laser-scanned point data are generally incomplete owing to the limitations of the scanning process as well as the nature of trees in general, with leaves obscuring branches. In addition, images from multiple viewpoints of individual trees are not readily available to assist automatic tree model reconstruction, and even where they are, substantial manual intervention is often required. Given the need to model millions of trees in Virtual Singapore, these direct approaches are not feasible. Hence, tree models need to be dynamically generated for scalability and ease of maintenance, yet remain representative of the actual tree on the ground.

Objective
The main objective of the project is to develop efficient tools and techniques to model 3D trees which are biologically, spatially and semantically representative in Virtual Singapore. The project focuses on the various levels of detail (LOD1, LOD2, and LOD3) of solitary vegetation objects (Figure 2) defined in the CityGML standard (Open Geospatial Consortium, 2012). In particular, the existing LOD2 and LOD3 representations of solitary vegetation are not well defined for the tropical environment in Singapore.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W10, 2018 13th 3D GeoInfo Conference, 1-2 October 2018, Delft, The Netherlands

Hence, the findings and results of this project will contribute to the Singapore government's ongoing initiative for localising the CityGML standards. We adopt a procedural modeling approach to automatically produce dynamic virtual tree models which can grow or change based on their species characteristics and local environmental conditions. This automatic approach works by analysing the acquired remote-sensing data to extract the characteristic properties of each individual tree, which are then used to customise pre-formulated species-specific tree models to represent their real-life counterparts together with their surroundings.
This paper lays down our overarching framework (Figure 1), which we believe will enable Virtual Singapore to become the world's first dynamic green virtual city, where trees can be populated automatically without continuous, expensive and labour-intensive efforts. It is also envisioned that using such highly representative tree models will greatly enhance the outcomes of environmental studies, simulations, and city planning. Lastly, this work is in line with the aim of addressing the challenges of automation, maintenance and quality control of the tree models within Virtual Singapore.

Problem Statements
In order to address the challenges above, our framework focuses on two specific areas:
1. Automated tools for remote sensing data acquisition and processing
2. Methodologies to generate large-scale CityGML-compliant tree models at multiple levels of detail (LOD1, LOD2, and LOD3)

LITERATURE REVIEW
Our framework entails various aspects of data acquisition, processing, and 3D modeling; this section discusses some of the relevant state-of-the-art works.
We leverage satellite imagery to efficiently and accurately locate trees and to obtain their crown size information (Kamal et al., 2015, Zhen et al., 2016). Beyond satellite imagery, which has limitations, laser scanning technology has revolutionised environmental digitisation for quantifying vegetation (Newnham et al., 2012, Véga et al., 2016, Zhao et al., 2015). However, the problem of occlusions in urban settings and the sheer density of tropical vegetation require high resolution and coverage of trees, producing massive point cloud data which are labour-intensive to process, store, manage, and work with (Martinez-Rubi et al., 2015, van Oosterom et al., 2015). Dealing with a single tree is challenging (Raumonen et al., 2015), and attempting it efficiently and accurately for all trees across Singapore is a much more daunting task.
The modeling of trees from LiDAR data requires a number of interdependent steps. The first is the identification and extraction of vegetative material from the LiDAR dataset (Zhou and Neumann, 2013), one tree at a time (Rutzinger et al., 2010, Rodríguez-Cuenca et al., 2015). The next step is to quantify tree semantics from the isolated point cloud data (Pfeifer et al., 2004, Livny et al., 2010, Leavenworth, 2012) by segregating the tree point clouds into the above-ground woody structures and the tree crown. At present, we are not aware of any commercial software for this segregation step, although some relevant techniques have been demonstrated (Béland et al., 2014). Works such as Li et al. (2017) use the intensity information found in typical point cloud data to differentiate the woody structure (trunk) from the crown (leaves). Other works (Xu et al., 2007a, Wang et al., 2014) use minimum spanning tree construction based on neighbours' distances to construct tree skeletons. The final step involves simplifying the woody structures into a series of cylinders (Raumonen et al., 2013, Delagrange et al., 2014, Calders et al., 2015) so that basic physical tree parameters can be extracted to derive tree semantic information and to generate 3D tree models.
Lastly, modeling 3D trees can be done by a non-procedural, interactive approach or by a stochastic, procedural approach. The current state-of-the-art interactive approaches (image-based (Reche-Martinez et al., 2004), LiDAR-based (Xu et al., 2007b, Livny et al., 2011), graph-skeleton (Pirk et al., 2012a, Pirk et al., 2012b), sketch-based (Weber and Penn, 1995, Lintermann and Deussen, 1999, Longay et al., 2012)) are inherently not scalable and generally ignore tree growth, despite producing highly realistic-looking tree models. On the other hand, the procedural approach generates high-quality trees stochastically and automatically based on a set of predefined rules, and is hence suitable for large-scale modeling of natural-looking trees and their growth. However, the current state of the art for this approach is limited in accurately capturing regular patterns of nature, and either requires precise biomass distribution measurements, which are not always feasible to obtain (Vos et al., 2010, Cournède et al., 2011), or requires a good mesh input of the actual tree (Stava et al., 2014), which is challenging to derive from remote sensing data without manual intervention. It is hence not directly feasible for Virtual Singapore.

PROPOSED FRAMEWORK
Here we present our tree modeling framework, which supports a full workflow from data acquisition and processing to the generation of CityGML tree models for Virtual Singapore (Figure 3). We intend to locate every single tree in Singapore, then generate its Virtual Singapore CityGML representation in the form of a 3D model and its semantic information.
The official tree database website of the National Parks Board (NParks), trees.sg, currently shares approximately 500,000 trees in parks and along the roads in Singapore, each with a unique identifier, species and other semantic information (National Parks Board, 2018, Lee, 2018). On top of this database, we rely on remote sensing data to extract the tree information. Our acquired remote-sensing data include airborne LiDAR scanning (ALS) and mobile LiDAR scanning (MLS) data, as well as satellite and airborne imagery. The remote sensing data are processed to measure individual tree properties such as growth space, crown shapes, and trunk and branch sizes and angles. The measured data accumulate as statistics in our species-specific tree library and are also used as constraints to model individual trees at the required level of detail. All components of our tree modeling framework, most of which are work in progress, are discussed in more detail in the following subsections.

Figure 3. Framework workflow

Data Acquisition
LiDAR scans (ALS and MLS) and airborne imagery were acquired (Soon and Khoo, 2017) and geo-referenced to the SVY21 coordinate system with orthometric height based on the Singapore Height Datum:
• ALS data over the entire Singapore were collected in April 2014 using an Optech ALTM Pegasus with up to 4 range measurements, including 1st, 2nd, 3rd, and last returns. The data were then processed for georeferencing and registration before being saved in LAS format to a vertical and planimetric accuracy of ±0.15 m or better with a minimum of 5 points/m² and pre-classified.
• Airborne imagery was obtained using a Leica RCD30 with a 60-megapixel resolution, which produced an orthophoto mosaic in TIFF format with a resolution of 10 cm and a spatial accuracy of ±0.5 m RMSE or better.
• MLS data were collected between Aug 2015 and June 2016 using a Riegl VMX-450 at approximately 40 points/m² at 70 m range, at a speed of 60 km/h, with a minimum measurement rate of 200,000 points/sec/head. The data covered the vast majority of roadways across the entire Singapore.
The ALS and MLS data acquisition above cost around SGD 4 million. Plans to conduct subsequent scans are under discussion at the time of writing.
In addition, high-resolution satellite images from the WorldView-2 satellite with 8 multispectral bands (2 m/pixel) and a panchromatic band (0.5 m/pixel) were acquired. The images were orthorectified with a fine digital elevation model (DEM) for georeferencing. The pixel digital numbers were then converted to band-average spectral reflectance for better spectral analysis and to facilitate comparison with imagery acquired at other times or by other sensors. The multispectral and panchromatic bands were fused to form a 0.5 m/pixel multispectral image using an in-house pan-sharpening algorithm (Xiaojing and Chin, Unpublished), which preserves both the spectral fidelity of the multispectral bands and the spatial resolution of the panchromatic band.

Data Processing

Tree Isolation using Satellite Imagery
For trees outside the NParks database and not covered by the MLS data, we employ customised algorithms to isolate trees from satellite imagery. Here, an object-based multi-resolution segmentation procedure is applied to extract relatively homogeneous objects from the image layers based on their spectral and spatial/contextual properties. The objects are classified into tree or non-tree classes according to their mean image-layer reflectance values or spectral indices such as NDVI, brightness, and band ratios.
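As a rough illustration of the spectral-index part of this classification, the sketch below labels segmented objects from their mean red and near-infrared reflectance using NDVI, defined as (NIR - Red) / (NIR + Red). The function names, the sample reflectance values and the single 0.4 threshold are assumptions made for illustration only; the text also mentions brightness and band ratios, which are omitted here, and NDVI alone separates vegetation from non-vegetation rather than trees from other greenery.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel or object mean."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty or no-data objects
    return (nir - red) / total


def classify_objects(objects, ndvi_threshold=0.4):
    """Label each segmented object 'tree' or 'non-tree'.

    `objects` maps an object id to its mean (red, nir) reflectance.
    The threshold is a hypothetical placeholder, not a value from the paper.
    """
    labels = {}
    for obj_id, (red, nir) in objects.items():
        is_vegetation = ndvi(nir, red) >= ndvi_threshold
        labels[obj_id] = "tree" if is_vegetation else "non-tree"
    return labels


# Hypothetical object means: (red, nir) reflectance in the range 0..1.
segments = {
    "canopy": (0.05, 0.45),  # strong near-infrared response
    "road": (0.20, 0.22),    # nearly flat spectrum
}
labels = classify_objects(segments)
print(labels)
```

In practice the decision rule would combine several indices per object, so a single threshold like this should be read as the simplest possible stand-in.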
For the tree objects, the individual tree crowns are delineated using a watershed segmentation technique. Morphological filtering and region-growing algorithms (Gonzalez and Woods, 2006) are used to smooth the outlines of individual tree crowns and to fill small gaps within the canopy. Finally, each tree crown is represented by a circular object of the same area (Figure 4) and tagged with parameters such as the geo-position, crown size, and attributes extracted from other data sources (e.g., ALS, MLS and the NParks tree database). In our initial validations, the results of automated individual tree crown detection and delineation using satellite imagery correlated well with the ground truth data obtained by manual delineation.
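The equal-area circle used to represent each delineated crown follows directly from A = πr². The sketch below assumes the crown is available as a pixel count on the 0.5 m/pixel pan-sharpened image; the function names and the sample pixel count are hypothetical.

```python
import math


def crown_area_from_pixels(pixel_count, pixel_size_m=0.5):
    """Crown area in square metres for a delineated region of square pixels."""
    return pixel_count * pixel_size_m ** 2


def equivalent_crown_radius(crown_area_m2):
    """Radius of the circle whose area equals the delineated crown area."""
    return math.sqrt(crown_area_m2 / math.pi)


# Hypothetical crown covering 400 pixels at 0.5 m/pixel, i.e. 100 square metres.
area = crown_area_from_pixels(400)
radius = equivalent_crown_radius(area)
print(f"area = {area:.1f} m^2, equivalent radius = {radius:.2f} m")
```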
We anticipate that the extracted spectral and morphological attributes of tree crowns can be used to identify tree species using template matching or machine learning methods.

Tree Isolation using MLS data
MLS data were collected along almost all roadways in Singapore, which means that virtually every roadside tree should have been scanned. From this data, our task is to isolate individual trees and generate a single point cloud file for each tree for further processing. To perform the isolation, we developed tools within the PointCloudScene (Digicart, 2018) production environment that automate the isolation process and require only minimal manual quality checks.
To isolate the trees from the surrounding environment, we first prepare a generalised triangulated irregular network (TIN) ground model (Figure 5) to serve as a reference for the subsequent extraction. While tree trunks tend to be separated from one another, the density of trees in Singapore often results in tree canopies which intersect each other. To segregate the trees from each other, customised point cloud filtering tasks were developed to identify the canopy within the original point cloud. Voxelisation and shape recognition solutions were then applied to separate canopies from other points in the scan (Figure 7). Best-fitting ellipses are then generated around each tree trunk, enveloping the respective canopy, and stored in the database as clipping shapes for each isolated tree (Figure 8). The end result of this process is a database in which the 3D location, canopy coverage and best-fitting ellipse are stored for each successfully isolated tree. In addition, using the ellipses, point cloud files were clipped out of the main scan for each tree and stored for further processing.
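The final clipping step can be pictured as a point-in-ellipse test in the horizontal plane. The sketch below is a minimal version under the assumption that each stored ellipse is described by a centre, two semi-axes and a rotation; the parameter names and sample points are hypothetical and do not reflect the PointCloudScene data model.

```python
import math


def inside_ellipse(x, y, cx, cy, semi_major, semi_minor, rotation_rad=0.0):
    """True if (x, y) lies inside or on a rotated ellipse centred at (cx, cy)."""
    dx, dy = x - cx, y - cy
    cos_r, sin_r = math.cos(rotation_rad), math.sin(rotation_rad)
    # Rotate the offset into the ellipse's own axis-aligned frame.
    u = dx * cos_r + dy * sin_r
    v = -dx * sin_r + dy * cos_r
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0


def clip_points(points, ellipse):
    """Keep the (x, y, z) points whose horizontal position falls in the ellipse."""
    return [p for p in points if inside_ellipse(p[0], p[1], **ellipse)]


# Hypothetical clipping shape and a few scan points (x, y, z) in metres.
ellipse = {"cx": 0.0, "cy": 0.0, "semi_major": 2.0, "semi_minor": 1.0}
scan = [(0.0, 0.0, 1.0), (1.9, 0.0, 5.0), (0.0, 1.5, 3.0), (2.1, 0.0, 2.0)]
tree_points = clip_points(scan, ellipse)
print(tree_points)
```

A purely horizontal test like this cannot separate overlapping canopies on its own, which is presumably why the workflow above applies voxelisation and shape recognition before the ellipses are fitted.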
In the ongoing process of verifying the extracted data, we have found that the developed algorithms generally perform well in extracting girth measurements (Figure 9), especially where the area of interest (AOI) fits the target closely, there are good go-return scans around the target, there are no dense hedges around the tree trunk, and the trunk diameter is relatively large (more than 10 cm in our case). As with all automated techniques, some quality checks need to be performed. We have therefore also developed simple quality control tools for operators to check and, if necessary, modify the results.

Classification of Tree Woody Structure and Leaves
After successfully isolating a single tree from the MLS data, we then seek to classify the point cloud into woody structures and leaves for further analysis. There are two main characteristics of the point cloud that can be used to perform this classification: the intensity of the return and the distance of each point to its k nearest neighbours. Generally, points on woody structures tend to have a higher intensity and are more densely spaced, while points around leaves tend to have a lower intensity and are sparser. However, due to scanning irregularities, segregation using these two parameters alone typically results in many misclassifications.
We thus propose a multi-stage classification process. In the first stage, small clusters of points are classified as woody structure. To do so, points are ranked by their intensity values and by their average distance to their three nearest neighbours. Points which fall within the top 10% on both criteria (highest intensity and smallest distance) are selected as woody structure. These clusters of points are known as seed clusters. In the second stage, the seed clusters are allowed to grow, subject to a cutoff on the average distance to the three nearest neighbours. In the third and final stage, the points selected so far are put into a minimum spanning tree, with the source being the lowest point of the tree. Only the main cluster connected to the source in the minimum spanning tree is then classified as woody structure. Some example results of the classification are shown in Figure 10.
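The three stages can be sketched in plain Python as follows. This is an illustrative reading of the description, not the project's implementation: the brute-force neighbour search, the linking radius used during growth, the rule of cutting minimum-spanning-tree edges longer than `max_edge`, and all threshold values are assumptions, and the toy point cloud is synthetic.

```python
import math


def mean_knn_distance(points, k=3):
    """Mean distance from each point to its k nearest neighbours (brute force)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p[:3], q[:3]) for j, q in enumerate(points) if j != i)
        out.append(sum(d[:k]) / k)
    return out


def classify_woody(points, seed_fraction=0.10, grow_cutoff=0.15, max_edge=0.30):
    """Return indices of points (x, y, z, intensity) classified as woody."""
    n = len(points)
    spacing = mean_knn_distance(points)
    n_top = max(1, int(n * seed_fraction))

    # Stage 1: seeds rank in the top fraction for BOTH intensity and density.
    bright = set(sorted(range(n), key=lambda i: -points[i][3])[:n_top])
    dense = set(sorted(range(n), key=lambda i: spacing[i])[:n_top])
    selected = bright & dense

    # Stage 2: grow seeds through nearby points that are densely spaced.
    frontier = list(selected)
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if j in selected or spacing[j] > grow_cutoff:
                continue
            if math.dist(points[i][:3], points[j][:3]) <= max_edge:
                selected.add(j)
                frontier.append(j)
    if not selected:
        return set()

    # Stage 3: Prim's minimum spanning tree from the lowest selected point.
    # Stopping at the first edge longer than max_edge keeps only the cluster
    # connected to the source.
    source = min(selected, key=lambda i: points[i][2])
    in_tree = {source}
    best = {i: math.dist(points[source][:3], points[i][:3])
            for i in selected if i != source}
    while best:
        nxt = min(best, key=best.get)
        if best[nxt] > max_edge:
            break
        in_tree.add(nxt)
        del best[nxt]
        for i in best:
            best[i] = min(best[i], math.dist(points[nxt][:3], points[i][:3]))
    return in_tree


# Synthetic cloud: a dense bright trunk, a detached bright cluster, sparse leaves.
trunk = [(0.0, 0.0, i * 0.125, 215 - i) for i in range(16)]
stray = [(3.0, 3.0, 1.0 + k * 0.0625, 250 + k) for k in range(4)]
leaves = [(x * 0.5, y * 0.5, 2.5, 50) for x in range(-2, 3) for y in range(-2, 3)]
cloud = trunk + stray + leaves
woody = classify_woody(cloud, seed_fraction=0.2, grow_cutoff=0.3, max_edge=0.3)
print(sorted(woody))
```

In this toy case the detached bright cluster passes the first two stages but is discarded in the third because it is not connected to the lowest trunk point, which is the role the minimum spanning tree plays in the description above.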

Tree Parameter Extraction
This portion of the project is under development at the time of writing. Here, we use the individually extracted and classified point cloud file of each tree obtained above and develop further tools to extract the following parameters automatically:
• 3D coordinates and height
• trunk girth and height
• crown width, depth, height, orientation, and eccentricity
• tree growth space in the form of voxels
• branching structure and diameters
All extracted parameters can be visualised so that operators can modify them if necessary. The extracted parameters are then exported as a customised XML file, a compact format designed to store as much information about the tree as possible (Figure 11). This XML database not only serves as a permanent record of a particular tree at a point in time but also provides the necessary data for the subsequent LOD2 and LOD3 modeling work.

Vegetation objects could be expressed in several levels of complexity (Figure 2): LOD0 as the most general, portraying only the tree crown outline; LOD1 as a simplified 3D proxy with a height representation; LOD2 as a crown and trunk representation; LOD3 as a more realistic representation at the species level of detail, including leaves and branches; and potentially LOD4, which includes the tree's internal spaces (e.g., cavities). One of the objectives of this project is to establish suitable representations of the vegetation theme incorporating requirements from Singapore's local context. In this paper, we touch on the modeling of solitary vegetation objects at LOD1, LOD2, and LOD3. Additional properties relevant to Singapore's local context were added to the list of existing attributes of the CityGML solitary vegetation theme.
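To make the per-tree XML record from the Tree Parameter Extraction step more concrete, the sketch below serialises a few of the listed parameters with Python's standard `xml.etree.ElementTree`. The element and attribute names, the record layout and every sample value are hypothetical; the actual customised schema is not given in the text.

```python
import xml.etree.ElementTree as ET


def tree_record_to_xml(record):
    """Serialise one tree's extracted parameters to a compact XML string."""
    root = ET.Element("Tree", {"id": record["id"], "species": record["species"]})
    x, y, z = record["position"]
    ET.SubElement(root, "Position",
                  {"x": f"{x:.2f}", "y": f"{y:.2f}", "z": f"{z:.2f}"})
    ET.SubElement(root, "Trunk", {
        "girth": f"{record['trunk_girth']:.2f}",
        "height": f"{record['trunk_height']:.2f}",
    })
    ET.SubElement(root, "Crown",
                  {key: f"{value:.2f}" for key, value in record["crown"].items()})
    # Growth space stored as occupied voxel indices, one "i,j,k" triple each.
    space = ET.SubElement(root, "GrowthSpace",
                          {"voxelSize": f"{record['voxel_size']:.2f}"})
    space.text = " ".join(f"{i},{j},{k}" for i, j, k in record["voxels"])
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; none of these values come from the paper.
record = {
    "id": "T0001",
    "species": "Samanea saman",
    "position": (1000.0, 2000.0, 12.5),
    "trunk_girth": 1.85,
    "trunk_height": 4.2,
    "crown": {"width": 14.0, "depth": 12.5, "height": 9.0,
              "orientation": 35.0, "eccentricity": 0.45},
    "voxel_size": 0.5,
    "voxels": [(0, 0, 0), (0, 0, 1), (1, 0, 1)],
}
xml_text = tree_record_to_xml(record)
print(xml_text)
```

Storing measurements as attributes and voxels as a single text node keeps the record short, which is one plausible way to meet the compactness goal stated above.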

LOD1 and LOD2 Solitary Vegetation Models
For LOD1 tree models, we propose a simple but symbolic representation to portray solitary trees. The polygon count for the LOD1 models should be kept minimal, yet sufficient to distinguish the model as a vegetation feature from other LOD1 city objects. This need to distinguish trees from other city objects led us to decide against a cylindrical model and to use instead simplified geometrical shapes positioned and scaled according to the respective extracted coordinates and heights (Figure 12).
The development of LOD2 models was detailed in our recent work (Lin et al., In Press, 2018), in which seven common tree shapes were identified to represent the entire range of trees in the city: columnar, conical, irregular, oval, round, wedge and palm (Figure 14). There, we demonstrated tree extraction followed by the generation of CityGML LOD2 models based on existing typology information and allometric equations derived from MLS data measurements. Over 80,000 tree models were automatically generated within a 25 km² test site in Singapore (Figure 13) using this method, yielding a low-polygon-count yet representative way of modeling the various typological shapes and sizes of the different species in Singapore.
Figure 14. Tree shape typologies for LOD2 representation

CityGML LOD3 Models
Currently, LOD3 modeling is a work in progress. Our LOD3 tree models are defined at the species level and distinguish various components such as trunk, branches, leaves, flowers, and roots. Ten local species were chosen to represent approximately one third of the managed tree population in Singapore. We intend to generate LOD3 models by an automatic inverse procedural modeling approach similar to that of Stava et al. (2014), with tree parameter values derived from remote sensing data and a pre-formulated tree library. In this case, 3D tree models are generated from a set of rules. To specify these rules, here the growth rules of trees, we adopt the L-system plant modeling language (Prusinkiewicz et al., 2000). By nature, L-systems generate trees through growth from a seed into roots, trunk, leaves, fruit, and so on. We are formulating the L-system rules to model the growth of the ten selected tree species.
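For readers unfamiliar with L-systems, the sketch below shows the core mechanism, parallel string rewriting, using a generic textbook-style bracketed rule. It is not one of the species rules being formulated in this project: the rule, the symbol meanings (F for a segment, + and - for turns, brackets for branch start and end) and the function name are illustrative assumptions.

```python
def expand(axiom, rules, iterations):
    """Apply the production rules to every symbol in parallel, repeatedly."""
    current = axiom
    for _ in range(iterations):
        # Symbols without a rule (brackets, turns) are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Illustrative bracketed rule: each segment sprouts two side branches.
RULES = {"F": "F[+F]F[-F]F"}

for generation in range(3):
    word = expand("F", RULES, generation)
    print(generation, len(word), word[:40])
```

Each iteration stands for one growth step, so the same rule set yields a tree at any age by changing only the iteration count. Parametric and stochastic extensions attach values such as branch length or angle to each symbol, which is where species-specific parameters would enter.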
The workflow of our procedural tree modeling consists of two parts (Figure 15):
1. Preprocessing:
• formulation of a tree library of Singapore species-specific parameters and their value distributions
• formulation of tree growth rules for various tree species
2. Runtime: parameter optimisation using the input data (from the data processing stage, in the form of physical measurements and a voxelised growth space) and reference information from the compiled tree library with growth rules, in order to obtain the optimum growth parameter configuration

Figure 15. Procedural modeling workflow (diagram labels: Input data: growth space, measured constraint parameters, tree species, estimated tree age; Reference information from pre-processing: tree library (statistics), tree growth rules (L-system); Optimisation)
For the tree library, two sets of parameters are defined to describe the characteristics of the tree anatomy: the constraint parameter set and the growth parameter set. Constraint parameter values are obtained by semantic analysis of the processed MLS data, whereas growth parameters characterise the growth of a tree and the change in its shape and structure over time. The value distributions of the growth parameters vary with the tree species. Optimal values of the growth parameters for a tree are determined stochastically within constraints during optimisation, given the species and, if available, an estimated age of the tree.
The parameter optimisation module solves for optimum parameter values by using a constrained stochastic optimisation technique such as Markov Chain Monte Carlo (MCMC) (Kass et al., 1998) with suitable directed search algorithms such as simulated annealing and gradient descent. The value of each unmeasured parameter is determined randomly according to its species-specific probability distribution, while keeping the generated tree within its growth space.
To initialise the optimisation, we use the tree measurements extracted from LiDAR data (such as crown and trunk sizes) to estimate the age of the tree. A tree seed model of its species is procedurally grown to that age while matching the bending angles and diameters of the trunk and first-order branches (and the number of first-order branches). From this state, the optimiser iteratively and stochastically determines the values of the other unmeasured parameters while minimising distance costs within the growth space constraint of the tree.
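A toy version of such a constrained stochastic search is sketched below using simulated annealing with a Metropolis acceptance rule. It is a deliberately reduced stand-in: the tree is a single branch described by a length and an angle, the growth space is a cylinder rather than voxels, proposals are Gaussian perturbations rather than draws from a species-specific distribution, and every constant and function name is hypothetical.

```python
import math
import random

CROWN_RADIUS = 3.0       # hypothetical growth-space radius (m)
TREE_HEIGHT = 10.0       # hypothetical growth-space height (m)
BRANCH_BASE = 5.0        # hypothetical height where the branch leaves the trunk (m)
TARGET_TIP = (2.5, 8.0)  # hypothetical measured crown extent (radial, height)


def branch_tip(params):
    length, angle = params
    return (length * math.sin(angle), BRANCH_BASE + length * math.cos(angle))


def feasible(params):
    """Constraint check: the branch tip must stay inside the growth space."""
    length, angle = params
    r, h = branch_tip(params)
    return (length > 0 and 0 <= angle <= math.pi / 2
            and r <= CROWN_RADIUS and h <= TREE_HEIGHT)


def cost(params):
    """Distance between the generated branch tip and the measured target."""
    return math.dist(branch_tip(params), TARGET_TIP)


def propose(params, rng, sigma=(0.2, 0.1)):
    return tuple(p + rng.gauss(0.0, s) for p, s in zip(params, sigma))


def anneal(initial, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = propose(current, rng)
        if not feasible(candidate):
            continue  # reject anything that leaves the growth space
        delta = cost(candidate) - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, cost(current)
    return best, best_cost


initial = (1.0, 0.2)
best, best_cost = anneal(initial)
print(f"length={best[0]:.2f} m, angle={best[1]:.2f} rad, cost={best_cost:.3f}")
```

Rejecting infeasible candidates outright is the simplest way to respect the growth space. A penalty term in the cost would be an alternative when the feasible region is hard to reach.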
At the end of the optimisation process, we obtain a set of optimum parameter values with their corresponding growth timeline, which can be passed to tree modeling software such as Xfrog (Lintermann and Deussen, 1999) to output a high-resolution 3D geometry model of the tree. The generated tree model is similar to the actual tree with respect to its species, trunk-branch structure, and growth space, yet stochastically different from the actual tree, especially in terms of leaves and unmeasured high-order branches.
The geometry models, along with the corresponding tree's semantic information, are stored as CityGML LOD3 models.

CONCLUSION
Our tree modeling work for Virtual Singapore is still in progress but has demonstrated promising preliminary results with automated flows among the framework components. However, challenges remain in completing the tasks at hand and in testing the robustness of our methodologies against the wide variety and sheer population of trees across Singapore.
Our research outcome will be a prototype framework to automatically model trees in Virtual Singapore at multiple levels of detail, based on constraints extracted from remote sensing data and a library of Singapore tree species. We envisage that the work carried out by the team will eventually enable city planners, designers and researchers alike to use these representative vegetation models for analyses and studies on virtual city platforms such as Virtual Singapore.

Figure 1. Workflow of interdependent components in the tree modeling framework

Figure 4. Tree crowns extracted from satellite imagery

Figure 6. Tree locations

Figure 12. LOD1 models on a Singapore test site