UTILISING SIMULATED TREE DATA TO TRAIN SUPERVISED CLASSIFIERS

P.O


INTRODUCTION
There is increasing demand for single tree inventory. The higher resolution of available remote sensing datasets makes this possible. Tree species classification is a main task in automatic tree inventory from remote sensing data. The species information can be utilized, for example, by the forest owner to confirm the boundaries, quality and market value of a stand. Knowing about each single tree also enables close monitoring of tree growth.
Tree species can be classified using different machine learning methods such as random forests and support vector machines (Raczko and Zagajewski, 2017) and, fairly recently, convolutional neural networks (Ferreira et al., 2020; Weinstein et al., 2019; Natesan et al., 2019; Nezami et al., 2020). In point cloud-based approaches, voxelization and waveform representations (Guan et al., 2015) have been utilized. In addition, support vector machines, random forests and neural networks (Deng et al., 2016) have been suggested for tree species classification from 3D point clouds. Despite the variety of approaches, the tree species classification problem remains an open research topic for single trees over a wide area.
Since McCulloch and Pitts (1943) published their ideas on neural networks, the field has developed significantly. Early deep learning approaches (e.g. Ivakhnenko, 1971) with several layers were difficult to train, and deep learning did not become popular until the 2010s. Convolutional neural networks are specialized to handle grid-like data, such as images. Modern convolutional neural networks were introduced by Lecun et al. (1998). One significant step that made convolutional neural networks popular was AlexNet (Krizhevsky et al., 2012), which consists of eight layers in total, of which five are convolutional and the remaining three are fully connected.
Convolutional neural networks have become widely applied for object detection and image classification, seeing use particularly in databases with large numbers of parameters to be trained (Dhillon and Verma, 2020). In recent years, many applications of convolutional neural networks have been studied, such as detection of trees depending on age (Mubin et al., 2019), recognition of human actions in videos (Serrano et al., 2018), object classification for autonomous driving (Dreossi et al., 2017), and semantic segmentation of point clouds.
The architectures utilized for different applications are diverse, with varying input image sizes, and parameter numbers ranging from a few thousand to over a hundred million (Shin et al., 2016). With a large number of pre-trained networks available, transfer learning can be a viable approach (Gopalakrishnan et al., 2017), as a fine-tuned model can provide adequate results with a relatively small set of training data (Afridi et al., 2018).
Simulation is an alternative to field measurements for obtaining data to train supervised classifiers. Neural networks in particular require relatively large training datasets to recognize features correctly, and real data may be difficult to obtain (Ødegaard et al., 2016). Simulated data can be used to get a large training dataset with varying features without the need for field measurements (Ji et al., 2019). Simulated data has also been widely utilized to train convolutional neural networks in various applications, such as electron detection (van Schayck et al., 2020), ultrasound image enhancement (Perdios et al., 2018), and identification of fish species (Allken et al., 2019). A network trained with simulated data can potentially be directly applicable to real data (Nair et al., 2018), though real data may differ from the simulated data, and the network may need to be adjusted to address this difference (van Oort et al., 2019).
(Sothe et al., 2019) cameras or mobile laser scanners (Ramalho de Oliveira et al., 2021), or combinations of these (Colgan et al., 2012; Wu and Zhang, 2020). Lower-density airborne laser scanning data and imaging (Weinstein et al., 2019), as well as satellite imaging (Krahwinkler and Rossmann, 2013) and terrestrial laser scanning (Holmgren et al., 2008; Othmani et al., 2013) have also been applied.
The aim of this paper is to examine whether simulated forest data can be utilized for training supervised classifiers. In addition, we examine a novel classification method that utilises feature images and convolutional neural networks. For comparison, we tested the random forest classifier with the same data. We also conducted a preliminary examination of how well classifiers trained with simulated data performed on reference data from a field inventory.

Simulating tree features
In order to experiment with and compare classification methods, a simulated set of tree features was created. We selected three tree species to be simulated, namely spruce, pine and birch. The simulation was implemented in Matlab.
The simulation allows the number of trees to be changed. The starting point is the trunk diameter at breast height (DBH). We set the mean DBH, its standard deviation, and the minimum and maximum DBH. From the DBH, we derive tree height by utilizing Näslund's height curve (Näslund, 1936). Parameter m of the curve depends on the tree species and is 2 for pine and birch, and 3 for spruce. The coefficients of the curve also vary depending on tree species. We utilized the mean parameters estimated in Siipilehto and Kangas (2015), presented in Table 1. We added some variation with a given standard deviation to the resulting heights. However, it was ensured that a tree could not exceed the characteristic maximum height of the corresponding tree species.
The mean tree crown diameter of each tree species (Table 2) was taken from Korpela et al. (2014). Again, some variation with a given standard deviation was added.
We decided to simulate false colour values, namely near infrared (NIR), red and green, for each tree species. To add some challenge and imitate reality, we created two colour cases for each tree species, corresponding to a treetop that is well illuminated and one that is in shadow. This is especially important for the convolutional neural network method because it detects patterns in images, and illuminated and shadowed cases look very different.
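The DBH-to-height step described above can be sketched as follows. Näslund's curve is commonly written as h = 1.3 + d^m / (a + b d)^m, where d is the DBH in cm and h the height in m. The symbol names a and b, the clipping of DBH to its allowed range, the lower height bound of 1.3 m and all numeric values below are assumptions made for illustration; they are not the parameters of Table 1 and not the original Matlab implementation.

```python
import random


def naslund_height(dbh_cm, a, b, m):
    """Naslund-type height curve: h = 1.3 + d^m / (a + b*d)^m."""
    return 1.3 + dbh_cm ** m / (a + b * dbh_cm) ** m


def simulate_tree(rng, dbh_mean, dbh_sd, dbh_min, dbh_max,
                  a, b, m, height_sd, max_height):
    """Draw one (DBH, height) pair for a single simulated tree."""
    # Assumption: DBH is drawn from a normal distribution and clipped
    # to the allowed range.
    dbh = min(max(rng.gauss(dbh_mean, dbh_sd), dbh_min), dbh_max)
    # Expected height from the curve plus random variation.
    height = naslund_height(dbh, a, b, m) + rng.gauss(0.0, height_sd)
    # A tree may not exceed the species-specific maximum height;
    # the lower bound of 1.3 m (breast height) is an assumption.
    height = min(max(height, 1.3), max_height)
    return dbh, height


# Hypothetical parameter values, chosen only to make the sketch runnable.
rng = random.Random(42)
trees = [simulate_tree(rng, dbh_mean=25.0, dbh_sd=6.0, dbh_min=10.0,
                       dbh_max=45.0, a=1.8, b=0.2, m=2,
                       height_sd=1.0, max_height=30.0)
         for _ in range(1000)]
```

Passing an explicit `random.Random` instance keeps the simulated set reproducible, which is convenient when the same data is fed to more than one classifier.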
For simulation, we applied mean reflectance values (Table 3) from Korpela et al. (2014), which were converted into colour values by multiplying by 255 and then linearly enhanced according to real colour samples of a corresponding tree from an aerial image. We manually selected representative colours for both illuminated and shadowed cases for each tree species. The coefficients of the linear transformation were found by regression (Figure 1 and Figure 2). The R² values reveal that the relative ratios of the applied reflectance values fit well with true R, G and NIR observations. Figure 3 illustrates simulated colour variations between tree species and compares them with a real sample of a pine from an aerial image. The simulation randomly selected whether an illuminated or a shadowed case was created. In addition, a small variation with a given standard deviation was added to the colour values. To get more descriptors for a feature vector, some derived measures were calculated. The final feature vector that was utilized for tree species classification was:
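As a minimal sketch of the colour simulation, the conversion from reflectance to a colour value and one derived measure can be written as below. The slope and intercept stand in for the regression coefficients and are hypothetical; clipping to [0, 255] is an assumption. NDVI, which the discussion names as a derived feature, is given in its standard form (NIR - R) / (NIR + R).

```python
import random


def reflectance_to_colour(reflectance, slope, intercept):
    """Scale a reflectance in [0, 1] to 0..255, then apply a linear enhancement."""
    raw = reflectance * 255.0
    value = slope * raw + intercept
    # Assumption: results are clipped to the valid colour range.
    return min(max(value, 0.0), 255.0)


def add_colour_variation(rng, value, sd):
    """Add normally distributed variation to a colour value, clipped to 0..255."""
    return min(max(value + rng.gauss(0.0, sd), 0.0), 255.0)


def ndvi(nir, red):
    """Standard normalised difference vegetation index."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


# Hypothetical reflectance and regression coefficients, for illustration only.
rng = random.Random(1)
nir = add_colour_variation(rng, reflectance_to_colour(0.40, 1.5, 10.0), 5.0)
red = add_colour_variation(rng, reflectance_to_colour(0.05, 1.5, 10.0), 5.0)
index = ndvi(nir, red)
```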

Creation of feature images and labelling
In order to utilize convolutional neural networks, the feature vectors were converted into feature images (Figure 4). The size of the final feature image was chosen to be divisible by 32, as that is a requirement of an unmodified YOLO v3 neural network. Therefore, the final size of our feature images was set to 96 x 96 x 3 pixels. Applying a colour image structure minimised the need to modify YOLO. First, we created an empty image of 6 x 6 pixels. Then, the five values from a feature vector (eq. 3) were placed in the area [2:4, 2:3] (the last pixel was left empty). The last two elements were rescaled to the colour range [0, 255]. Finally, the images were scaled to 96 x 96 pixels and the first layer was copied to the two other layers. Figure 5 illustrates samples of the resulting feature images for the following six tree classes: illuminated spruce, shadowed spruce, illuminated pine, shadowed pine, illuminated birch, and shadowed birch.
Figure 5. Examples of feature images for spruce, pine and birch. Number 1 corresponds to an illuminated case and number 2 corresponds to a shadowed case.
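The construction of a feature image can be sketched in plain Python as follows. The area [2:4, 2:3] is read here as 1-based, inclusive indices (rows 2 to 4, columns 2 to 3). The row-by-row fill order and the nearest-neighbour upscaling are assumptions, since the text does not specify them, and the input values are assumed to be already scaled to [0, 255].

```python
def make_feature_image(features, small=6, scale=16):
    """Build a (small*scale) x (small*scale) image of identical RGB triples."""
    if len(features) != 5:
        raise ValueError("expected exactly five feature values")
    grid = [[0] * small for _ in range(small)]
    # 1-based area [2:4, 2:3] corresponds to 0-based rows 1..3, columns 1..2.
    cells = [(r, c) for r in range(1, 4) for c in range(1, 3)]
    # Assumption: cells are filled row by row; zip stops after five values,
    # so the sixth cell stays empty (zero).
    for (r, c), value in zip(cells, features):
        grid[r][c] = int(round(min(max(value, 0), 255)))
    # Nearest-neighbour upscaling: each small pixel becomes a scale x scale
    # block, and the single layer is copied to all three colour layers.
    image = []
    for r in range(small * scale):
        row = []
        for c in range(small * scale):
            v = grid[r // scale][c // scale]
            row.append((v, v, v))
        image.append(row)
    return image


# Hypothetical feature values, for illustration only.
feature_image = make_feature_image([10, 20, 30, 40, 50])
```

With the default arguments each feature occupies a 16 x 16 pixel block of the 96 x 96 image, which matches the observation in the discussion that each feature fills a small area rather than a single pixel.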

Simulated and field reference data
We created a training data set of 1000 trees per tree species. However, because the trees were divided randomly into illuminated and shadowed cases, each sub-class had close to, but not exactly, 500 samples. Separate data sets were created for testing.
To test classification with real data, we applied data from a forest inventory carried out in three forest sample plots in Evo, Finland. The inventory data was collected in 2014 as a joint effort by the Finnish Geospatial Research Institute FGI and the University of Helsinki. However, crown diameters were measured only for pines (Pinus sylvestris). Pine was the dominant tree species in the area. Using the known locations and tree heights, the trees were superimposed onto an aerial image (UltraCam Eagle Mark 1 f100, year 2019). Red, green and near-infrared colours were attached to the trees by selecting the median colour value from a small area within each tree area. Figure 6 illustrates the distribution and location of the 40 reference pines in the three inventory areas. The test plots included only a few mature spruces and birches, and unfortunately their crown diameter data were not available. However, the potential to detect spruces (Picea abies) and birches (Betula sp.) was examined by assigning them the expected crown diameter.

Utilizing convolutional neural network
As a convolutional neural network, we selected YOLO (You Only Look Once) (Redmon et al., 2016) with the default full 106-layer yolov3 model (75 convolutional layers and 31 maxpool, route, up-sampling, and YOLO layers). Training was performed with the 64-bit C++ version of the open-source neural network framework Darknet (https://pjreddie.com/darknet/) on the Windows 10 operating system with GPU support. The pre-trained weights yolov3.weights were applied as a starting point. Training a convolutional neural network requires many iterations. However, at some point the model starts overfitting, i.e. it performs much better on the training dataset than on the test dataset and hence does not generalize well. Therefore, the mean average precision (mAP) was monitored to find the optimal number of iterations. In our case, we found that the weights at 13000 iterations performed the best. For testing the trained YOLO neural network, OpenCV libraries were utilized with Python.
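Selecting the number of iterations by monitoring mAP amounts to picking the saved checkpoint with the highest mAP on held-out data. A minimal sketch is given below; the function name and the mAP values are hypothetical and do not reproduce the measurements behind the 13000-iteration choice.

```python
def best_checkpoint(map_by_iteration):
    """Return the iteration count whose checkpoint has the highest mAP."""
    if not map_by_iteration:
        raise ValueError("no checkpoints to compare")
    # Iterating in ascending order makes max() prefer the earliest
    # iteration when several checkpoints share the same mAP.
    return max(sorted(map_by_iteration), key=lambda it: map_by_iteration[it])


# Hypothetical mAP values per saved checkpoint, for illustration only.
example_map = {1000: 0.62, 2000: 0.81, 3000: 0.88, 4000: 0.86}
chosen = best_checkpoint(example_map)
```

Preferring the earliest of several equally good checkpoints is a design choice here: it favours the model with the least opportunity to overfit.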

Random forest classification
We also decided to apply the well-known random forest classifier (Breiman, 2001). Random forest classification is based on decision trees and has become a popular classification method. We utilized the randomForest library in R (version 4.0.3). The random forest classifier was trained and tested with the same data set as YOLO. The "Number of Trees to Grow" parameter (ntree) of the random forest classifier was set to 300.

RESULTS
In Table 4, the confusion matrices of the classifications with both methods are listed. In Table 5, the illuminated and shadowed classes are combined into a classification of the three main tree species. In Table 6, the overall accuracy and Cohen's Kappa are presented for both classifiers. In Figure 7, we illustrate examples in which YOLO has detected the feature image area and classified the feature images into the correct classes. In addition, both methods were applied to classify the 40 reference pines that were measured in the field. Both methods succeeded in classifying the same 39 reference trees correctly and assigned one pine to the birch class, leading to a classification accuracy of 97.5% (39 of 40). However, to achieve this result the brightness of the colours needed adjustment.
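The quantities reported in the tables can be derived from a confusion matrix as sketched below: overall accuracy, Cohen's Kappa, and the merging of illuminated and shadowed sub-classes into species-level classes. The matrices used here are hypothetical and only illustrate the arithmetic. The merge assumes that the two sub-classes of each species occupy adjacent rows and columns.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement from the row and column marginals.
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)


def merge_pairs(cm):
    """Merge adjacent class pairs (e.g. illuminated/shadowed) into one class."""
    k = len(cm) // 2
    return [[sum(cm[2 * i + a][2 * j + b] for a in (0, 1) for b in (0, 1))
             for j in range(k)] for i in range(k)]


# Hypothetical two-class confusion matrix, for illustration only.
example = [[45, 5], [10, 40]]
accuracy = overall_accuracy(example)
kappa = cohens_kappa(example)
```

For the field reference test, 39 correct classifications out of 40 pines give `overall_accuracy([[39, 1], [0, 0]])`, i.e. 0.975.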

DISCUSSION
Applying 1000 simulated training samples per tree species seemed to work with both random forest and YOLO. Tests against simulated test data showed that the overall performance of the tested methods was very similar, even if there was some variation in which trees were misclassified. When the classifiers were tested with field-measured reference data, the colours needed adjustment. Comparing Figure 3 and Figure 6 shows a clear difference in colours, because the examples are taken from different images. To identify pines successfully, only the brightness needed to be adjusted to the level of the training data.
Even though we had no crown diameter information for spruces and birches from the forest test plots, we examined them by assigning expected canopy diameters to four spruces and seven birches. This examination revealed that the same brightness correction that worked with pines did not give satisfactory results with the other tree species. However, after adjusting brightness and stretching the histogram of individual colour layers, it was possible to classify all trees correctly. In practice, we needed to modify only the near-infrared colours for spruces and the green colours for birches to detect them correctly. This indicates that simulated data is a feasible way to get large training and testing data sets, but when a new aerial image is applied, the colours need to be adjusted to meet the expected colour levels. This needs to be done separately for each colour layer, and it might require both brightness adjustment and histogram stretching. Ideally, colour values should be converted into something that is invariant to changes between photographs. An alternative to simulation is data augmentation, if some training data is available. However, this is a topic for future research.
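The two per-layer corrections mentioned above can be illustrated with generic operations: a brightness shift towards a target mean and a linear histogram stretch. This is a sketch of common image operations under stated assumptions, not the exact procedure applied in the experiment, and the sample values are hypothetical.

```python
def shift_brightness(values, target_mean):
    """Shift one colour layer so that its mean matches target_mean."""
    offset = target_mean - sum(values) / len(values)
    # Assumption: shifted values are clipped to the 0..255 colour range.
    return [min(max(v + offset, 0.0), 255.0) for v in values]


def stretch_channel(values, out_min=0.0, out_max=255.0):
    """Linearly stretch one colour layer to the range out_min..out_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant layer cannot be stretched; return it unchanged.
        return [float(v) for v in values]
    return [out_min + (v - lo) / (hi - lo) * (out_max - out_min)
            for v in values]


# Hypothetical colour samples from one layer, for illustration only.
layer = [50.0, 100.0, 150.0]
brighter = shift_brightness(layer, target_mean=120.0)
stretched = stretch_channel(layer)
```

Each layer (NIR, red, green) would be passed through these functions separately, which reflects the observation that different layers needed different corrections for different species.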
Converting feature vectors into feature images and applying convolutional neural networks for classification is an unconventional approach. In other words, by converting feature vectors into feature images, we enable YOLO to detect and classify a tree in focus. Actually, we would not need all the properties of convolutional neural networks, since localization at the image plane could be avoided. After all, the feature information is always located at the same place in feature images.
In this experiment, we utilized only five features, and each feature filled a small area in a feature image. Therefore, the convolution operations did not dismiss any features. If each feature corresponded to only one pixel in a feature image, the order of the features might be expected to influence the results, and the significance of the features could vary. However, this is a topic for further research. In addition, in the future it would be interesting to compare the results of our way of utilizing YOLO with the results of traditional fully connected multilayer neural networks.
We selected features that can in practice be extracted from dense laser point clouds and photogrammetric images. NDVI was added mainly to get a larger pattern area in the feature images. Since NDVI is derived from colour values, it most likely makes no difference to random forest classification. However, its effect on YOLO classification needs further research. The good performance of both classification methods indicates that such a simulation approach has the potential to work in a practical forest inventory. However, we believe that the presented method can be developed further. We expect that the simulated data can be improved by utilizing more species-specific knowledge about the variation of the selected features. In addition, it might be possible to fine-tune the training data for a specific classifier to improve its performance. For example, the convolutional neural network classifier might benefit from a larger training data set than the one we applied. Our crown parameters came from field data, and how the method works with crown parameters extracted from laser scanning or photogrammetric data is a topic for future research.
In our case, the novel convolutional neural network classification method gave a very similar outcome to the random forest classifier. The convolutional neural network classifier is expected to scale to many feature elements in a feature image, if needed. Theoretically, a feature image of the same size as the one we applied can hold 9216 features (96 x 96 pixels with one feature per pixel). Optimizing the structure of YOLO for operating with feature images is a good topic for further research. As an example, the current implementation of the full YOLO searches for targets at different scales. However, the size of our feature image did not change. Therefore, a convolutional neural network could be modified to operate only at an optimal scale. In addition, the size of our feature images might not be optimal. However, this is not expected to affect the classification accuracy, but merely the computation speed.

CONCLUSIONS
The aim of our research was to examine whether simulated forest data can be utilized to train supervised classifiers. We created a simulator that can produce realistic tree feature vectors. These features were directly applied to a random forest classifier. In addition, we converted the feature vectors into feature images suitable for a YOLO convolutional neural network. The random forest classifier and the convolutional neural network classifier performed similarly with both simulated data and field-measured reference data. Both methods correctly identified 97.5% of the field-measured reference trees. This indicates that simulation is a feasible method for training classifiers. Simulated data allows much larger training data sets than would be feasible to obtain from field measurements. However, the colour levels of an image need to be adjusted to the expected levels when a new image is applied. Using feature images and convolutional neural networks should scale well, and the method is not limited to a single application.