CLASSIFICATION OF UAV-BASED PHOTOGRAMMETRIC POINT CLOUDS OF RIVERINE SPECIES USING MACHINE LEARNING ALGORITHMS: A CASE STUDY IN THE PALANCIA RIVER, SPAIN

The management of riverine areas is fundamental due to their great environmental importance. The rapid changes that occur in these areas due to river mechanics and human pressure make it necessary to obtain data with high temporal and spatial resolution. This study proposes a workflow to map riverine species using Unmanned Aerial Vehicle (UAV) imagery. Based on RGB point clouds, our work derived simple geometric and spectral metrics to classify an area of the public hydraulic domain of the river Palancia (Spain) into five classes: Tamarix gallica L. (French tamarisk), Pinus halepensis Miller (Aleppo pine), Arundo donax L. (giant reed), other riverine species and ground. Six Machine Learning (ML) methods were evaluated: Decision Trees, Extra Trees, Multilayer Perceptron, K-Nearest Neighbors, Random Forest and Ridge. Random Forest, which obtained a mean cross-validation score close to 0.8, was chosen to carry out the classification. An object-based reclassification was then applied to improve this result, yielding an overall accuracy of 83.6% and producer's accuracies of 73.8% for giant reed, 87.7% for Aleppo pine, 82.8% for French tamarisk, 93.5% for ground and 80.1% for other riverine species. The results were promising, demonstrating the feasibility of this cost-effective method for periodic monitoring of riverine species. In addition, the proposed workflow is easily transferable to tasks beyond riverine species classification (e.g., green area detection, land cover classification), opening new opportunities for UAVs equipped with consumer cameras in environmental applications.


INTRODUCTION
Riverine areas perform some of the most important functions of watersheds, influencing the transfer of energy, nutrients and sediments between aquatic and terrestrial systems; they are also the habitat of a wide variety of animal and plant species and have great landscape and educational value (Gutiérrez and Alonso, 2013). In relation to water quality, riverine areas act as buffers between the upper and lower reaches of rivers, helping to filter out pollutants, nutrients and sediments (Elmore and Beschta, 1987). Riverine vegetation plays a key role here: it reduces stream bank erosion by lowering the flow velocity of the water, prevents soil erosion and keeps the geomorphology of the channels stable, while also fixing CO2 (Naiman et al., 1999).
Nevertheless, riverine zones are endangered by human activities such as land use changes, modification of riverbeds, dams, wastewater or the introduction of invasive species (Michez et al., 2016b). The strong anthropic pressure makes these ecosystems very fragile, so management plans are needed that make their ecological and economic functions compatible, improving sustainability for future generations (Muñoz et al., 2004; National Research Council, 2002). Riverine management plans need baseline data on existing conditions, with the objective of achieving a balance in the river course (National Research Council, 2002). Knowing the structure of the riverine landscape and how it affects landscape processes is essential for planning and management decisions (e.g., removal of invasive species, planting of species for riverbank stabilization, removal of species that block watercourses) (Apan et al., 2002). In this regard, one of the most important problems to avoid during periods of intense rainfall is the excessive accumulation of riverine species in the riverbed, which can occasionally cause the river to overflow its banks.
In this respect, the management plans for the riverine areas of the Spanish Mediterranean basin are conditioned by periods of heavy rain, in which the risk of flooding is a fundamental factor (Arizpe et al., 2008). To reduce this risk, species distribution maps are needed to plan the selective or periodic clearing of the river basin in accordance with environmental, risk and landscape criteria (Apan et al., 2002). Management planning for riverine species is affected by the rapid changes that river dynamics cause in their structure, as well as by the need for accurate three-dimensional information to study river mechanics (Stella et al., 2013). Some studies have successfully used aerial and terrestrial laser scanning point clouds to classify forest species composition (Torralba et al., 2018). However, the frequency with which these techniques can be applied is limited by their cost. Recent advances in UAVs allow an alternative acquisition of high-resolution images with high temporal frequency and at low cost. For this purpose, it is only necessary to equip these systems with a consumer camera to obtain point clouds using photogrammetric methods. Most methodologies for obtaining three-dimensional data from UAV-derived images use structure-from-motion (SfM) algorithms, from which products such as point clouds, 3D objects or orthophotos can be obtained (Fernández-Sarría et al., 2017). Species classification maps derived from SfM products are usually based on the classification of orthophotos, which discards the three-dimensional information and requires triangulating the point cloud, the most computationally demanding step of the process (Michez et al., 2016a). A direct classification of the point cloud would instead preserve its spatial information, avoid the meshing process, and save computing and labour resources.
In recent years, some studies have applied ML techniques to the classification of photogrammetric point clouds (Nevalainen et al., 2017), mainly through supervised classification, but these studies have not focused on species classification in forest areas, where the classification is more complex (Zou et al., 2017). In addition, recent studies have shown that point clouds obtained from UAV-derived images allow the estimation of dendrometric variables in riverine species (Carbonell-Rivera et al., 2019). Thus, a direct classification of the point cloud would allow the delimitation of riverbank species while also providing additional structural information for each individual. This information would improve current species distribution maps, providing three-dimensional information to determine when silvicultural action is necessary according to the management plan of the riverine area.
Consequently, the main objective of this study is the development of a new methodology for the classification of riverine species applying ML algorithms for the supervised classification of RGB point clouds obtained from UAV images.

Study area
The selected study area was the public hydraulic domain of the river Palancia where it passes through the town of Estivella, in the eastern part of the Iberian Peninsula (Figure 1). This area of the Mediterranean basin is formed by fluvial terraces from the Upper Pleistocene. The climate is transitional between the coastal and the inland Mediterranean climate, with cool winters and hot summers. Summer rainfall is low, in contrast with autumn and spring, when most of the annual rainfall (500 mm) falls, usually as torrential rain. This part of the Palancia basin has undergone major changes in water flows due to strong anthropic impact (over-exploitation of aquifers), river damming and related flow manipulation, and the endorheic characteristics of the basin, which leave the riverbed practically dry. The over-exploitation is caused by the existence of 5,010 hectares of irrigated land (Confederación Hidrográfica del Júcar, 1999).

Data collection
Fieldwork was carried out in October 2018 and consisted of seven flights with an ATyges FV8 UAV, capturing 1,253 zenithal images over an area of 43.11 ha. The flights were carried out at an altitude of 120 m with an average speed of 25.2 km/h, the normal operating speed recommended by the manufacturer in favourable weather conditions. The ATyges FV8 is a multirotor weighing 3.5 kg, with a maximum payload of 1.5 kg. Its eight brushless motors are powered by two 8,200 mAh Li-ion batteries, allowing flights of up to 25 minutes depending on the payload and meteorological conditions. The UAV is equipped with an ATmega 1284P flight controller, which integrates the control electronics and navigation sensors, an Atmel ARM9 microcontroller in charge of navigation control, and a u-blox LEA-6S GPS antenna module providing metre-level accuracy. The ATyges FV8 carried a consumer camera, the Sony A5000, which has an Exmor™ APS HD CMOS sensor (23.2 × 15.4 mm) with a resolution of 20.1 MP. The focal length was set at 16 mm to obtain a wider field of view; the sensor pixel size is about 4 µm. In flight, the shutter speed was set to 1/800 s and the sensor sensitivity to ISO 100, in order to avoid inconsistencies between photographs. The photographs were captured following the basic recommendations for applying SfM algorithms, namely: maximum available resolution, capture in RAW format to avoid image compression, and constant focal length and f-number.
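As a worked check of the acquisition geometry (this figure is not reported in the text), the nominal ground sample distance (GSD) of a nadir image follows from the flight altitude H, the pixel size p and the focal length f as GSD = H · p / f. The minimal sketch below assumes flat terrain and uses the 120 m altitude, 4 µm pixel size and 16 mm focal length given above; the function name is hypothetical.

```python
def ground_sample_distance(altitude_m: float, pixel_size_m: float, focal_length_m: float) -> float:
    """Nominal ground sample distance (m per pixel) of a nadir image over flat terrain."""
    return altitude_m * pixel_size_m / focal_length_m


# Flight parameters from the data collection: 120 m altitude, 4 µm pixels, 16 mm lens.
gsd = ground_sample_distance(120.0, 4e-6, 16e-3)
print(f"GSD = {gsd * 100:.1f} cm/pixel")
```

Because vegetation and terrain are closer to the camera than the nominal ground plane, the actual GSD on the canopy is slightly finer than this value.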
Two differential GPS models, Leica Viva GS16 and Topcon GR-5, were used to take 262 Ground Control Points (GCPs), randomly distributed throughout the study area.

Photogrammetric processing
The first part of the methods consists of the photogrammetric processing of the photographs to obtain a georeferenced point cloud. The photogrammetric process was carried out using Pix4D© software version 4.3.31, which is commonly used in photogrammetric surveys with proven efficiency (Niederheiser et al., 2016). The process uses the information captured by the inertial measurement unit (IMU) and the GPS position of the UAV, together with the GCPs, to extract tie points, i.e., features present in multiple photographs. In this phase, Pix4D© applies a colour alignment algorithm that accounts for both exposure mismatches between images (global offset and gain per image) and camera-related defects such as vignetting. Tie points are used to calculate the relative positions of the cameras and to create a sparse point cloud. The sparse point cloud is georeferenced using the GCPs collected in the field, in order to transform it to the ETRS89/UTM zone 30 coordinate system (EPSG:25830). Once the sparse point cloud is computed and georeferenced, a high-density point cloud is generated based on the relative positions of the sparse points and the locations of the cameras (step 1 in Figure 2). After the densification process, a point cloud of 85,409,557 points was obtained. The geometric error of the point cloud was calculated as the Root Mean Square Error (RMSE) between the GCPs and the positions of the corresponding computed 3D points. The RMSE for a given direction (x, y, z) is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}}$$
where e_i is the error of point i in the given direction and N is the number of GCPs.
The RMSE obtained for the x, y and z directions was 0.015 m, 0.013 m and 0.021 m, respectively.
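The per-axis RMSE defined above can be computed as in the following sketch. It assumes that each surveyed GCP is already paired with its counterpart in the point cloud and that both are expressed in the same coordinate system; the function names and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def rmse(errors: Sequence[float]) -> float:
    """Root mean square of the residuals e_i along one axis."""
    if not errors:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_per_axis(measured: Sequence[Point], computed: Sequence[Point]) -> Dict[str, float]:
    """RMSE along x, y and z between surveyed GCPs and their computed 3D positions."""
    if len(measured) != len(computed):
        raise ValueError("measured and computed must have the same length")
    result = {}
    for axis, name in enumerate("xyz"):
        residuals = [c[axis] - m[axis] for m, c in zip(measured, computed)]
        result[name] = rmse(residuals)
    return result


# Hypothetical GCPs in local coordinates (metres).
gcps = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
cloud = [(0.03, 0.0, 0.04), (0.97, 1.0, 1.0)]
errors = rmse_per_axis(gcps, cloud)
print({axis: round(value, 4) for axis, value in errors.items()})
```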

Normalization of heights
To include height as a variable in the point classification, the heights of the point cloud were normalised (step 2 in Figure 2) using the LAStools© software (Isenburg, 2018). This process was divided into two steps. In the first step, a Digital Terrain Model (DTM) is computed, starting with the construction of an initial Triangulated Irregular Network (TIN) from the lowest point within each local area. This TIN is then progressively densified in an iterative process, adding unclassified points to the DTM based on distance and angle criteria between points. In the second step, once the DTM is created, the heights are normalised by computing the height of each point above the ground. The ground points were then removed from the point cloud, with the aim of classifying only plant species, yielding a point cloud of 29,347,272 points.
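The second step (height above ground) can be illustrated with a deliberately simplified sketch. It does not reproduce the LAStools progressive TIN densification: as an assumption for illustration, the DTM is approximated by the lowest point in each square grid cell, and each point's normalised height is its elevation minus that cell minimum. The cell size, the ground tolerance and the function names are hypothetical.

```python
import math
from collections import defaultdict


def normalise_heights(points, cell_size=1.0):
    """Height above a crude grid-minimum DTM for each (x, y, z) point.

    Simplification: the 'DTM' of a cell is its lowest z value, not an
    interpolated TIN surface as produced by LAStools.
    """
    def cell(x, y):
        return (math.floor(x / cell_size), math.floor(y / cell_size))

    ground = defaultdict(lambda: math.inf)
    for x, y, z in points:
        key = cell(x, y)
        ground[key] = min(ground[key], z)
    return [z - ground[cell(x, y)] for x, y, z in points]


def drop_ground(points, heights, tolerance=0.1):
    """Keep only points more than `tolerance` metres above the approximated ground."""
    return [p for p, h in zip(points, heights) if h > tolerance]


# Hypothetical points: two grid cells, each with near-ground and vegetation points.
pts = [(0.2, 0.3, 50.0), (0.5, 0.5, 52.5), (0.8, 0.1, 50.05), (1.5, 0.5, 49.0), (1.6, 0.4, 55.0)]
heights = normalise_heights(pts)
vegetation = drop_ground(pts, heights)
print(vegetation)
```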

Point cloud classification
In this step, the UAV point cloud was classified using ML algorithms. Five classes were defined for riverine species classification: three plant species classes (Arundo donax, Tamarix gallica and Pinus halepensis), which are the most representative species in the river section studied; the class other riverine, which contains small species or species with low representativeness in the area; and the class ground, to detect points that were not identified as ground when generating the DTM (e.g., bridges). A Python-based tool was developed to allow users to collect point cloud training samples of each class (Figure 3). To homogenise data collection, the samples were taken in a rectangular shape, yielding prisms of the selected data. In this process, a total of 5,054,408 random points were taken throughout the study area from the different classes analysed […] (Hunt et al., 2005; Jannoura et al., 2015). In addition, the colour alignment algorithm applied by Pix4D© is intended to remove differences in the RGB values of the same cover between different shots.
The spectral features chosen were the digital number (DN) values of the R, G and B bands, together with the normalised difference indexes computed between pairs of bands (NGRDI, NGBDI and NBRDI), each of the form (a - b)/(a + b) for two bands a and b.

Supervised classification methods
Different ML methods for supervised classification were tested using Python-based code: Decision Trees, Extra Trees, Multilayer Perceptron (MLP), K-Nearest Neighbours (KNN), Random Forest and Ridge. These methods were taken from the Scikit-learn library for Python (Pedregosa et al., 2011).
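The normalised difference indexes used as spectral features can be computed per point from the DN values. In the sketch below, NGRDI = (G - R)/(G + R) follows the usual definition (Hunt et al., 2005); the band order used for NGBDI, (G - B)/(G + B), and NBRDI, (B - R)/(B + R), is an assumption, since their definitions are not reproduced in this section. Mapping a zero denominator to 0 is also an assumption, and the function names are hypothetical.

```python
def normalised_difference(a: float, b: float) -> float:
    """Generic normalised difference (a - b) / (a + b); 0.0 when a + b == 0."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def spectral_features(r: int, g: int, b: int) -> dict:
    """Per-point spectral features: raw DN values plus three normalised difference indexes."""
    return {
        "R": r,
        "G": g,
        "B": b,
        "NGRDI": normalised_difference(g, r),  # (G - R) / (G + R)
        "NGBDI": normalised_difference(g, b),  # assumed (G - B) / (G + B)
        "NBRDI": normalised_difference(b, r),  # assumed (B - R) / (B + R)
    }


# Hypothetical DN values of a green vegetation point.
features = spectral_features(60, 120, 40)
print(features)
```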
The accuracy of these methods was assessed by the mean cross-validation score. Cross-validation was used to ensure independence between training and test data. K-fold cross-validation was applied, dividing the sample data into K subsets, with K set to ten. K was not increased further in order to limit processing time; this value yields neither an excessively high bias nor a very high variance in the test error estimate (Casella et al., 2013). In each iteration, one subset is used as test data and the remaining K - 1 subsets as training data. This is repeated K times, so that each subset serves once as test data. Finally, the arithmetic mean of the K results gives a single score.
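The model comparison can be sketched with scikit-learn, the library used in the study. The snippet configures the six classifiers with the parameters reported in the following paragraphs and scores each one with 10-fold cross-validation. As an assumption, a synthetic dataset from make_classification stands in for the real point features (normalised height, R, G, B and the three indexes), so the printed scores say nothing about the study's actual results.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 7 features and 5 classes, like the point features described in the text.
X, y = make_classification(n_samples=1000, n_features=7, n_informative=5,
                           n_classes=5, random_state=0)

models = {
    "Decision Trees": DecisionTreeClassifier(criterion="gini", min_samples_split=2,
                                             min_samples_leaf=1, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=10, criterion="gini", random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), activation="relu", random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5, leaf_size=30),
    "Random Forest": RandomForestClassifier(n_estimators=10, min_samples_leaf=1, random_state=0),
    "Ridge": RidgeClassifier(),
}

scores = {}
with warnings.catch_warnings():
    # The default MLP iteration budget may not fully converge on this toy data.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for name, model in models.items():
        # cv=10: each fold is used once as test data and the other nine as training data.
        fold_scores = cross_val_score(model, X, y, cv=10)
        scores[name] = fold_scores.mean()
        print(f"{name}: mean = {fold_scores.mean():.3f}, std = {fold_scores.std():.3f}")
```

Reporting the standard deviation alongside the mean reflects the dispersion criterion used later to choose Random Forest.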
Decision Trees is a non-parametric supervised learning method consisting of a series of conditions organised hierarchically as a tree (Hernández Orallo et al., 2004). Objects are classified by questions about the values of their attributes, starting from the root node and following the path determined by the answers at the internal nodes until a leaf node is reached. The class of that leaf is assigned to the objects that satisfy the conditions leading to it. In this study, the Decision Trees were based on the Classification and Regression Trees algorithm, also known as CART (Breiman et al., 1984). CART constructs binary trees using the feature and threshold that yield the largest information gain at each node. The following parameters were used: Gini impurity was applied to measure the quality of a split and choose the best one. Gini impurity is the probability of incorrectly classifying a randomly chosen element of a node if it were labelled randomly according to the class distribution in that node. The minimum number of samples required to split an internal node was set to two, and the minimum number of samples required at a leaf node to one.
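The Gini criterion and the CART-style split search can be illustrated on a single feature. This minimal sketch (hypothetical function names and data) evaluates the midpoints between consecutive sorted values, as CART does, and keeps the threshold with the lowest weighted child impurity; a real tree repeats this over all features and recursively.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_k^2): chance of mislabelling a random element labelled by the node's class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_child_impurity) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))  # no split yet: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        if weighted < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, weighted)
    return best


# Hypothetical normalised heights (m) for two classes.
heights = [0.5, 0.8, 1.2, 6.0, 7.5, 9.0]
classes = ["giant reed"] * 3 + ["Aleppo pine"] * 3
threshold, impurity = best_threshold(heights, classes)
print(threshold, impurity)
```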
A Multilayer Perceptron (Rosenblatt, 1961) is a feedforward artificial neural network classifier. It consists of an input layer, which receives a signal, and an output layer, which makes a prediction about the input, with an arbitrary number of hidden layers between them. Except for the input nodes, each node is a neuron that uses a non-linear activation function. The MLP is trained with backpropagation, a supervised learning technique. The following parameters were used: each hidden layer had 100 neurons, activated with the rectified linear unit (ReLU) function.
K-Nearest Neighbours classifies each point according to its nearest neighbours in feature space: each point is assigned the class most represented among its K nearest neighbours. The number of neighbours was set to five and the leaf size to 30.
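The majority vote described above can be sketched with a brute-force search. This is a hypothetical illustration, not scikit-learn's implementation; in scikit-learn, leaf_size only tunes the BallTree/KDTree search structure and does not change which neighbours are found. The two-feature training data are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Assign `query` the most frequent class among its k nearest training points.

    Euclidean distance, brute-force search. Ties go to the class that appears
    first among the sorted neighbours (Counter.most_common ordering).
    """
    if k > len(train_x):
        raise ValueError("k cannot exceed the number of training points")
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (normalised height in m, NGRDI).
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.05), (5.0, 0.3), (5.5, 0.35), (6.0, 0.25)]
train_y = ["ground", "ground", "ground", "pine", "pine", "pine"]
print(knn_predict(train_x, train_y, (5.2, 0.3), k=5))
```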
Random Forest (Ho, 1995) is an ensemble of individual decision trees that function as a whole. Each tree makes a class prediction, and the class predicted most often across all trees becomes the model's prediction. The execution parameters were the following: ten trees in the forest, and a minimum of one sample at a leaf node. Feature importance in Random Forest was computed from the decrease in Gini impurity.
The Extra Trees classifier, also known as the "Extremely randomised trees" classifier (Geurts et al., 2006), is a variant of Random Forest. The two classifiers differ in two ways: Extra Trees does not apply the bagging procedure (repeated sampling with replacement, or bootstrapping, to reduce variance) to build the training set of each tree, and it chooses node splits in a highly randomised way, drawing candidate thresholds at random, whereas Random Forest searches for the best split. The parameters used were ten trees in the forest, with Gini impurity measuring the quality of a split.
Ridge regression, also known as Tikhonov regularisation (Tikhonov et al., 2013), is commonly used to regularise ill-posed problems. Ridge is similar to linear least-squares regression, but it shrinks the coefficient estimates (through an L2 penalty) in order to improve prediction accuracy, reduce variance and aid interpretation. To use this regression method as a classifier, the target values are converted into {-1, 1} and the problem is then treated as a regression task.
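The {-1, 1} encoding can be made concrete with a minimal NumPy sketch of the idea behind scikit-learn's RidgeClassifier; it is not that class's actual implementation, which offers several solvers. Each class gets one column of targets (+1 for members, -1 otherwise), the columns are fitted jointly with closed-form ridge regression, and the class with the largest regression output wins. The function names, alpha value and data are hypothetical.

```python
import numpy as np


def fit_ridge_classifier(X, y, alpha=1.0):
    """Fit one ridge regression per class on {-1, 1} targets; returns (classes, W, intercept)."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    # +1 where the sample belongs to the class, -1 otherwise (one column per class).
    Y = np.where(np.asarray(y)[:, None] == classes[None, :], 1.0, -1.0)
    # Centre the data so that the intercept is not penalised.
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Ridge solution: W = (Xc^T Xc + alpha I)^-1 Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    intercept = y_mean - x_mean @ W
    return classes, W, intercept


def predict_ridge_classifier(model, X):
    """Predict the class whose regression output is largest."""
    classes, W, intercept = model
    scores = np.asarray(X, dtype=float) @ W + intercept
    return classes[np.argmax(scores, axis=1)]


# Hypothetical features: (normalised height in m, NGRDI).
X = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.05], [6.0, 0.3], [7.0, 0.35], [6.5, 0.25]]
y = ["ground"] * 3 + ["pine"] * 3
model = fit_ridge_classifier(X, y)
pred = predict_ridge_classifier(model, [[0.15, 0.05], [6.8, 0.3]])
print(pred.tolist())
```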

Segmentation
After applying the ML classification methods, a geometric segmentation of the point cloud was carried out to reduce the high spatial irregularity of the point classification, with the aim of obtaining clusters of points representing different individuals (step 4 in Figure 2).
The geometric segmentation of the point cloud was performed to segment individual objects sequentially, taking advantage of the relative spacing between objects. This segmentation method is implemented as the li2012 algorithm (Li et al., 2012) of the lastrees function of the lidR package (Roussel et al., 2018). To obtain a single class per segment, the algorithm was parameterised to produce an over-segmentation. The algorithm uses different parameters to determine whether a point is near to or far from existing trees (see page 79 in Li et al., 2012). The parameters used were the following: a first threshold of 1.5 m, a second threshold of 2 m, a minimum height of a detected plant of 1.5 m, and a maximum crown radius of 5 m.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

Object-based reclassification
Finally, the points were reclassified (step 5 in Figure 2) based on the mode of the points contained in each segment, assigning each segment the most frequent class among its points. In this way, the initial classification derived from an ML classifier is taken into account, and a statistical context model is used to increase spatial regularity. Thus, the classification of a given point considers not only its own feature values but also the classes (and features) of its neighbouring points, moving from a point-based classification to a segment- or object-based classification.
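The mode-based reclassification amounts to a majority vote inside each segment. In this minimal sketch the segment identifiers stand for the output of the segmentation step; the function name, class labels and identifiers are hypothetical, and ties are resolved in favour of the class that appears first in the segment.

```python
from collections import Counter


def reclassify_by_segment(point_classes, segment_ids):
    """Replace each point's class with the most frequent class (mode) of its segment."""
    if len(point_classes) != len(segment_ids):
        raise ValueError("one segment id per point is required")
    members = {}
    for cls, seg in zip(point_classes, segment_ids):
        members.setdefault(seg, []).append(cls)
    mode = {seg: Counter(classes).most_common(1)[0][0] for seg, classes in members.items()}
    return [mode[seg] for seg in segment_ids]


# Hypothetical point labels from the ML classifier and their segment ids.
labels = ["reed", "reed", "pine", "reed", "pine", "pine", "ground"]
segments = [1, 1, 1, 1, 2, 2, 3]
print(reclassify_by_segment(labels, segments))
```

The isolated "pine" point in segment 1 is relabelled as "reed", which is the smoothing effect described above.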

Evaluation
Evaluation was performed by computing per-class and overall accuracy indices, comparing the class assigned to each segment with the reference class of the object obtained by photointerpretation. Where the segmentation was erroneous, the reference class of the object was taken as the majority class within it. In this process, approximately 10% of the segmented objects (1,625 objects, 325 per class) were used for testing; these objects were randomly sampled from each class.
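The accuracy indices can be derived from a confusion matrix as in the sketch below. The orientation (rows are reference classes, columns are predicted classes) is an assumption, as are the function name and the small example matrix; precision corresponds to user's accuracy and recall to producer's accuracy.

```python
def per_class_metrics(confusion, labels):
    """Precision, recall and F-measure per class, plus overall accuracy.

    confusion[i][j] = objects whose reference class is labels[i] and whose
    predicted class is labels[j] (assumed orientation).
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column total
        actual = sum(confusion[k])  # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f_measure)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[k][k] for k in range(n)) / total
    return metrics, overall


# Hypothetical two-class example.
metrics, overall = per_class_metrics([[90, 10], [5, 95]], ["A", "B"])
print(metrics, overall)

# With the same number of reference objects per class (325 here), overall accuracy
# equals the mean producer's accuracy; check against the values in the abstract:
producers = [73.8, 87.7, 82.8, 93.5, 80.1]
print(f"mean producer's accuracy = {sum(producers) / len(producers):.1f}%")
```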

Area and volume
To assess the representativeness of each analysed class in the study area, the area and volume of each class were calculated. To do so, the study area was divided into square grid cells with a side of 15 cm. If at least one point of the analysed class fell within a cell, the total area of the cell (225 cm²) was added. To calculate the volume, the cell area was multiplied by the average height of the points inside it.
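The area and volume computation can be sketched as follows for the points of one class. As an assumption, each point is given as (x, y, normalised height) in metres; the function name and sample points are hypothetical.

```python
import math
from collections import defaultdict

CELL = 0.15  # grid side in metres (15 cm), so each occupied cell adds 0.0225 m2


def area_and_volume(points, cell=CELL):
    """Area (m2) and volume (m3) of one class from (x, y, normalised_height) points.

    A cell counts as occupied if it holds at least one point; its volume is the
    cell area times the mean normalised height of the points inside it.
    """
    heights = defaultdict(list)
    for x, y, h in points:
        heights[(math.floor(x / cell), math.floor(y / cell))].append(h)
    cell_area = cell * cell
    area = len(heights) * cell_area
    volume = sum(cell_area * sum(hs) / len(hs) for hs in heights.values())
    return area, volume


# Hypothetical points: two in the first cell (mean height 3 m), one in the next (1 m).
pts = [(0.01, 0.01, 2.0), (0.10, 0.05, 4.0), (0.20, 0.01, 1.0)]
area, volume = area_and_volume(pts)
print(f"area = {area:.4f} m2, volume = {volume:.4f} m3")
```

Because the volume uses heights above the DTM, objects missed as ground during normalisation (such as bridges) contribute their full elevation, which is consistent with the overestimated ground volume discussed in the results.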

RESULTS AND DISCUSSION
To analyse the proposed methodology, intermediate results were obtained after the point-based classification with the different ML methods. Then, after the segment-based reclassification, the resulting objects and their classes were analysed as final results.

Results of ML classification
The ML method used for the supervised classification was chosen based on the mean cross-validation score and the overall accuracy obtained by each method. As Figure 4 shows, most methods obtained a mean cross-validation score of around 0.80, except Ridge, which remained around 0.35. The methods with the best mean cross-validation scores were MLP, K-Nearest Neighbours and Random Forest, in line with the results of Nevalainen et al. (2017). The low performance of Ridge is explained by the fact that it was originally designed for regression: this classifier works quite well for text classification, but it is not well suited to the case studied.
After analysing these results, Random Forest was chosen to perform the point cloud classification, due to its high scores and low dispersion. The search for ideal predictors is an important step in any classification. In this work, the features were obtained directly from the points, but features can also be derived from the values of neighbouring points in the 3D context, or even from segments.
In this regard, the features used, listed in Table 2, provided meaningful information to differentiate the required classes. The importance of each variable by Gini impurity (Figure 5) gives an idea of each feature's capacity to separate the classes. The most important variable for discerning between classes was the normalised height, followed by the normalised difference indexes, mainly NGBDI and NGRDI; the NBRDI index and the R, G and B values had very similar importance. These results seem to be in line with the literature, where NGBDI and NGRDI are commonly used to describe vegetation (Chen et al., 2018), whereas few studies apply the NBRDI index in classifications.

Results of the object-based reclassification
Once the workflow was completed, 16,231 classified segments were obtained. To evaluate the process, 10% of the segments were randomly selected to validate the results. On this sample, a geometric validation of the segmentation was carried out, checking that each segment contained only one of the analysed classes. The geometric validation was carried out on 1,625 objects and showed that 113 of them (6.95%) contained two or more classes. This high success rate is mostly due to the over-segmentation, whose objective was precisely that no segment should contain more than one class. Table 3 shows the confusion matrix. The highest values of user's accuracy (precision), producer's accuracy (recall) and F-measure were obtained for the class ground, with values of 0.97, 0.94 and 0.95, respectively. These results are attributed to its very specific spectral response. The next highest ranked class was Aleppo pine, with good precision (0.91), recall (0.88) and F-measure (0.89). This class is the tallest species in the study area and has a spectral response different from the rest of the species analysed, which explains these results. For French tamarisk, the results are also remarkable, reaching 83% for all three accuracy indices. For giant reed, the lowest statistic was the recall (74%), reflecting the presence of false negatives; conversely, the precision was high (93%), indicating a very low number of false positives. Since false negatives outnumber false positives, the true number of giant reed objects is greater than the number predicted.
Finally, the class other riverine obtained more modest statistics, which was expected given the mix of species introduced as training samples for this class. An overall accuracy of 83.6% was achieved. One of the most important steps was the reclassification, which eliminated isolated and misclassified points scattered in the point cloud and enforced the homogeneity of closely connected sets of points. This step substantially improved the classification because the previous labelling was mostly correct. In addition, the over-segmentation did not affect the final classification, since even smaller groups of points contain enough information to determine the class to which they belong. Segmentation, by introducing context information, and the subsequent reclassification of the point cloud are therefore fundamental to unify neighbouring points with homogeneous characteristics.

Table 3. Confusion matrix with precision (Pr), recall (Re) and F-measure (Fm) for the different classes analysed. Column headers are class labels; row names are the respective class indices.
The overall accuracy obtained is comparable with similar studies in riverine areas. Adrien et al. (2016) obtained an overall accuracy of 84% with a species classification based on RGB and NIR orthophotos obtained from UAV-based photogrammetry. Nevalainen et al. (2017) classified UAV-based photogrammetric point clouds and hyperspectral imagery, obtaining overall accuracies close to 95% with Random Forest and MLP. Both studies used the geometric information provided by the point cloud, showing that it is key to obtaining good results in species classification. This suggests that, under optimal conditions, no spectral information beyond what a consumer camera provides would be needed. To classify species that are similar in geometry and spectrum, however, additional geometric (neighbourhoods, object-based features, etc.) and spectral (multispectral or hyperspectral) information would be necessary to obtain good results.
The areas and volumes occupied by the different classes (Table 4) show that the predominant vegetation species is the invasive giant reed, followed by French tamarisk. The surface covered by tamarisk is over 2.5 times that covered by pine, but its volume is not, owing to the differences in width and height between the two species. Notably, the volume of the class ground was almost 22,500 m³. This class is mostly represented by bridges that were not detected as ground during height normalisation, and it is in these objects that the volume has been overestimated.

Table 4. Area and volume occupied by class.
Figure 6 shows the intermediate results obtained after the height normalisation and the segmentation, as well as the final classified point cloud. The latter shows how almost the entire riverbed is invaded by giant reed, with some small areas of pine trees, and the bridges standing out at the beginning and end of the riverbed. The large presence of Arundo donax in this section of the river highlights the need for ecological restoration to remove invasive species as harmful as giant reed, which not only causes environmental damage by displacing native species but also increases the risk of flooding in the rainy season, as canes accumulate in the riverbed and block the spans of bridges. This study shows how the areas invaded by this species can be delimited with high accuracy, making possible multi-temporal studies that analyse the expansion of Arundo donax in Mediterranean basin areas.

CONCLUSIONS
This study shows that supervised classification of RGB point clouds obtained from UAV-derived images can be used to classify riverine areas. Based on the results, the classification of point clouds allows species distribution maps to be obtained with high accuracy. The proposed methodology could help in the management of riverine ecosystems by providing additional data to complement, or potentially substitute, traditional riverbank planning, improving the temporal and spatial resolution of current inventories and reducing their cost. The results were promising, indicating that, under optimal conditions, there is no need to mount high-cost sensors on UAVs to classify forest areas. A key step in point-based classification is the subsequent homogenisation of the classified points using context information, which improves the final results.
The proposed workflow is easily transferable to tasks beyond riverine species classification that target different classes (e.g., green area detection, land cover classification, identification of marginal lands), since the methodology and the code produced could be applied to any RGB point cloud, requiring only training samples as input data.
Furthermore, future studies could extract features from objects, and not only from points, to enrich the classification input and increase its overall accuracy.