The Use of Open Source Software for Monitoring Bee Diversity in Natural Systems: the Beems Project

: This work wants to highlight the results obtained during the BEEMS (Monitoring Bee Diversity in Natural System) project, which the main goal was to answer the following question: Which biotic and abiotic indicators of floral and nesting resources best reflect the diversity of bee species and community composition in the Israeli natural environment? The research was oriented towards the cost-effectiveness analysis of new aerial geomatics techniques and classical ground-based methods for collecting the indicators described above, based only on open-source software for data analysis. Two complementary study systems in central Israel have been considered: the Alexander Stream National Park, an area undergoing an ecological restoration project in a sandy ecosystem, and the Judean foothills area, to the South of Tel Aviv. In each study system, different surveys of bees, flowers, nesting substrates and soil, using classical field measurement methods have been conducted. Simultaneously, an integrated aero photogrammetric survey, acquiring different spectral responses of the land surface by means of Uncrewed Aerial Vehicle (UAV) imaging systems have been performed. The multispectral sensors have provided surface spectral response out of the visible spectrum, while the photogrammetric reconstruction has provided three-dimensional information. Thanks to Artificial Intelligence algorithms and the richness of the data acquired, a methodology for Land Cover Classification has been developed. The results obtained by ground surveys and advanced geomatics tools have been compared and overlapped. The results are promising and show a good fit between the two approaches, and high performance of the geomatics tools in providing valuable ecological data.


INTRODUCTION
One of the major factors in maintaining global human food supply is the pollination (Klein et al., 2007) in addition to the functional integrity of most terrestrial ecosystems (Ollerton et al., 2011).Wild pollinators have been defined as highly effective, often critical contributors to pollination services in natural and agricultural systems (Garibaldi et al., 2013), while colonies of the domesticated honey bee (Apis mellifera) offer their service mainly to crop pollination.Considering all wild pollinators, it is possible to define native bees as the most important pollinator group (Delaplane and Mayer 2000), because many bee communities usually provide more efficient and reliable pollination services than a single pollinator species (Blüthgen and Klein 2011) and are therefore of high conservation priority.The monitoring of these species and the climate change analyses has been emphasized the importance of these species, highlighting the ongoing global declines in both wild and managed environments.This has raised the interest in native bee conservation and restoration and therefore in long-term monitoring programs.However, executing such monitoring programs is a major challenge, as they are labor-intensive, and require high expertise in the collection of bees and their subsequent taxonomic identification.To date, little has been done in developing efficient tools for monitoring bee communities.The current prevailing approach for bee monitoring is site sampling oriented, time-consuming and labor-intensive upon collecting the required amount of data.Another approach is identifying, determining, and measuring which biotic and abiotic indicators of floral and nesting resources best reflect bee species diversity and community composition.Recent advances in the field of remote sensing have allowed performing such measurements parallelly to classical ground methods.The use of Radar technology (Milanesio et al., 2020), UAVs, and image analysis can provide a wide-scale, fast, data-rich digital-platform for developing cost-effective bee monitoring programs.These tools have been widely used for many research activities, starting from subfluvial springs' investigations (Aicardi et al., 2017), thermal analyses (Banding et al., 2012), forestry applications (Aicardi et al., 2016) and hyperspectral measurements (Weinmann et al., 2018).For this reason, the BEEMS project, a project of Scientific and Technological Cooperation between Italy and Israel (Scientific Track 2019), has focused the attention on the development of a technology-based approach for advanced bee community monitoring.The main idea is coupling photogrammetric tools, based on RGB images, with thermal and multispectral data, to develop a multi-scale and multi-temporal platform for monitoring bees and possibly other insect groups.After this introduction, Sections 2 and 3 show the case study considered and the data acquisition, while section 4 describes the photogrammetric approach applied to this research activity.Section 5 is dedicated to showing the results and discussion, while section 6 concludes this article by proposing also some future steps for the next research activities.

METHODS
This section presents dataset acquisition, feature extraction, preprocessing and the structure of the prediction models, including the training procedure with respect to the case studies described in Section 3. Colour and spectral imagery can be used to infer physical and chemical characteristics of vegetation and soil, allowing to develop several models or forecasting, monitoring and management.Classifying the land use and the land cover through maps is one of the most important tasks for several actors.
The proposed method for land cover classification of Very High Resolution (VHR) imagery collected from Uncrewed Aerial Systems (UAS) exploits the benefits of Artificial Intelligence (AI) in extrapolating useful features from heterogeneous geospatial data and clustering geographic maps.In the state of the art, several algorithms of machine learning and image analysis are already implemented in open-source tools commonly applied in remote sensing and earth-observation.Very few works (e.g., Pontoglio et al., 2021) have tried to use, adapt and validate those algorithms on VHR images, in particular in photogrammetric products from UAS (orthomosaic and DSM).This section will provide information on the methods applied in the entire pipeline of map classifications, from the data acquisition, feature extraction, data preparation, pre-processing, training and set-up of the prediction model.Pixel-based and Object-based approaches are applied and compared addressing the benefits and drawbacks of each methodology.Particular attention is paid on the data preparation as well as on validation of the results.The processing and the prediction model for land cover classification have been performed by exploiting the algorithms implemented in ORFEO ToolBox (OTB).OTB is an open-source project for processing satellite images, developed by CNSE (www.orfeo-toolbox.org,accessed 10 April 2019).The suite of libraries offered can be bound both in QGIS software (www.qgis.org,accessed 10 April 2019) and in a Python environment.Together with OTB, scikit-learn (https://scikitlearn.org/stable/) tools have been used in some steps of the analysis.

On-field and photogrammetric data acquisition
The Israeli research group, composed of ecologists and soil chemists, has carried out actions aimed at acquiring ecological indicators with classical field survey methods and analysing them statistically to model bee species diversity.In particular, the onfield exploratory survey of the region under study has allowed to define about 1000 samples to use as training and validation data.Those sample have been selected in each study case, assuring a good spatial distribution.As explained by Congalton and Green (Congalton and Green, 2019), to obtain a statistically consistent representation of the analyzed image, the accuracy assessment requires that an adequate number of samples should be collected per each land cover class.To this end, we considered it appropriate to select a total number of trainer polygons that did not deviate from 2% of the total number of segments of the entire image.Regarding the photogrammetric data, multispectral images acquired from UAS flight serve as input information for deriving radiometric, spatial and three-dimensional information of the area through dense image matching and structure from motion.As for any aerial photogrammetric survey based on UAVs, a flight plan has been designed, assuring lateral and longitudinal overlaps within frames of 80 % in both directions.The flight elevation above the ground level, as well as the velocity of the quadricopter, was settled in order to assure a ground sample distance (GSD) of about 1.5 cm and to avoid motion blurring.The UAS was the same for each case study (DJI Matrice 200), equipped with a multispectral camera (Slantrange 4P+) which integrates multi-constellation and multi-frequency GNSS receivers that allow accurate positioning of the cameras' frames and the direct georeferencing of the photogrammetric block.The data processing has followed the classical photogrammetric pipeline, from feature extraction to image matching and aerial triangulation.The details regarding the georeferencing procedure and the residual errors on the GCP are reported in Section 3. The only change between classical methods is related to the data preparation and, in particular, the reflectance calibration of the multispectral data.As is known, changes in sunlight conditions generate a strong variability in the spectral and colour response of the acquired surface unless proper calibrations are applied.Slantrange P4+ camera incorporates an on-board incident solar light measurement sensor, a range measurement and a geopositioning sensor to provide necessary input data to the on-field calibration algorithm.For each reflectance data acquired in a specific discrete spectral frequency range with the photo sensor, the algorithm allows correlating the spectral measurements of the ambient at the given time of acquisition, as well as the incident angle of the light source.The algorithm, as well as the implementation in the commercial sensor, are presented in Figure 1.The Raw Slantrange images have been calibrated radiometrically, spectrally and geometrically by Slantview black box software and exported as multiple single-band .tif in 8-bit format.Some analysis has been made in order to evaluate the data missing behaviour of the calibration tool and assure the replicability of the analysis.

Feature extraction and data preparation
The production of very high scale digital cartography allowed the extraction of the necessary data for training the proposed Artificial Intelligence model.These data were applied to two different approaches for automatic land cover classification.The first approach was based on unsupervised classification at the pixel level, while the second approach is based on object classification, i.e. vector polygons describing the boundaries of a real object.The algorithms operate differently on these two types of data.In fact, in the pixel-based approach, they are applied at the level of the single pixel, while in the object-oriented approach, they are applied to groups of homogenous pixels for a given feature.The implementation of all the training and validation phases of the proposed models was based on Python programming language using open libraries for data management (shapely, raster) and learning (sk-learn).The segmentation of the input data is fundamental in the approach in order to define the objects to be classified.Therefore, the OTB library was applied.
For object-oriented classification, we proceeded to apply automatic segmentation algorithms based on the analysis of multi-band spectral variability.In particular, the Large-Scale Mean-Shift (LSMS)-segmentation algorithm is used, producing a clustered image in which the pixels around a target pixel that present similar behaviour from both the spatial and spectral points of view are grouped together.The LSMS-segmentation algorithm is a non-parametric and iterative clustering method introduced in 1975 by Fukunaga and Hostetler (Fukanaga and Hostetler 1975).It enables the performance of tile-wise segmentation of large VHR imagery, and the result is an artefactfree vector file in which each polygon corresponds to a segmented image containing the radiometric mean and variance of each band.It is based on four steps: LSMS-smoothing, LSMSsegmentation, LSMS-merging and LSMS-vectorization.The final step procedure vectorizes these clusters in order to obtain objects from which it is possible to compute the geometrical features (roundness, elongness, etc.).Finally, the operator associates a label to each vector for the dataset's generation.
The dataset preparation is a fundamental step in ML.Considering the aims of this work in using UAS photogrammetric products to estimate land cover, two different test sites with peculiar characteristics for the interested region were selected.In those areas, direct measurements were made (MSI, coordinates of ground points, indicators), and from them, meaningful synthetic features were extracted.The object-oriented approach was applied for the Alexander Stream National Park site while the pixel-based approach was applied on the Judean Foothills area.Effective use of features as input data for a classification procedure can improve classification accuracy.Thus, the classification dataset was enriched with derivative features, namely, spectral-, textural-, and statistical-based features which can be descriptive of the object to classify.The of the ground cover classes has been assessed by the ecologist expert which have identified the most important indicator to monitoring bee diversity.The description of these classes is reported in Table 1.Knowing the classes and consequently identifying possible beneficial features, the raw MS images have been used to derive the aiding features.Table 2 reports the used measures for segmentation and classification.For each segment, the mean, the standard deviation, the median, the variance, the skewness and the kurtosis of the spectral, histogram and textural features were computed.The elevation is an important feature obtained by the photogrammetric procedure as well as the geometric-derived information.All those features have been stacked in an input data frame composed of thousands of samples/objects and hundreds of features.In image-based classification, independent features highly correlated with dependent features must be maintained, while independent features highly correlated between them must be eliminated.Therefore, once selected, they were subjected to importance and dimensionality reduction procedures in order to extract only the features with the highest weight for the given learning process.Firstly, the features were subjected to correlation analysis, reducing the features highly correlated in the function of a given threshold, then a feature importance analysis was addressed using the Gini criterion as described in Section 2.3.

Herbaceous plants
Plants

Machine Learning algorithms and Validation
The ML algorithm that has provided the best results for the present task is the Random Forest classifier.Random forest is an automatic learning algorithm which consists of a set of decision trees that, thanks to a specific criterion, create several subsets of trees, each providing a classification result.The labels assigned to each input sample are assigned from the trees with the most votes.The RF classification was performed in a Python environment using Pandas, NumPy and Sklearn libraries.
Random Forest classification model (Breiman, 2001) was trained using both sample and object datasets.The criterion used for the node splitting is the Gini (Hastie et al., 2009).The GINI gain is defined as the sum of impurity decreases from two nodes and the parent node.The GINI is calculated for each variable of the classifier.This parameter is a proxy of the importance of each feature within the model: the variables that have high GINI gain (so they have less impurity) are more important.Features that have GINI importance less than the median values of the feature importance values were excluded from the classification.The validation was performed in two steps.Firstly, the classical metrics to evaluate the model, then the comparison with the ground truth data acquired by the ecologist.The classification results were carried out using different validation metrics such as Precision, Recall, F1 score, etc.

CASE STUDY
The study involved two complementary study systems in central Israel (Figure 2), the Alexander Stream National Park, an area undergoing an ecological restoration project in a sandy ecosystem, and the Judean foothills area, to the South of Tel Aviv, which is characterized by an agro-natural landscape on vertisols.

Analysis of Slantrange on-field calibration
Once the collection is done, the imagery is uploaded to Slantrange calibration software, using the information from the ambient irradiance sensor and the positioning, timing and attitude information of the camera drone and performing the reflectance calibration as well as the pixel crops to stack the 6 optical sensor images.As the algorithms are patented and not publicly available, the algorithm's performance has been analyzed in terms of the quantity of information extracted from the raw data.

Photogrammetric results
The photogrammetric processing results are several digital products that contain radiometric, texture, spectral, and spatial information.These products are:  6. Summary of processing parameters and accuracy results of the Judean foothills site's digital products.

Segmentation
Tuning the LSMS-segmentation algorithm is quite complex due to the original implementation, which was mainly focused on remote sensing images and the spatial variability of the objects defined in the class statement.Therefore, in order to define the most suitable parameters to segment the two study areas, several tests have been performed, and the results are compared.The LSMS-smoothing algorithm requires defining two parameters called spatial r and range r.The spatial r corresponds to the spatial radius.A higher value will lead to stronger smoothing and will also take more time.Range r is the spectral radius which means how pixels in the spatial radius and with close radiometry values will be averaged.A higher ranger will increase the smoothing effect.After that, the LSMS-segmentation application produces a labelled image where neighbour pixels whose range distance is below range radius ranger (and optionally spatial distance below spatial radius spatial r) will be grouped together into the same cluster.The parameters tested for tuning the algorithm are reported in Table 7.The final used parameters are highlighted in the table and represent the most satisfying parameters for segmenting our areas of interest.An example of orthomosaic segmentation is presented in Figure 4.

Feature extraction and dataset preparation
Identifying the features of greatest interest in describing the classes is necessary, which can lead to better classification accuracy.The features are the attributes of the pixels or objects derived from the digital images, from the multiple acquisition bands, their spectral response, their combination, and the textural, spatial, and shape information.
Using the implemented algorithms in OTB 41 features have been computed for the pixel-based sample and for object based-sample.Then, QGIS statistical algorithm operating on vector polygon was applied in the Alexander stream objects, reaching a final number of 129 features.The dataset array has been composed as follows: 1011 rows and 41 columns for the pixelbased classification; • 1053 rows and 129 columns for the object-based classification.Staring from these datasets array, a Pearson correlation analysis has been performed, and the features with a correlation score of 0.9 have been dropped.The new dataset is composed of:

Classification and accuracy results
The experiences described in previous works (Trisasongko et al., 2017) (Immtzer et al., 2016) show that the default values of the OTB parameters for training and classification processes seem to provide optimal results.For this reason, the default value parameters for the RF algorithm have been used, in particular, the maximum depth of the tree was set to 10, while the maximum number of trees in the forest was fixed to 1000.The class weight was placed equal to the parameter 'balanced_subsample' while the criterion parameter was settled equally to 'Gini'.The results of the validation for pixel-based and object-based approach are expressed in Table 8 and Table 9, which shows the accuracy metrics of the classification procedure, in particular the F1-score, Precision and Recall values of each class.Figure 7 and Figure 8 represent the confusion matrices.Generally, the quality of the classification is good.Indeed, the overall accuracy of the RF model is 0.92 for the sample-based approach and 0.79 for the object-based approach.The difference between the two approaches can be assumed to be related intrinsically to the object-based method, in which each vector polygon is expressed by its feature mean value and therefore injects some spectral response noise in the computation.Considering

CONCLUSIONS
This work involved two complementary study systems in central Israel, the Alexander Stream National Park, an area undergoing an ecological restoration project in a sandy ecosystem, and the Judean foothills area, to the south of Tel Aviv, which is characterized by an agro-natural landscape on vertisols.A measurement campaign was conducted by the Israeli and Italian teams.We conducted surveys of bees, flowers, nesting substrates and soil in each study system, using classical field measurement methods.Simultaneously, we performed an integrated aerophotogrammetric survey, acquiring different spectral responses of the land surface through UAV imaging systems.The results of our analysis are summarised hereinafter.We found three focal habitat characteristics (indicators) shaping bee communities; the amount of bloom, perennial cover, and extent of bare ground (used by many bee species for nesting).VHR orthophotos and DSM have been produced with rigorous photogrammetric methodology and with accurate georeferencing.The multispectral sensors have provided surface spectral response out of the visible spectrum, while the photogrammetric reconstruction has provided three-dimensional information.Thanks to Artificial Intelligence (AI) algorithms and the richness of the data acquired, a methodology for Land Cover Classification has been developed.The results obtained by ground surveys and advanced geomatics tools have been compared and overlapped.The results are promising and show a good fit between the two approaches, and the high performance of the geomatics tools in providing valuable ecological data.

Figure 1 .
Figure 1.The workflow of the calibration procedure implemented by Slantrange and its relation to the sensor components.
The final dataset array has been prepared and each input feature was scaled using scikit-learn pre-processing MinMaxscaler(Pedregosa et al., 2011), which resizes each feature on the minimum and maximum values.At this point, the dataset has been split into training and testing (70% training -30% test).The results of the data preparation are a training and testing dataset arrays where the prediction variable constituted the Y dataset and the X is composed of the features, both sample-based and objectbased.

Figure 2 .
Figure 2. The two areas investigated in this study in Israel.In each study system, surveys of bees, flowers, nesting substrates and soil, using classical field measurement methods have been conducted.Simultaneously, an integrated aerophotogrammetric survey, acquiring different spectral responses of the land surface by means of UAV imaging systems has been performed.The acquisition of the indicators identified in the planning phase took place through several measurement campaigns conducted in the period between February 2020 and April 2020 located in two

Figure 3 .
Figure 3. Instruments used during the survey's campaigns.

Figure 4 .
Figure 4. Example of LSMS-Segmentation algorithm result.The green objects are vector polygons of the Alexander Stream National Park subject to regeneration policies.

•
1011 rows and 21 columns for the pixel-based classification;• 1053 rows and 47 columns for the object-based classification.After this feature dropping, a feature importance analysis was addressed using the Gini criterion during the random forest training.Figure5shows the importance analysis for the 21 features used in pixel-based classification, while Figure6shows the importance of the 47 features used in object-based classification; the features that have an importance Gini score greater than 0.02 have been extracted, and the model has fitted on this new decreased dataset.

Figure 5 .
Figure 5. Importance analysis with Gini criterion for the pixelbased classification.

Table 1 .
Class labelling for LC classification based on ecologist expert directives.

Table 2 .
Features selected for the classification input dataset in object-based approach, computed using ORFEO toolbox.

Table 3 and
Table 4 show the image extraction results for both case studies.

Table 5 .
The results are summarized in Table5andTable6for Alexander Stream National Park and the Judean foothills.Summary of processing parameters and accuracy results of the Alexander Stream National Park site's digital products.

Table 7 .
Spatial radius and range radius parameters tested to obtain the most effective segmentation result.

Table 8 .
Table8, the classification in pixel-based samples is well-performant in almost all classes, as demonstrated by the F1-score (0.78 as minimum value).Class 6, i.e. rocks and stones, has no predicted sample, which is due to the very low number of support data.The lowest accuracy is related to classes 4 and 6, bare soil and rocks, respectively.In this case, the number of false-positive increases probably due to the similarity in spectral response and the unbalanced number of objects supporting class 6. Accuracy metrics calculated for pixel-based approach.
Figure 7. Normalized confusion matrix for pixel-based RF classification.

Table 9 .
Accuracy metrics calculated for object-based approach.Figure 8. Normalized confusion matrix for object-based RF classification.Figure 9. Extract of land cover map obtained with methodology developed by the geomatics group of the Politecnico di Torino.