DETECTING AND COUNTING ORCHARD TREES ON UNMANNED AERIAL VEHICLE (UAV)-BASED IMAGES USING ENTROPY AND NDVI FEATURES

Multispectral images were acquired by a camera on board an Unmanned Aerial Vehicle (UAV) over two apple orchards in Prince Edward Island in summer 2016. A method was developed to automatically detect rows of planted trees and trees in each row using the normalized difference vegetation index (NDVI) image and its related entropy and variance images. The image-based tree position was then compared to the actual tree location measured in the field. We achieved an accuracy of 93% between the estimated and measured number of trees in both orchards.


INTRODUCTION
Images acquired by cameras on board Unmanned Aerial Vehicles (UAVs) can be obtained much more cheaply than airborne and spaceborne images, as they do not require a professional pilot or complicated flight logistics. These images also have a better spatial and temporal resolution than airborne and satellite images. One of the domains that will benefit the most from UAV-based images is the agricultural sector. Indeed, precision agriculture protocols require detailed spatial information on agricultural fields to help growers manage crops.
Over orchards, updated information on the planted trees is required for effective treatments and crop insurance purposes. For example, crop insurance agencies need to know the exact number of planted trees to estimate the insurance cost and to assess tree damage quickly. Fieldwork to assess tree damage and count trees is laborious and expensive. Satellite and aerial images can be used instead. For example, Santoro et al. (2013) applied a local minima aggregating filter on GeoEye-1 images to detect citrus trees. WorldView-3 images have been used for tree detection in mango orchards (Rahman et al., 2018). However, satellite or aerial images may not have the required spatial resolution. By contrast, cameras on board UAVs can provide high-resolution images at low cost and at almost any time. Cloud cover may decrease the quality of UAV-based images, but images can still be acquired, as UAVs usually fly below the clouds. This is not the case in bad weather, such as rain, strong wind, or poor light.
UAVs have already been employed over treed areas. Nonforested areas in tropical rain forests were mapped using a simple threshold applied to excess brown index and excess green index images computed from images acquired in the visible bands (Cruz et al., 2017). Comba et al. (2015) used a Hough transformation over an NDVI image to detect vineyard rows, but they were not able to detect the individual plants in the rows. Rokhmana (2015) visually interpreted UAV images acquired in the visible bands to count palm-oil trees. Palm trees were detected over RGB images using either histogram of oriented gradients features fed to an SVM classifier (Wang et al., 2019) or a combination of Hough transformation and morphological operations (Al Mansoori et al., 2018). RGB images were also used to detect trees in urban areas using the entropy image and fitting a circular shape over the tree region (Hassaan et al., 2016). Zarco-Tejada et al. (2014) used automatic 3D photoreconstruction to estimate tree heights in olive orchards from UAV images acquired in the green, red, and NIR bands. That method does not detect individual trees: only a tree height map is produced from the UAV images and compared to a digital surface model (DSM). To our knowledge, no study has presented a method for the automatic detection and counting of trees in orchards from UAV images. It is also important that such a method can detect trees of many species with a wide variety of crown shapes.
This study presents a method that detects orchard tree rows and the trees in each row using multispectral images acquired by a camera on board a UAV over two apple orchards in Prince Edward Island in summer 2016. The study also uses field observations consisting of the position and apple variety of each tree in each orchard.

MATERIALS AND METHODS
The experiment was conducted during summer 2016 over two apple orchards in Souris, PEI, Canada (Lat. 46.44633N, Long. 62.08151W). Images of the orchards were taken with a MicaSense (MicaSense, Seattle, Washington, USA) multispectral camera. It has five sensors, one sensor for each of the following spectral bands: B1 (blue) centered at 475 nm; B2 (green) centered at 560 nm; B3 (red) centered at 668 nm; B4 (REP) centered at 717 nm, and B5 (NIR) centered at 840 nm. Each sensor records digital numbers (DN) between 0 and 4048 using frames having 1280×960 pixels.
The camera was mounted under a UAV developed by A&L Canada. Its weight is slightly less than 2.0 kg. Both the camera and the UAV were connected to mission planner software to fly at 100 m above the ground with 70% overlap between adjacent images. Wind speed during image acquisition was less than 20 km/h and the weather was sunny/cloudy. In total for both fields, 870 images (174 sets of 5 images) were successfully captured.
A Magellan Mobile Mapper (Magellan, San Dimas, CA, USA) and two 30 m fiberglass measuring tapes were used to record the tree positions for evaluating the accuracy of the detection method. The accuracy of the Mobile Mapper is around 1 m. The accuracy of the measuring tape reading is about 5 cm. For the measuring tapes, the distances between trees were converted into tree coordinates by interpolation between the recorded positions of the start and the end of each tree row. The fieldwork was conducted by three people over two days, so tree positions were recorded in only 70% of the tree rows, which corresponds to a total of 2757 trees, i.e., 1523 trees for orchard#1 and 1234 trees for orchard#2.
The flowchart of the proposed method is shown in Figure 1. The UAV images were first orthorectified and mosaicked together using Pix4D (Pix4D, Lausanne, Switzerland) software. Pix4D can produce a mosaic with high accuracy for flat fields. However, the mosaicking accuracy drops when objects in the image are tall relative to the flight height. In our case, this problem does not occur, as the flight height (100 m) was much greater than the height of the orchard trees (<3 m). The digital number (DN) values of each mosaicked image were not converted to reflectance but used directly to compute a vegetation index image and two related textural images.
Vegetation indices (VI) are algebraic combinations of two or three spectral bands that allow better discrimination of vegetated areas, such as treed areas. The vegetation index image considered in the study is the normalized difference vegetation index (NDVI) image, which is computed from the red and near-infrared images as follows (Rouse et al., 1974) (Equation 1):
$$\mathrm{NDVI} = \frac{DN_{NIR} - DN_{red}}{DN_{NIR} + DN_{red}} \tag{1}$$
where $DN_{NIR}$ and $DN_{red}$ are the digital number values in the near-infrared (NIR) and red bands, respectively.
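As an illustration only, the NDVI computation can be sketched in a few lines of Python using the standard normalized-difference form (NIR minus red, divided by NIR plus red) applied directly to DN values, as the study does. The function name `ndvi`, the zero-denominator convention, and the sample DN values are assumptions made for this sketch and are not taken from the study.

```python
def ndvi(dn_nir, dn_red):
    """Per-pixel NDVI from two equally sized 2-D lists of digital numbers."""
    out = []
    for nir_row, red_row in zip(dn_nir, dn_red):
        row = []
        for nir, red in zip(nir_row, red_row):
            total = nir + red
            # Assumption: a pixel with no signal in either band gets NDVI 0.
            row.append((nir - red) / total if total else 0.0)
        out.append(row)
    return out


# Hypothetical 2x2 DN patches (illustrative, not values from the study).
nir = [[3000, 1200], [800, 0]]
red = [[1000, 1200], [1600, 0]]
result = ndvi(nir, red)
```

Pixels where the NIR signal dominates give positive values (vegetation), while pixels where red dominates give negative values.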
Textural features that represent patterns in images were also computed. To this end, the gray level co-occurrence matrix (GLCM) method of Haralick et al. (1973) was applied to the NDVI image to compute the entropy image as follows:
$$\mathrm{Entropy} = -\sum_{\phi_1}\sum_{\phi_2} p(\phi_1, \phi_2 \mid h, \theta)\,\log p(\phi_1, \phi_2 \mid h, \theta) \tag{2}$$
where $p(\phi_1, \phi_2 \mid h, \theta)$ is the relative occurrence of pixel pairs with NDVI values $\phi_1$ and $\phi_2$ separated by a distance $h$ in the direction $\theta$. The entropy is low when there is no texture, such as over roads or smooth surfaces, and it is high when there is a texture, such as over treed areas or rough surfaces. The NDVI image was also used to compute a local variance image (Fabijańska, 2011) as follows:
$$\mathrm{Variance} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2 \tag{3}$$
where $\bar{x}$ is the mean of the $N$ DN values $x_i$ of the image within a window. The variance is low for homogeneous areas, such as grasslands, and is high for heterogeneous areas, such as forests or orchards.
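The two texture measures can be sketched for a single window as follows. This is a minimal illustration, not the authors' implementation: it assumes the NDVI window has already been quantized to integer gray levels, uses a pixel offset `(dx, dy)` to stand in for the distance h and direction θ, uses the natural logarithm, and divides the variance by the number of pixels in the window. The function names and sample windows are hypothetical.

```python
import math
from collections import Counter


def glcm_entropy(window, dx=1, dy=0):
    """Entropy of the co-occurrence distribution of gray-level pairs
    separated by the offset (dx, dy) inside a 2-D window of integer levels."""
    rows, cols = len(window), len(window[0])
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(window[r][c], window[r2][c2])] += 1
    total = sum(pairs.values())
    # Sum of -p * log(p) over all observed gray-level pairs.
    return -sum((n / total) * math.log(n / total) for n in pairs.values())


def local_variance(window):
    """Population variance of all values inside a 2-D window."""
    values = [v for row in window for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical quantized windows: a smooth surface and a textured one.
flat = [[2, 2, 2], [2, 2, 2]]
textured = [[0, 1], [2, 3]]
flat_entropy = glcm_entropy(flat)          # single pair type, no texture
textured_entropy = glcm_entropy(textured)  # two equally likely pair types
```

A uniform window yields zero entropy and zero variance, while a window with varied gray levels yields positive values for both, which matches the qualitative behaviour described above.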
The planted tree rows were detected as follows. First, the treed regions were determined on the NDVI entropy image using a threshold value of 0.9, as trees are associated with heterogeneous areas, which have high entropy. Then, a line was fitted to each detected treed region by least-squares regression. The fitted lines represent the planted tree rows in the orchard. Harris corner detection (Gonzalez & Woods, 2008) was applied to the treed region image to find local DN maxima that correspond to the trees. The Harris transformation detects corners by applying an edge detection algorithm in the x- and y-directions. Using the localized points, the number of tree points in each row was calculated. For the rows with the highest number of points, the Euclidean distance between trees in a row was computed, and the calculated distances were averaged. This distance is assumed to be constant across the orchards, as the grower plants trees at the same spacing.
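The row-fitting and spacing steps can be illustrated with a small sketch. It assumes the pixels of one treed region, or the detected tree points of one row, are available as (x, y) coordinates, fits an ordinary least-squares line y = a·x + b, and averages the Euclidean distances between consecutive points. The function names and sample coordinates are hypothetical, and the fit assumes the row is not vertical in image coordinates.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b through (x, y) coordinates."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx  # assumption: the row is not vertical (sxx > 0)
    return a, mean_y - a * mean_x


def mean_spacing(points):
    """Average Euclidean distance between consecutive points sorted along x."""
    ordered = sorted(points)
    gaps = [math.dist(p, q) for p, q in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical tree points along one row (illustrative coordinates).
pts = [(0, 1), (2, 2), (4, 3), (6, 4)]
slope, intercept = fit_line(pts)
spacing = mean_spacing(pts)
```

In practice a row that runs close to vertical in the image would need the roles of x and y swapped, or a total least-squares fit, before this sketch applies.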
A rectangular moving window was then defined. Its width is 1.2 m and its length is the distance between trees in a row, as determined in the previous step. Trees correspond to NDVI >0.4 and entropy >0.5 because trees are chlorophyllous bodies that have high NDVI values and are heterogeneous with high entropy. When the window meets both conditions in the image, the center pixel is considered a tree location. NDVI values between 0.05 and 0.4 can correspond to either small trees or grass. In this case, to discriminate between trees and grass, we use the NDVI variance image. The variance is less than 0.2 for grass but higher than 0.2 for small trees. When the location of the moving window corresponds to a variance higher than 0.2, the location is marked as a tree location. When multiple points of the moving window correspond to high entropy values, the local entropy minimum is used to separate the points as follows: a single high-entropy peak corresponds to a single tree, but an entropy that increases, decreases, and then increases again corresponds to two trees. This helps to find trees that are planted close together, as entropy is high at tree locations. All the detected tree positions are then converted into a shapefile for further analysis. The shapefile has as attributes the tree position, the orchard number, and the row number.
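The decision rules for one window position can be written as a short sketch using the thresholds stated above (NDVI 0.4 and 0.05, entropy 0.5, variance 0.2). Several details are assumptions of this sketch rather than statements from the study: the interval 0.05 to 0.4 is treated as inclusive, a window with NDVI above 0.4 but low entropy is treated as not a tree, and closely planted trees are counted as local maxima of a 1-D entropy profile along the row. The function names are hypothetical.

```python
def is_tree(ndvi, entropy, variance):
    """Apply the NDVI, entropy, and variance thresholds to one window."""
    if ndvi > 0.4 and entropy > 0.5:
        return True
    if 0.05 <= ndvi <= 0.4:
        # Ambiguous NDVI range: variance separates small trees from grass.
        return variance > 0.2
    # Assumption: every other combination is treated as background.
    return False


def count_entropy_peaks(profile):
    """Number of interior local maxima in a 1-D entropy profile.

    Two peaks separated by a local minimum are read as two adjacent trees.
    """
    return sum(
        1
        for i in range(1, len(profile) - 1)
        if profile[i - 1] < profile[i] >= profile[i + 1]
    )


# Hypothetical entropy profiles sampled along a row.
two_trees = count_entropy_peaks([0.5, 0.8, 0.6, 0.9, 0.5])
one_tree = count_entropy_peaks([0.5, 0.8, 0.9, 0.8, 0.5])
```

The peak count is one simple way to express the "increases, decreases, then increases again" rule; a real entropy profile would likely need smoothing first so that noise is not counted as extra trees.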
The total tree number of each resulting orchard map was then compared to the number of trees measured during the field survey. This tree counting and positioning was done for 70% of the rows in each orchard. This comparison allowed the following accuracy to be computed (Equation 4):
$$\mathrm{Accuracy}\,(\%) = \left(1 - \frac{|d|}{T}\right) \times 100 \tag{4}$$
where $|d|$ is the absolute value of the difference between the number of detected trees and the number of trees measured during the fieldwork ($T$), which was determined over only 70% of the tree rows in each orchard.
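A worked example of the counting accuracy is sketched below. It assumes that the accuracy is one minus the relative counting error |d|/T, expressed as a percentage, which is consistent with the definitions of |d| and T given for Equation 4; the function name and the tree counts are hypothetical.

```python
def count_accuracy(detected, measured):
    """Counting accuracy in percent, assuming accuracy = (1 - |d| / T) * 100."""
    d = abs(detected - measured)
    return 100.0 * (measured - d) / measured


# Hypothetical row: 95 trees detected where 100 were measured in the field.
row_accuracy = count_accuracy(95, 100)
```

Under this definition, over-counting and under-counting by the same number of trees give the same accuracy, so the measure does not indicate the direction of the error.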

RESULTS
In both orchards, there were 21 species of apple trees planted in 49 rows. The species were Gala, Ginger Gold, Jona Gold, Honey Crisp, Spartan, Cortland, Sunrise, Gravenstein, Spygold, Alexander, Northern Spy, Russet, Cox Orange Pippin, Honey Gold, Virginia Gold, Macoun, Ambrosia, Jana Gold, Kestrel, Nova Spy Tydeman's Red, and Silken (Figure 2). Most rows have one species, but several rows have a mix of species. Some tree species have a large canopy, such as Cortland, while others have a smaller canopy, such as Jona Gold. The measured number of trees per species is presented in Figure 2 for both orchards. The UAV red and near-infrared mosaics were used to compute an NDVI image, such as the one in Figure 3 for orchard#1.
Figure 3. NDVI image computed from the red and NIR mosaics over orchard#1.
The NDVI image was then used to compute the NDVI entropy image, such as the one in Figure 4 for orchard#1. The treed regions were mapped on the NDVI entropy image using a threshold value of 0.9, as in Figure 5 for orchard#1. Once detected, the treed regions were dilated by a dilation filter (Gonzalez & Woods, 2008), and treed regions having fewer than 50 pixels were automatically removed to speed up the processing and reduce noise. A least-squares regression applied to the treed region image allowed the orchard rows to be delineated automatically, as shown in Figure 6 for orchard#1.
Figure 6. Orchard rows (lines) detected in the treed region image for orchard#1.
A Harris transformation (Gonzalez & Woods, 2008) was applied to the treed region image to detect the local DN maxima that correspond to trees in the rows, such as the ones in Figure 7 for orchard#1.
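The removal of small treed regions can be sketched as a connected-component filter on a binary mask. This is an illustrative sketch, not the processing chain used in the study: it assumes 4-connectivity, omits the dilation step, and uses a hypothetical function name and a toy mask with a lowered size threshold so that the example stays small.

```python
from collections import deque


def remove_small_regions(mask, min_pixels=50):
    """Return a copy of a binary mask without 4-connected regions
    that contain fewer than min_pixels pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Keep only regions that reach the minimum size.
            if len(region) >= min_pixels:
                for r, c in region:
                    out[r][c] = 1
    return out


# Toy mask: one 4-pixel region and one isolated pixel (illustrative only).
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
cleaned = remove_small_regions(mask, min_pixels=3)
```

With the study's threshold, `min_pixels=50` would be used on the thresholded entropy mask; the choice between 4- and 8-connectivity affects which diagonal neighbours are merged into one region.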
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
Figure 7. Harris transformation of the treed region image for orchard#1. The red dots represent local DN maxima that correspond to trees.
The Harris transformed image allowed deriving the distribution of tree occurrences in each row, such as the one in Figure 8 for orchard#1.
Figure 8. Distribution of tree occurrence as a function of the row for orchard#1 as derived from the Harris transformed image.
Figure 8 shows that the highest number of detected trees occurred in rows #20 and #28. For these rows, the Euclidean distance between trees was computed and used to define the length of a moving window that was then applied to the treed region image to find tree locations in all the other rows. The resulting tree locations were then exported as a shapefile to be used in ArcGIS. In total, 2554 and 1410 trees were detected for orchard#1 and orchard#2, respectively (Figure 9).
For each surveyed tree row, the numbers of image-detected and field-measured trees are compared in Figure 10a for orchard#1 and in Figure 10b for orchard#2. The overall tree detection accuracy was 99.0% and 99.5% for orchard#1 and orchard#2, respectively, and the average accuracy was 91.1% and 97.7% for orchard#1 and orchard#2, respectively. For most rows, we found a good correspondence between both tree numbers. However, the accuracy dropped to 79% for row#8 and 73% for row#29 of orchard#1 (Figure 10a). In row#8, the species is Honey Gold. The trees of this species are too small to be detected by the method, and the planting distance was also smaller (1.2 m). Consequently, for this row, our method detects only one tree where there were two trees, which appear as a single tree. The trees in row#29 are too small, and the pixel is dominated by the signal from the supporting pole. The NDVI value for these trees is negative, as the trees do not have enough leaves.

CONCLUSIONS
A MicaSense camera was mounted under the lightweight Unmanned Aerial Vehicle (UAV) of A&L Canada Labs Inc. to collect multispectral images over two apple orchards in Prince Edward Island in summer 2016. The images were georeferenced and mosaicked together. Each mosaic was then converted into an NDVI image, an NDVI entropy image, and an NDVI variance image. Trees were located using a moving window with the following values: NDVI >0.4 and NDVI entropy >0.5. For NDVI values between 0.05 and 0.4, trees correspond to an NDVI variance higher than 0.2. The resulting image-based tree number was then compared to the actual tree number obtained from fieldwork, and the accuracy was higher than 93% for most tree rows of each orchard. The method has difficulty detecting small trees or trees that are planted too close to each other. While giving promising results, this novel method can be further improved to handle these cases.
Figure 10. Number of image-detected trees vs the actual number of trees for a) Orchard 1 and b) Orchard 2. The figure represents the accuracy between the image-based and measured tree numbers for each row.