ABSOLUTE 3D ACCURACY ASSESSMENT OF UAS LIDAR SURVEYING

Surveying an area with small, unoccupied aerial systems (UAS) equipped with a lidar mapping payload—absent permanent, stable, geometrical reference surfaces—demands accurate, repeatable data collection procedures. While relative error within a single UAS lidar dataset may reveal itself in strip misalignment, absolute error (particularly horizontal error) can prove more difficult to detect, casting doubt upon the quality of both individual surveys and time change analyses of multiple surveys of the area. To gain insight on the UAS lidar error budget, this study presents an analysis of multiple UAS lidar surveys over a set of accurately surveyed geometric checkpoints. Each flight’s trajectory was processed multiple times using multiple static GNSS base observations, both autonomous and set over surveyed monuments, at varying distances from the study site. Custom algorithms were used to mensurate the geometric targets detected in each UAS lidar survey's point cloud, allowing for precise comparison of both absolute horizontal and vertical accuracy of each survey against the rigorous ground survey. The results of the analysis suggest that high horizontal accuracy can be achieved under a variety of conditions, whereas vertical accuracy is sensitive to the quality of ground control. and a discussion of the results explores the ultimate goal of isolating and understanding the sources and magnitudes of error in the UAS lidar error budget.


INTRODUCTION
The popularity of UAS lidar mapping continues to grow, implying a growing reliance upon the method in both research and industry. For many UAS lidar surveys, two of the main components used are considered survey-grade (i.e highly accurate, tactical-grade IMUs and geodetic-grade GNSS receivers); however, some of the commonly used lidar sensors, such as the Velodyne family of rotating multibeam sensors, are relatively inexpensive and known to researchers as noisy and unstable to a degree which may preclude one from asserting that a UAS lidar mapping product is survey grade. While the main source of error may be the sensor, ascertaining the absolute accuracy of a UAS lidar system must be approached holistically, examining the effects of the static GNSS base station, the trajectory solution, and the resulting point cloud characteristics. Analyzing the accuracy of a lidar point cloud typically relies upon flat, rigid surfaces visible in the study area, such as roofs or paved surfaces. But as the prevalence of UAS lidar grows, so too does the scope of the surveys undertaken by practitioners. As the method gains popularity in forestry, geomorphology, and ecology, to name a few, the likelihood decreased of human-made reference structures existing in the scene.
While vertical-only checkpoints on non-geometrical surfaces (assuming they are not occluded, e.g. by vegetation or tree canopy cover) can be surveyed in the field and checked against a point cloud with relative ease, the problem of checking the horizontal accuracy of the point cloud remains. The solution to this issue in the natural environment is to deploy geometrical targets in the scene. A few approaches using intensity-based, flat targets have been proposed with some success (Csanyi and Toth, 2007;Wallace et al., 2016). These approaches are less reliable in the horizontal than the vertical due to the geometry of the flat targets (Wallace et al., 2012), and their reliance upon intensity is problematic considering the observed low quality of intensity * Corresponding author (halassiter@ufl.edu) readings in the popular Velodyne family of rotating multibeam sensors (Kidd, 2017).
A recent solution for an artificial geometrical target for use in a natural scene is a lightweight, foldable pyramid that allows mensuration in the point cloud based on structure alone. The pyramid also has a reference point-the apex-which can be both mensurated and surveyed in the field to obtain not only its vertical but also its horizontal position (Wilkinson et al, 2019). Thus a true 3D assessment of the point cloud can be undertaken, which is not often practiced (Kim et al., 2019).
One method of robust assessment of the accuracy of a point cloud would be to compare the positions of mensurated geometrical features to their ground surveyed positions. Such an assessment would be limited in scope by the accuracy of the ground survey. Many studies of UAS lidar accuracy use RTK GNSS (Bakuła et al., 2017) or terrestrial laser scanning (TLS) of unknown provenance (Jozkow et al., 2016) as the basis of comparison; however, best practices dictate that the reference data be at least four times more accurate than the data product whose accuracy is being assessed.
The calibration of the hardware on the lidar mapping payload is a crucial component of the UAS lidar error budget (Habib), particularly the position and attitude of the lidar sensor with respect to the INS frame. The boresight and leverarm parameters of the components of the lidar mapping payload are nominally unchanging, but experience with Velodyne sensors has shown that the position and attitude of the scanner head can change over time.
The primary objective of this study is to present a method of analysing the absolute accuracy of a UAS lidar point cloud which accounts for the issues mentioned above, and to apply the method to the UAS lidar mapping system on hand (Section 2.2). The absolute accuracy is deduced from a comparison of the mensurated position of geometric targets within the UAS point cloud to the conventionally surveyed position of those same targets. The validity of this comparison is supported by rigorous analysis of the accuracies of the conventional survey and the ground control upon which it is based, and the UAS lidar system, i.e. the trajectories (or navigation solutions) and the static GNSS base stations used to solve for them, the boresight and leverarm components of the payload, and characteristics of the laser returns. The comparison takes place the ground survey of the geometric targets and a total of 18 UAS lidar datasets were compiled from three flights and six independent static GNSS base stations.

Study site
The study was conducted at the University of Florida/IFAS Plant Science Research and Education Unit near Citra, Florida, USA. The site was chosen for safety of operations, ease of access, open skies for quality of GNSS observations, and the large available area. (One of the study objectives was to ascertain the absolute accuracy of a typical UAS lidar survey; therefore, the flights were designed to have flight lines of reasonable length.) Figure 1. Overview of study site. Control points A-F (magenta) labelled, and target locations shown as white triangles. The area in the southeast corner was avoided so as not to disturb another experiment in progress. (Base imagery courtesy Google Earth)

Materials
The UAS lidar mapping payload used for this study was the Phoenix LiDAR Systems SCOUT-32, an integration of a Velodyne HDL-32E laser scanner and a NovaTel GNSS/INS navigation system. The HDL-32E is a rotating multibeam sensor with a 41.33° field of view which rotates 360° with respect to its vertical axis, with typical ±2 cm range accuracy and 0.1 -0.4° angular resolution. The NovaTel navigation system provides 1 cm horizontal and 2 cm vertical accuracy and 0.1 mrad (roll, pitch) and 0.33 mrad heading accuracy (RMS).
All static GNSS observations were made with one of six identical Topcon HiperLite+ GNSS receivers. The HiperLite+ is a dualfrequency, GPS+GLONASS receiver with multipath mitigation, and exhibits a static survey accuracy of 3 mm + 0.5 ppm (× baseline length) horizontal and 5 mm + 0.5 ppm vertical.
The horizontal ground control and target surveys were conducted with a Topcon GPT-3005 total station and retroreflector prism. The GPT-3005 touts a 3 mm + 2 ppm range accuracy and angle measurement capabilities of 1" precision and 5" accuracy. The vertical ground control and target surveys were conducted with the Leica Sprinter 200 digital level with aluminum barcode staff for electronic readings, which exhibits height measurement accuracy of 1.5 mm/km (1 ) and 0.6 mm per electronic reading.
The targets used for the study were corner-cube, trilateral pyramids approximately 1.1 m along each base and about 0.4 m above ground at the apex. The function of these targets is to allow for precise mensuration of the position of the reference pointthe apex-both from a dense, 3D point cloud and from a ground survey with total station and digital level.

Ground survey of control and targets
To survey the targets accurately and reliably for comparison against the UAS lidar survey, a network of ground control was established first. A total of six 60 cm × 1.25 cm iron rods with scored tops were each set about 3 cm below the surface of the natural ground around the perimeter of the study area ( Figure 1). A rigorous GPS 2 network survey was then conducted to obtain the initial positions of these six control points. Three sets of fivehour, co-temporal static observations over each monument were collected 2020 August 12, 13, and 14, for a total of 18 static sessions.
For each observation on each day, the same receiver was used over the same monument. The tripods were left in place, and the antennas and tribrachs were reset at the beginning of each session. This resulted in the same slant height (therefore the same antenna reference point height) for each observation over each monument, which was verified at the beginning and the end of each session.
The set of static sessions were adjusted using the Online Positioning User Service (OPUS), a tool developed by the National Geodetic Survey (NGS) to allow users to receive via email the geodetic coordinates of a static GPS observation. OPUS-Projects is a more sophisticated software that allows processing and adjustment multiple static sessions while allowing the user to interact with, visualize, and manage the adjustment process. Adjustment of the static observations is done with respect to Continually Operating Reference Stations (CORS) in the immediate area of the study site, providing both a network of baselines to known points and a means of applying tropospheric corrections. A total of nine CORS were suggested by OPUS-Projects as suitable and thus used for the adjustment.
The control points were also surveyed via total station and digital level on 2020 August 19 and 20. Together with the solutions provided from OPUS-Projects, two unified, weighted least squares (WLS) adjustments-one horizontal and one verticalwere conducted to determine the final coordinates to be held for the control points. The weights for each observation were derived from the OPUS-Projects solution (GPS) and the manufacturer specifications for the total station and digital level. The adjustments were conducted via custom Python code and Microsoft Excel spreadsheets.
On 2020 August 18, the day of the UAS lidar collections, a total of 28 geometric targets were placed in the scene and secured in place with steel spikes. The horizontal location of each target was measured via total station (angle and distance) from at least two setups, and a series of level runs from one control point to the next were run through the targets such that each target's elevation was measured at least twice. The final held coordinates of each target was solved by averaging these observations.

UAS lidar data collection and processing
2.4.1 Collection. On 2020 August 18, three UAS lidar data collections were conducted over the study area at nominal heights above ground level (AGL) of 40 m, 50 m, and 60 m. Each of the three flights were flown in a typical boustrophedonic pattern with two perpendicular flight lines, at 6 m/s forward speed and 50% swath overlap at 90° downward field of view.  Based on correspondence with NovaTel, the GNSS/INS equipment manufacturer, and Phoenix LiDAR Systems, the payload integration company, the typical UAS lidar collection flight is initiated by flying the payload in a particular calibration pattern. This pattern consists of (1) a five-minute period of static alignment after powering on the payload, and (2) a kinematic alignment that consist of (a) flying in a straight, horizontal line at cruising speed for at least six seconds, and (b) conducting a series of figure-eights until the reported attitude uncertainty covariance (which can be monitored in real time via wifi) reaches an empirically derived level of <0.009°, which has been found to be optimal for high-quality trajectory solutions. This process is repeated in reverse order at the end of data collection to facilitate a forward and reverse, tightly coupled, multi-pass trajectory solution. The three flights were conducted consecutively, each 1. Power on 2. 5 min static alignment 3. Take off 4. Kinematic alignment (alignment then figure-eights) 5. 40-m AGL mission 6. 50-m AGL mission 7. Landing and swapping of batteries, using redundant battery power to keep continuous power to the payload 8. 60-m AGL mission 9. Kinematic alignment in reverse order 10. Landing 11. 5 min static alignment 12. Power off 2.4.2 Processing. On the day of the flights, a total of six static GNSS receivers were operated to use for the subsequent trajectory processing. One was set over control point A, and the other five were set arbitrarily at distances of 0 km, 1 km, 5 km, 10 km, and 20 km from the study site. This allowed for the processing of six trajectory solutions in Waypoint Inertial Explorer v8.70 for each flight. Using these trajectory solutions, each of the three flights was then georeferenced in Phoenix LiDAR Systems' SpatialExplorer v4.0.3, keeping only those returns (1) within a 90° downward field of view. This process resulted in a total of eighteen UAS lidar point clouds.

Boresight calibration.
To evaluate the accuracy of the boresight calibration of the lidar payload, a custom algorithm utilizing an iterative closest point (ICP) variant (Rusinkiewicz and Levoy, 2001) was applied to one of the eighteen point clouds (under the assumption that the boresight parameters did not change during collection) to capture significant corrections to the boresight parameters. The process used to identify these corrections consists of three steps: 1. Interpolation of point time data. A number of the geometric targets in the scene were isolated and, through interpolation, the isolated points' GPS times were correlated with the trajectory solution. 2. Separation of data by swath. 3. Use of ICP on conjugate features. Finally, a least squares adjustment based on the ICP method minimized the point-to-point distance of each isolated target's like points. These closest points were used to solve for the corrections to the initial boresight rotation matrix in the INS frame, from with the three boresight angles are derived. The RMSE of conjugate points and the standard deviation of unit weights per boresight parameter were examined to assess their significance for each correction identified.

Geometric target mensuration
2.5.1 Template fitting. Initially, a template matching the physical construction of the geometric targets are fitted to the targets present in the point cloud. An iterative least-squares solution is used to fit the template to the target in the point cloud.
The full algorithm is presented in (Wilkinson et al., 2019); a brief outline is presented here.
1. The section of the point cloud containing the target is manually identified and isolated. 2. The template is initialized on the isolated points. The reference point of the template, its apex, is initialized as the highest point present in the isolated set. The template is oriented as level, with one of its three facet edges aligned in the east-west direction. 3. The iterative least squares solution, which minimizes point-to-plane distances, is used to solve for the corrections to the template's initial position and attitude described in step 2. 4. Upon convergence, the final position of the apex is held as the position of the mensurated target, and each point in the solution is assigned to one of the three facets of the template.
Prior experiments indicate that typical UAS lidar flight parameters with the HDL-32E and similar sensors provide enough returns from the geometric targets to mensurate the reference point (the apex) with an estimated uncertainty of ≤1 cm (1 )

Fitting and intersection of planes.
To further verify the results of the template fitting, a plane was fit to each facet's point set, and the resulting intersection of the three facet's planes was held as a second solution for the reference apex. The method used to find the best-fit planes for each fact point set, and the resulting intersection of the three planes, is not explicitly stated in the reference material used for this study, so the methods are explained below.
The nonlinear least squares adjustment used to solve for the bestfit plane to a facet point set is expressed as (1) whose terms and solution steps, as described in (Ghilani and Wolf, 2017), are defined below.
To fit a plane to a set of points, one may choose to minimize the sum of squares (least squares) of distances from each point to the solved plane. The adjustment is built from the observation question which is the equation of a plane in 3-space. The adjustment also introduces the constraint equation 2 + 2 + 2 = 1 (3) which constrains the solution for the plane's normal vector = 〈 , , 〉 to unit length.
While the observation equation is linear, the constraint equation is second-order, therefore necessitating the use of a nonlinear least squares adjustment. A nonlinear least squares adjustment solves for a nonlinear system of equations by using a first-order Taylor series approximation where some observed value is related to a function of the unknown parameters in the model. In this case, the unknown parameters are the plane coefficients , , , and the observed value = 0 (and = 1 for the constraint equation).
where 0 , 0 , 0 , 0 are the initial approximations of the unknowns. (For the constraint equation, because the unknown is not present, / = 0.) After selecting initial approximations, the remaining unknowns are the correction terms , , , .
The above is then expressed in matrix form. First, the Jacobian matrix A is formed from the partial derivatives of the set of observation equations with respect to the unknown plane parameters: where the i th observed point = ( , , ).
The L 1 vector contains the constant values of the observation equations minus their values as computed using the initial approximations, which is formed from rearranging EQUATION ##: Finally, the correction vector which houses the correction terms is expressed as The Jacobian matrix C is formed from the partial derivatives of the constraint equation: Because there is only one constraint equation, L 2 is of size 1 × 1: The value Δ 2 is an unused variable that is added only to preserve the dimension property of matrix multiplication; its resulting value in the solution is arbitrary and useless.
The corrections to the approximations are then found by The corrections in Eq. 10 are applied to the initial approximations, the updated approximations are applied to Eq. 8, and the process is iterated until convergence.
Once converged, once each point used in the adjustment has an associated residual, the z-score of each residual with respect to the standard deviation of all residuals is found. Those points with a z-score >3 are removed, and the adjustment is repeated without the offending points. This process is repeated until no outliers are found.
It bears noting that there are faster and more robust methods of plane fitting (e.g. singular value decomposition or RANSAC), but the above method was chosen to facilitate classical error propagation.

Ground control and target survey
Selected results of the WLS adjustments for both the horizontal and vertical networks are provided in Both adjustments were further verified with a 2 goodness-of-fit test (Ghilani and Wolf, 2007). The horizontal adjustment passed the goodness-of-fit test, but the vertical adjustment failed in the upper bounds when holding the weights of observations provided by OPUS-Projects. Failure of the goodness-of-fit test may indicate a failure of the stochastic model; in other words, the residuals of some observations are higher than their a priori standard deviations. Indeed, the residuals of the OPUS-Projects GPS solution observations were higher than expected. While the digital level loop closed at 2 mm, the residuals of the adjustment-when holding the reported standard deviations of 2 mm for the elevations of the control points-are 7, 8, and 14 mm for points D, E, and F, respectively. Increasing the a priori standard deviations of the OPUS-Projects observations to 10 mm brings the goodness-of-fit test within the acceptance region. This indicates a larger source of error in the OPUS-Projects solution, which is likely expected error from GPS observation, although the possibility of instrument error in the setups of the GPS receivers should not be dismissed. Regardless, considering these results, the final elevations for the survey control held the OPUS-Projects elevation of control point A and the level loop differences; the vertical WLS adjustment was discarded (see Discussion).
The ground survey of the targets was adjusted to the held coordinates above. As mentioned previously, each target was measured at least twice each with the total station (angle and distance) and with the digital level, and their final coordinates were the average of the observations. No blunders were detected in either the total station or digital level observations collected.
The held position of each static base receiver was solved using OPUS. The overall RMS for all six static solutions ranged from 0.020 -0.023 m.

Boresight calibration
The results of the boresight calibration procedure yielded insignificant corrections to the initial boresight parameters. While this was a valuable verification of the quality of the dataset, no corrections to the initial boresight parameters were applied to the point clouds in this study.

Target comparison
The template fitting followed by the subsequent step of fitting and intersecting three planes (see Section 2.5) yielded favorable results for all targets in the scene except for one: pyramid 24. The cause of the consistent failure of template fitting to this target is not known; regardless, this target has been excluded from the following results. A single pyramid from the 60 m AGL, 20-km baseline flight, pyramid 17, was removed as well (see Table 3).
Overall, the RMSEs of the plane fit residuals for all targets range from 1 -2 cm over an average of about 100 points per facet. While this can possibly be interpreted as a proxy for the individual point accuracy, the following results do not reflect individual point accuracy.
The horizontal error of the mensurated targets to their ground surveyed positions is presented in  Table 4. Summary statistics of the horizontal error between mensurated and ground surveyed targets. Each target mensurated from each point cloud is included in the summary.
The vertical error between the mensurated and ground surveyed targets is presented in Tables 5 and 6 and Figure 5. Much of the variability in the mean height error is likely a result of using the ad hoc positions of the various static base receivers. There is also an apparent positive height bias in the 40 m AGL set of point clouds (see Discussion). The 40 m AGL flight aside, the flights whose trajectories were processed holding the rigorously solved coordinates of control point A yielded the best results.

Elevation from static GPS observations
As mentioned in the Results section, the WLS adjustment of the OPUS-Projects elevation solutions and digital level loop was discarded in favor of holding only the digital level loop and a single elevation from the OPUS-Projects solution (point A). This decision was based primarily on the statistics presented in the Results section. To further confirm this decision, the comparison of the mensurated targets to the ground survey of targets was conducted twice, once holding the WLS adjusted elevations and again with the held level loop coordinates. When holding only the level loop, the mean differences between the mensurated targets and the ground-surveyed targets decreased favorably (i.e., toward zero) by 4 mm. While alone this is not definitive evidence that the decision to discard the WLS adjustment was correct, when taken together with the statistics of the WLS adjustment and the physical nature of GPS observations, it does further support the decision.
On the topic of greater variability of the vertical component of GPS observations: the day-to-day variability in static solutions from OPUS is evident in the sets of static solutions processed through the service. Each control point was observed three times under favorable conditions (four times in the case of point A, counting the day of the flights), and the variation among each control point's set of solutions ranges on the order of 5 cm. The roughly 4-cm bias present between the target comparison results holding the point A baseline versus the 0-km arbitrary baseline (the receiver for which was mere meters away from the receiver over point A) encapsulates this variability (Table 6). This highlights the need to use a stable monument whenever possible when surveying a site with UAS lidar, and to use the same monument when revisiting the site when surveying for changes in vegetation, topography, or other features of interest.

Source of bias in laser ranging
Inspection The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-M-3-2021 ASPRS 2021 Annual Conference, 29 March-2 April 2021, virtual AGL flights. One possible explanation for this is range bias in the scanner as a result of the scanner not being properly warmed up. A previous study reported an approximate warm-up time of over 30 minutes was necessary for most of the HDL-32E's 32 lasers to stabilize in their range measurements (Chan et al., 2013). The ranges were biased toward the scanner, which from the UAS pose would result in an apparent bias of higher elevation, which is what presents in this study.

Interference in detecting targets
As stated in Section 3.4, the results used for the mensurated targets came from applying the template fitting algorithm with an added step of fitting three individual planes to the mensurated target and calculating the planes' intersection to hold as the reference point. Under perfect conditions, this extra step would yield identical results to the template fitting reference points, but this was not the case. The template fitting yielded results that exhibited a consistent positive height bias when compared to the results from the extra plane-fitting step. After visual inspection of both sets of results, the authors decided to hold the plane-fitted targets as the mensurated targets. One possible explanation for this bias is noisy or false returns from the base of the pyramid targets. The grass in the study area had grown to roughly 40 cm in height by the day of the flight, and despite efforts to tamp down the grass around the targets, visual inspection of the point clouds indicates this may not have been fully effective. Most of the returns from a pyramid will come from the base; if the base is partially or wholly occluded, this could lead to inaccurate template fitting. In this scenario, the extra plane fitting step appears to have helped counteract this speculated effect.

Future work
The authors plan to repeat this study in the near future with improvements in the study design made apparent from the study presented here. First, a study site devoid of tall grass or other obstructions to the targets will be used to avoid the suspected interference in the detection of the geometric targets. Also, to gain a more thorough understanding of the effect of static baseline length on the relative and absolute accuracies of a UAS lidar point cloud, a static GPS network survey will be conducted for the baseline observation points, such that their solved positions can be used in lieu of the arbitrary positions held from single, static OPUS solutions. Finally, a temporal analysis based on the one presented in (Chan et al., 2013) will be repeated using the laser scanner used in this study to verify the warm-up time and characterize the temporal range bias. Based on those results, the future set of UAS flights will be conducted with a proper warm-up period for the scanner.