ON-SITE GEOMETRIC CALIBRATION OF THERMAL AND OPTICAL SENSORS FOR UAS PHOTOGRAMMETRY

UAS imagery has become a widely used source of information in geomorphic research. When photogrammetric methods are applied to quantify geomorphic change, camera calibration is essential to ensure the accuracy of the image measurements. Insufficient self-calibration based on survey data alone can induce systematic errors that cause DEM deformations. The typically low geometric stability of consumer-grade sensors necessitates in-situ calibration, as the reliability of a lab-based calibration can be affected by transport. In this research, a robust on-site workflow is proposed that allows the time-efficient and repeatable calibration of thermal and optical sensors at the same time. A stone building was utilised as the calibration object, with TLS scans as reference. The approach was applied to calculate eight separate camera calibrations using two sensors (DJI Phantom 4 Pro and Workswell WIRIS pro), two software solutions (Vision Measurement System (VMS) and Agisoft Metashape) and two different subsets of images per sensor. The results demonstrate that the approach is suitable for determining camera parameters to pre-calibrate photogrammetric surveys.


INTRODUCTION
With the availability of increasingly powerful SfM photogrammetry software and off-the-shelf Unmanned Aircraft Systems (UAS), photogrammetric applications in geomorphological research have increased. SfM photogrammetry is applied in several areas of geomorphic research, including the study of mass movements, coastal erosion and fluvial environments. Because surveys are cost-efficient, SfM photogrammetry from UAS allows repeat surveys to monitor earth surface processes. Modern SfM photogrammetry workflows detect and match keypoints and subsequently determine lens distortion parameters, camera position and orientation in a self-calibrating bundle adjustment. The algorithms estimate intrinsic and extrinsic camera parameters as an optimisation problem with a large number of variables. However, the simultaneous solution of many parameters can cause overparameterisation (James et al., 2017).
Moreover, image sets with near-parallel nadir viewing directions can result in inadequate determination of lens distortion during bundle adjustment, leading to dome-shaped DEM deformations (James et al., 2020, Sanz-Ablanedo et al., 2020). Vertical imaging survey designs were common practice in classical aerial photogrammetry from manned aircraft and required pre-calibrated metric cameras with high geometric stability. However, systematic errors from weak geometric configurations can be avoided by careful design of flight plans with sufficient overlap, various viewing angles and flying heights (James, Robson, 2014).
Nevertheless, certain applications require data acquisition strategies that counteract a robust bundle adjustment (Cramer et al., 2017). For example, in through-water photography for bathymetric SfM, imagery must be captured with the optical axes in a near-vertical direction. This minimizes the offset of the apparent positions of submerged areas caused by the refraction of light at the water surface. Besides affecting the generated DEMs of the submerged areas, the misplacement of tie points by refraction affects the self-calibrating bundle adjustment whenever water is present in the scene. In such surveys, an alternative to self-calibrating bundle adjustment is to determine an accurate camera model in a separate pre-calibration.

Problem statement
The overarching research to which this paper relates explores the application of bathymetric SfM photogrammetry to assess geomorphic changes induced by river restoration on the River Gairn, Scotland. Dry and submerged topography were monitored over several time steps before and after the placement of artificial logjams into the stream. This paper describes the development of a geometric pre-calibration approach for the UAS-borne optical and thermal sensors used in this work.
A common approach for geometric sensor pre-calibration uses 2D or 3D calibration arrays with known geometry (e.g. surveyed targets) in laboratory environments (Cramer et al., 2017). Camera calibration parameters are determined by self-calibrating bundle adjustment from a survey designed to create a robust network. However, off-the-shelf sensors often have low geometric stability, so their calibration can change during transport, e.g. over rough tracks, or through deformation with changing temperature (Elias et al., 2020, Sanz-Ablanedo et al., 2020). Cramer et al. (2017) investigated consumer-grade UAS sensors in lab experiments and found large variations in geometric stability. However, they found especially low deviations for the DJI Phantom 3, suggesting geometric stability "close to the concept of a metric camera". Similar results have been described for the DJI Phantom 4 Pro (e.g. Fraser, 2018). Nevertheless, ideally the sensor system is calibrated on-site immediately prior to the survey, and the calibration is repeated afterwards to check consistency (Cramer et al., 2017). Since field work is often planned on a tight schedule and is subject to real-world constraints such as weather conditions, calibration must be carried out quickly and effectively. Consequently, a permanent, reusable on-site structure would be preferable for field surveys. However, the reference features of any such object must provide sufficient contrast in the observed spectra and must be large enough to be recognizable by all of the sensors being utilized (in this work, optical and thermal cameras are utilized simultaneously). Moreover, considering the critical influence of focal length on the calibration, Griffiths and Burningham (2019) suggest that pre-calibration should be conducted at the scale of data acquisition. Hence, ideally the distance between sensor and calibration object should be equivalent to the targeted flying height.
In summary, an on-site calibration object should be suitable for optical and thermal sensors and stable over time to allow time-efficient repetitions. The survey design must yield a robust photogrammetric network at a sensor-to-object distance similar to that of the survey to be conducted. A new workflow has been developed to meet these criteria for camera pre-calibration (Figure 2).

METHODOLOGY AND MATERIALS
Permanent structures that can potentially be used as a calibration array, e.g. bridges or walls, can be found in many locations. Cramer et al. (2017) used an abandoned industrial wasteland featuring several towers, and Fraser (2018) used detached buildings in several experiments. In this research, a freestanding granite building in the study area was an obvious choice, as granite shows minimal thermal expansion and is thus geometrically stable (Richter, Simmons, 1974). Structural features are recognizable at a variety of image resolutions and scales (Figure 1 b). Furthermore, the emitted radiation of the stone allows feature recognition in thermal images, and doors and windows are large enough to be recognisable despite the low resolution of the thermal images (Figure 1 a). The higher resolution of the RGB images allows recognition of smaller features such as joints, individual stones or window frames.

Study site and data acquisition
We acquired all datasets on 13th September 2019 from a reach of the River Gairn in the Cairngorms National Park, Scotland. We used two UAS to collect thermal and RGB imagery. Thermal imagery was captured from a DJI M600 system equipped with a Workswell WIRIS pro sensor. The WIRIS pro has a resolution of 640 x 512 pixels, a pixel size of 17 microns and a nominal focal length of 13 mm. Visible optical imagery was acquired separately using a DJI Phantom 4 Pro with its built-in RGB sensor of 5472 x 3648 pixels, a pixel size of 2.41 microns and a nominal focal length of 8.8 mm. The RGB sensor is fitted with a polarizing filter, which reduces reflections at the water surface in bathymetric photogrammetry surveys.
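To put the two sensors side by side, the stated pixel sizes and nominal focal lengths can be converted into a ground sampling distance (GSD) with the pinhole relation GSD = pixel size × distance / focal length, and into a nominal principal distance in pixels. The following is a minimal sketch: the 30 m distance is an arbitrary illustrative value, not the flying height of this study, and nominal rather than calibrated focal lengths are used.

```python
# Sensor constants as stated in the text (pixel pitch and nominal focal length in metres).
SENSORS = {
    "WIRIS pro (thermal)": {"pixel_m": 17e-6, "focal_m": 13e-3},
    "Phantom 4 Pro (RGB)": {"pixel_m": 2.41e-6, "focal_m": 8.8e-3},
}


def ground_sampling_distance(pixel_m, focal_m, distance_m):
    """Pinhole approximation: object-space size of one pixel at a given distance."""
    return pixel_m * distance_m / focal_m


def principal_distance_px(pixel_m, focal_m):
    """Nominal focal length expressed in pixel units."""
    return focal_m / pixel_m


if __name__ == "__main__":
    distance = 30.0  # hypothetical sensor-to-object distance in metres
    for name, s in SENSORS.items():
        gsd = ground_sampling_distance(s["pixel_m"], s["focal_m"], distance)
        c_px = principal_distance_px(s["pixel_m"], s["focal_m"])
        print(f"{name}: GSD at {distance:.0f} m = {gsd * 1000:.1f} mm, c = {c_px:.0f} px")
```

At any common distance the thermal GSD is roughly 4.8 times coarser than the RGB GSD, which is why only large features such as doors and windows are usable in the thermal images.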
We captured imagery in circular flight patterns around the building to incorporate various angles, heights and distances (Figure 3 a). Circular flight geometries create the most convergent images and thus allow strong photogrammetric networks that yield the most accurate calibration parameters (Sanz-Ablanedo et al., 2020). The sensors were set to trigger at 3-second intervals, and a total of 158 thermal and 101 RGB images were captured.
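A circular pattern like the one described above can be planned as orbits of waypoints whose camera heading always points at the building center. The sketch below is a hypothetical planner, not the flight software used in this study: local east/north coordinates, heading measured clockwise from north, and the example radii, heights and waypoint counts are all assumptions. The helper `images_per_orbit` combines the 3 s trigger interval from the text with an assumed flight speed.

```python
import math


def orbit_waypoints(center_xy, radius_m, height_m, target_height_m, n_points):
    """Return (x, y, z, heading_deg, pitch_deg) tuples for one circular orbit.

    Heading is clockwise from north (+y) and points the camera at the orbit
    center; pitch is negative when the camera looks down at the target height.
    """
    cx, cy = center_xy
    waypoints = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        x = cx + radius_m * math.sin(theta)
        y = cy + radius_m * math.cos(theta)
        # Vector from the aircraft back towards the building center.
        dx, dy = cx - x, cy - y
        heading = math.degrees(math.atan2(dx, dy)) % 360.0
        pitch = -math.degrees(math.atan2(height_m - target_height_m, radius_m))
        waypoints.append((x, y, height_m, heading, pitch))
    return waypoints


def images_per_orbit(radius_m, speed_mps, trigger_interval_s=3.0):
    """Approximate number of images along one orbit for a fixed trigger interval."""
    return 2 * math.pi * radius_m / (speed_mps * trigger_interval_s)


# Hypothetical plan: three orbits at different radii and heights around a
# building centred at the local origin (illustrative values only).
plan = []
for radius, height in [(25.0, 15.0), (30.0, 25.0), (35.0, 35.0)]:
    plan.extend(orbit_waypoints((0.0, 0.0), radius, height, 5.0, 12))
```

Varying both radius and height between orbits is what produces the mix of viewing angles and distances that strengthens the network.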
To generate a reference dataset, we scanned the building with a Leica ScanStation P40 Terrestrial Laser Scanner (TLS) from three perspectives. The scans were captured from a minimum distance of 20 m with a resolution of 3.1 mm at a distance of 10 m. We did not georeference the scans, as only a local coordinate system is required in the workflow. We registered, merged and exported the collected point clouds using Leica Cyclone 9.2.1. Subsequently, we used CloudCompare (GPL Software, 2019) to extract visible features from the TLS point cloud in order to create the calibration array. Firstly, we exported approx- […] In the next step, this preliminary label-x-y-z point cloud was split into single points, which were placed accurately using the Translate Tool and merged again in CloudCompare. The split and merge operations were performed using R scripts (R Core Team, 2019). Finally, the resulting coordinate reference network was divided into control points and check points.
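The split and merge steps were done in R; an equivalent Python sketch is shown here only for illustration. It assumes a plain comma-separated `label,x,y,z` text format without a header, and it assigns check points at random with a fixed seed. The text does not state how points were assigned to the control and check sets, so that rule is an assumption. The manual repositioning with the Translate Tool happens between the split and merge steps and is not reproduced.

```python
import csv
import io
import random


def read_label_xyz(text):
    """Parse 'label,x,y,z' rows into {label: (x, y, z)}; blank rows are skipped."""
    points = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        label = row[0].strip()
        x, y, z = (float(v) for v in row[1:4])
        points[label] = (x, y, z)
    return points


def write_label_xyz(points):
    """Merge single points back into one 'label,x,y,z' text block (sorted by label)."""
    return "".join(
        f"{label},{x:.4f},{y:.4f},{z:.4f}\n" for label, (x, y, z) in sorted(points.items())
    )


def split_control_check(points, n_check, seed=0):
    """Randomly pick n_check labels as check points; the rest become control points."""
    labels = sorted(points)
    check = set(random.Random(seed).sample(labels, n_check))
    control = [label for label in labels if label not in check]
    return control, sorted(check)
```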

Generating image observations
For sensor calibration we adopted two different software solutions. The first is VMS (Vision Measurement System) (Geomsoft, 2008), a close-range photogrammetry package that has been applied for geometric calibration (e.g. James et al., 2020, Shortis, Luhmann, 2018). In this study, the calibration in VMS serves as a benchmark against which the widely used SfM photogrammetry software Agisoft Metashape Professional (Agisoft LLC, 2020) is compared.
We aligned the full datasets of both sensors in Metashape to create subsets efficiently (Figure 3 a). We imported the reference target coordinates and placed a few markers in order to locate the preliminarily aligned photos in the reference dataset. Specialized calibration software such as VMS is designed to run on a small number of convergent images with a large number of target observations. Therefore, we created two subsets from each sensor dataset: a smaller one of 14 images, hereafter referred to as SUBSET, and a larger one of 20 images, hereafter referred to as ADD6. The preliminary SfM-based alignment facilitates the selection of sufficiently convergent images. Subsequently, we carefully placed markers as observations on the features defined as reference points in the TLS point cloud. Metashape's suggested marker placement speeds up the digitizing of a large number of image coordinates. ADD6 comprises all images and target observations from SUBSET. Removing images that would not align, or adding images to improve network stability in certain areas during calibration, resulted in different numbers of images. An overview of the datasets can be found in Table 1.
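The preliminary alignment provides camera orientations, which makes it possible to quantify convergence as the angle between optical axes. The sketch below is hypothetical: the orientation convention (heading clockwise from north, pitch negative downwards) and the greedy max-min rule for picking images are assumptions for illustration, since the text does not describe a selection algorithm.

```python
import math


def view_direction(heading_deg, pitch_deg):
    """Unit optical-axis vector (east, north, up); heading clockwise from north,
    pitch negative when looking down (assumed convention)."""
    h, p = math.radians(heading_deg), math.radians(pitch_deg)
    return (math.cos(p) * math.sin(h), math.cos(p) * math.cos(h), math.sin(p))


def angle_deg(u, v):
    """Angle between two unit vectors in degrees (dot product clamped for safety)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_convergent(directions, k):
    """Greedy max-min selection: start with the first image (sorted by name) and
    repeatedly add the image whose optical axis is furthest, in angle, from its
    nearest already-selected image."""
    names = sorted(directions)
    chosen = names[:1]
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        best = max(
            remaining,
            key=lambda n: min(angle_deg(directions[n], directions[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

A greedy rule like this favours widely spread viewing directions, but it ignores how many targets each image sees, so in practice it would only be a starting point for a manual choice.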
Finally, we exported all target observations using a script in Metashape's Python console. Subsequently, we generated the standardized initial VMS project files (project, camera calibration, image observations, target reference and photo orientation) via an R script (R Core Team, 2019).
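Before writing calibration project files, image observations (pixel coordinates per image and marker) usually need converting into image coordinates in millimetres relative to the format center. The sketch below assumes pixel coordinates with the origin at the top-left corner and rows increasing downwards, and it ignores half-pixel conventions, which differ between packages. The `image target x y` output layout is hypothetical and is not the actual VMS file format. In Metashape's Python API the marker projections are accessible per marker (see the `Marker` class in the API reference) and could be collected into the tuples used here.

```python
import io


def pixel_to_image_mm(col, row, width_px, height_px, pixel_mm):
    """Pixel coordinates (origin top-left, rows down) to image coordinates in mm
    with the origin at the format center, x to the right and y up (assumed)."""
    x = (col - width_px / 2.0) * pixel_mm
    y = (height_px / 2.0 - row) * pixel_mm
    return x, y


def write_observations(observations, width_px, height_px, pixel_mm, stream):
    """Write one 'image target x_mm y_mm' line per observation (hypothetical layout)."""
    for image, target, col, row in sorted(observations):
        x, y = pixel_to_image_mm(col, row, width_px, height_px, pixel_mm)
        stream.write(f"{image} {target} {x:.4f} {y:.4f}\n")


# Usage with hypothetical thermal observations (640 x 512 px, 0.017 mm pixels).
buffer = io.StringIO()
write_observations([("IMG_2", "T1", 640, 0), ("IMG_1", "T1", 320, 256)], 640, 512, 0.017, buffer)
```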

Determining calibration parameters
We used VMS to determine the camera calibration parameters. To speed up the calibration process in VMS, we adopted parameters from Metashape as initial values, which facilitated the initial orientation of the images. We calculated the first network with all calibration parameters fixed. Subsequently, we released further parameters over several iterations as suggested by Shortis and Luhmann (2018): starting with the radial distortion parameters (K1, K2 and K3), then adding the principal point coordinates (PPx and PPy), the tangential distortion parameters (P1 and P2) and finally the affinity and orthogonality terms (a1 and a2). With each additional calibration parameter enabled, the initial values were successively overwritten. We kept the observations and targets fixed by resetting them from the initial files in every iteration to avoid drifting of the network.
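The staged release can be expressed as a cumulative schedule. In the minimal sketch below, `adjust` is a hypothetical callback standing in for one VMS or Metashape adjustment run; it is not a VMS API. The principal distance C is omitted from the schedule because the text does not say at which stage it was freed.

```python
import copy

# Release order as described in the text (after Shortis and Luhmann, 2018).
# C is deliberately absent: add it to whichever stage matches your processing.
STAGES = [
    ("all fixed", ()),
    ("radial", ("K1", "K2", "K3")),
    ("principal point", ("PPx", "PPy")),
    ("tangential", ("P1", "P2")),
    ("affinity/orthogonality", ("a1", "a2")),
]


def release_schedule(stages=STAGES):
    """Yield (stage name, cumulative tuple of free parameters)."""
    free = ()
    for name, params in stages:
        free = free + tuple(params)
        yield name, free


def run_staged_calibration(initial_params, initial_observations, adjust, stages=STAGES):
    """Run one adjustment per stage; `adjust(params, observations, free)` is a
    hypothetical callback that returns updated parameter values."""
    params = dict(initial_params)
    for _, free in release_schedule(stages):
        # Reset observations and targets from the initial state in every
        # iteration so that the network cannot drift between stages.
        observations = copy.deepcopy(initial_observations)
        params = adjust(params, observations, free)  # new values overwrite the old ones
    return params
```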
VMS does not provide an option to determine the radial distortion parameter K4. To test whether K4 could be omitted, it was calculated in Metashape (for RGB ADD6); the correlation between K3 and K4 was 0.99, so we decided that K4 could safely be neglected for the sake of comparison with VMS. The omission of higher-order radial distortion parameters is common practice, especially for the smaller sensors of non-metric cameras (e.g. Eltner, Schneider, 2015, Remondino et al., 2012).
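The reported correlation of 0.99 comes from Metashape's adjustment and depends on image geometry and the distribution of observations. A simpler, purely illustrative way to see why K3 and K4 are hard to separate is to correlate the polynomial terms r^7 and r^9 themselves over a normalised radius. The sketch below does this under the assumption that observations are spread uniformly in radius (which Figure 4 shows is not the case), and adds a helper that turns an adjustment covariance matrix into parameter correlations.

```python
import numpy as np


def radial_distortion(r, k1, k2, k3, k4=0.0):
    """Odd-order radial distortion dr = K1 r^3 + K2 r^5 + K3 r^7 + K4 r^9."""
    r = np.asarray(r, dtype=float)
    return k1 * r**3 + k2 * r**5 + k3 * r**7 + k4 * r**9


def basis_correlation(p, q, r_max=1.0, n=10001):
    """Pearson correlation of the terms r**p and r**q sampled uniformly on [0, r_max].
    The result does not depend on r_max, since scaling r only rescales each term."""
    r = np.linspace(0.0, r_max, n)
    return float(np.corrcoef(r**p, r**q)[0, 1])


def covariance_to_correlation(cov):
    """Parameter correlation matrix from an adjustment covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


print(round(basis_correlation(7, 9), 3))  # terms multiplied by K3 and K4
print(round(basis_correlation(3, 5), 3))  # terms multiplied by K1 and K2
```

The higher the order, the more similar neighbouring terms become over the sensor radius, which is consistent with dropping K4 rather than K1 or K2.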
Subsequently, we applied a similar workflow in Metashape: starting from an aligned chunk containing the whole dataset, we created a copy and removed unnecessary images to create the subsets. The small number of tie points between adjacent thermal images led to misalignment rather than to an improvement of the network. The weak tie points might be caused by the low resolution of the thermal images combined with angle-dependent radiation, as Metashape's image matching algorithms were not designed for processing thermal images. For this reason, and to ensure comparability with VMS, we removed all tie points and calibrated solely based on the markers. We determined the calibration parameters in the same order as in VMS, iterating through alignment, camera calibration and updating of the reference errors. We exported the calibration parameters in the 'australis' format.
To visualize lens distortion, we calculated the profiles according to Fraser et al. (1995), with the radial distortion

$$\Delta r = K_1 r^3 + K_2 r^5 + K_3 r^7$$

and the tangential distortion along the direction of maximum distortion

$$P(r) = \sqrt{P_1^2 + P_2^2}\, r^2$$

where r is the radial distance from the principal point (Figures 5 and 6). To evaluate the distribution of observations over the sensor area in the subsets, we created scatterplots with marginal histograms (Figure 4).
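The two profiles can be evaluated from the exported parameters out to the image corner. This is a minimal sketch, assuming the parameters follow the Australis convention with r in millimetres (so the profiles are in millimetres and can be divided by the pixel size to obtain pixels). Any coefficients passed in are placeholders, not the values in Table 2.

```python
import numpy as np


def half_diagonal_mm(width_px, height_px, pixel_mm):
    """Distance from the format center to a corner, i.e. the largest radius on the sensor."""
    return 0.5 * float(np.hypot(width_px * pixel_mm, height_px * pixel_mm))


def distortion_profiles(k1, k2, k3, p1, p2, r_max, n=100):
    """Radial profile K1 r^3 + K2 r^5 + K3 r^7 and tangential profile
    sqrt(P1^2 + P2^2) r^2, sampled from r = 0 to r_max."""
    r = np.linspace(0.0, r_max, n)
    radial = k1 * r**3 + k2 * r**5 + k3 * r**7
    tangential = np.hypot(p1, p2) * r**2
    return r, radial, tangential


# Largest radius for the WIRIS pro format (640 x 512 px, 0.017 mm pixels).
r_max_thermal = half_diagonal_mm(640, 512, 0.017)
```

Evaluating up to the half-diagonal shows the full range over which the model is applied, even where (as Figure 4 indicates) few observations constrain it.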

RESULTS AND ANALYSIS
The three point clouds obtained from the TLS survey were registered with an RMS of 0.011 m and comprise a total of 11,163,796 points (Figure 3 b). An overview of the included images and observations, as well as the resulting metrics, can be found in Table 1. The calculated calibration parameters for the datasets can be found in Table 2. The distortion profiles are visualized in Figure 5 for RGB and Figure 6 for thermal. We removed images that did not align and observations with excessively large residuals during the processing in VMS; therefore, the number of images varies between the datasets. The distribution of observations over the sensor area is visualized in Figure 4. It shows a higher frequency of observations in the central sensor area than at the edges and corners.

[Figure 4: Distribution of image observations over the sensor area (scatterplots with marginal histograms).]

The radial distortion profiles (RGB: Figure 5 a; thermal: Figure 6 a) show similar patterns for both sensors. The greatest similarity is between data pairs processed with the same software, with VMS slightly higher than Metashape. The tangential distortion profiles (RGB: Figure 5 b; thermal: Figure 6 b) show smaller magnitudes and less similarity between the sensors. For RGB the tangential distortion values remain smaller than 1.3, whereas the thermal tangential distortion profile is very inconsistent between all datasets.
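The center-heavy distribution described above can be summarised by binning each observation's distance from the image center, normalised so that 1 corresponds to a corner. This illustrative sketch uses an arbitrary number of bands and assumes pixel coordinates with the origin at the top-left corner.

```python
import numpy as np


def radial_band_counts(cols, rows, width, height, n_bands=4):
    """Count observations per band of normalised radius (0 = center, 1 = corner)."""
    cols = np.asarray(cols, dtype=float)
    rows = np.asarray(rows, dtype=float)
    dx = cols - width / 2.0
    dy = rows - height / 2.0
    rho = np.hypot(dx, dy) / np.hypot(width / 2.0, height / 2.0)
    # np.histogram includes the right edge in the last bin, so corners are counted.
    counts, _ = np.histogram(rho, bins=n_bands, range=(0.0, 1.0))
    return counts
```

Comparing the counts in the outer bands between SUBSET and ADD6 gives a quick numerical check of whether added images actually extend coverage towards the edges.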
For both sensors, the values calculated for the principal distance (C) and the principal point coordinates show only small variance between the datasets (Table 2). Correspondingly, by far the highest precisions were obtained for the principal distance: 3 (thermal) and 4 (RGB) orders of magnitude above those of the other parameters. For the principal point, the X displacement was determined with higher precision than the Y displacement. The radial parameters displayed similar values for RGB and were more heterogeneous for thermal. This is also reflected in the precisions, which for RGB are between a factor of 2 and 5 above the parameter values, and lower for thermal. Overall, the precision was low with the exception of C. RGB showed higher precisions than thermal, and VMS achieved higher precision than Metashape.

DISCUSSION
Using TLS instead of total station measurements as a reference allowed us to revisit the dataset during processing: the measurements were carried out on the point cloud rather than during the often limited time on-site. We registered the three scans with an overall RMS of 0.011 m, which is sufficient as a proof of concept. For future applications, however, we suggest using a larger number of scans to reduce the RMS by introducing more redundancy through overlap. A robust point cloud registration is crucial for creating a reference dataset for camera calibration. The wavelength used by conventional laser scanners yields high contrast in the measured intensities due to the radiative properties of the stone building (Figure 3). Combined with a high point density, this allowed precise placement of reference points.
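The registration RMS is the root mean square of residual distances between corresponding points after a rigid transformation. Cyclone's internal registration algorithm and its exact RMS definition are not described here; the sketch below uses the standard SVD-based least-squares rigid fit of corresponding points (the Kabsch method) purely to make the quantity concrete.

```python
import numpy as np


def rigid_fit(src, dst):
    """Least-squares rotation r and translation t with dst ≈ src @ r.T + t (Kabsch)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    h = (src - cs).T @ (dst - cd)  # 3x3 cross-covariance of centred coordinates
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = cd - r @ cs
    return r, t


def registration_rms(src, dst, r, t):
    """RMS of 3D residual distances between transformed src points and dst points."""
    residuals = (np.asarray(src, dtype=float) @ r.T + t) - np.asarray(dst, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))
```

With more scan positions, each point pair contributes to more overlapping constraints, which is the redundancy argument made above.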
To find out whether our proposed workflow is suitable for simultaneously calibrating thermal and optical sensors, we investigated whether the reference points could be located with sufficient accuracy in the imagery. To evaluate the internal consistency of the network solutions, we first looked at the RMS. Under ideal conditions, i.e. with automatic detection of coded targets, a calibration can yield an RMS of 1/10 of a pixel, whereas 1/2 to 2/3 of a pixel is realistic for manual target measurements or natural features (Fraser, 2018, Shortis, 2015, Geomsoft, 2008). In our study design, VMS served as the benchmark software for calibration performance (James et al., 2020), and we interpret VMS' better accuracy and precision as confirmation of this choice. The calibration of the thermal sensor in VMS resulted in RMS values of 0.48 (SUBSET) and 0.53 pixels (ADD6) and a sigma of 1, indicating good calibration accuracy. These values exceeded our expectations given the low contrast and low resolution of the imagery. However, these image characteristics could have caused a wider distribution of the target observations around the features, which could explain the low parameter precisions. Furthermore, we compared images acquired under different lighting conditions and found that direct sunlight is preferable to diffuse irradiation under cloudy skies; recognizing individual bricks in the north-facing wall remained difficult. We assume that a calibration under cloud cover would still be feasible but might lead to larger errors.
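The RMS values above are computed from image residuals in pixels. Packages differ in whether they average squared vector lengths or squared x and y components (the two differ by a factor of √2), so the sketch below offers both; which definition VMS reports is not assumed here. The guideline thresholds are those quoted in the text.

```python
import math

# Guideline values quoted in the text (pixels).
CODED_TARGETS_PX = 0.1
NATURAL_FEATURES_PX = (0.5, 2.0 / 3.0)


def image_rms(residuals_px, per_component=False):
    """RMS of 2D image residuals (dx, dy) in pixels.

    per_component=False: RMS of residual vector lengths.
    per_component=True: RMS over all individual x and y components.
    """
    squared = [dx * dx + dy * dy for dx, dy in residuals_px]
    n = len(squared) * (2 if per_component else 1)
    return math.sqrt(sum(squared) / n)


def within_natural_feature_range(rms_px):
    """True if the RMS does not exceed the upper guideline for natural features."""
    return rms_px <= NATURAL_FEATURES_PX[1]
```

Under this guideline, the thermal VMS results (0.48 and 0.53 px) fall within the expected range and the RGB results (0.88 and 1.02 px) fall above it.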
For the RGB datasets, VMS calculated RMS values of 0.88 (SUBSET) and 1.02 pixels (ADD6) and sigma values of 1.36 and 1.23, which are above the expected range. The precision values of the RGB datasets were larger than those of the thermal datasets. This is probably due to a combination of reasons. Firstly, the higher image resolution allows more precise placement of the observations, which in turn increases the requirements on the accuracy of the feature definition in the reference. This is challenging for natural features such as corners of bricks or intersections between window frames and horizontal window sills, and is clearly a disadvantage compared with the automatic detection of target features in conventional calibration arrays measured by total station. Secondly, the smaller ground sampling distance increased the impact of the accuracy of the reference dataset and the point cloud registration. Thirdly, the effect of the low geometric stability of consumer-grade optical cameras is amplified by high sensor resolutions. In this context, it is certainly advisable to use optical systems of higher construction quality. At the time of our survey, however, no other optical sensor could be accommodated on the available UAS. On the other hand, in comparison with other consumer-grade UAS sensors, the DJI Phantom series has been shown to be one of the most stable (Cramer et al., 2017, Fraser, 2018). We will investigate the geometric stability of our system in further experiments and by processing additional datasets from surveys at different times. The processing in Metashape resulted in slightly higher RMS values than VMS, which is in line with our expectations (James et al., 2020): while VMS is dedicated photogrammetric software, in our workflow we deprived Metashape of its SfM functionality by removing the tie points in order to perform a purely marker-based calibration.
Even though features are easier to recognize in RGB than in thermal imagery, higher RMS values resulted for RGB. The reason could be that the higher number of observations introduces more variation in placement. Similarly, additional images caused an increase in RMS. Metashape does not provide reprojection errors in a marker-based calibration; therefore, we could not compare the fit of its network solution against the corresponding VMS σ value.
Radial distortion is the main geometric optical effect in non-metric cameras, with values up to two orders of magnitude above the tangential distortion (James et al., 2020). Our results show a similar ratio between the radial and tangential profiles. With magnitudes below 0.5 pixels, the tangential distortion values are exceeded by their corresponding precision values and cannot be determined to be significantly different from zero. For this reason, tangential distortion is often removed from camera models (Fraser, 2001, Gruen, Beyer, 2001). Due to high correlation between the radial parameters, overparameterisation (James et al., 2017) may occur, and two radial distortion parameters often provide a sufficient camera model (Eltner, Schneider, 2015, Remondino et al., 2012). We assume that we can determine a more robust camera model, less prone to systematic errors (e.g. DEM doming), by omitting more parameters (James et al., 2020). Therefore, we will carry out further experiments to avoid potential overparameterisation.
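A simple way to screen parameters that are not significantly different from zero is to compare each estimate with its standard deviation. This is a minimal sketch: the factor k = 2 (roughly a 95% level under a normal assumption) is a common rule of thumb rather than the criterion used in this study, and any values passed in should come from Table 2, not from the hypothetical numbers used to exercise it.

```python
def parameter_significance(values, sigmas, k=2.0):
    """Return {name: |value| / sigma} and the sorted names whose ratio is below k,
    i.e. candidates that are not significantly different from zero."""
    ratios = {name: abs(values[name]) / sigmas[name] for name in values}
    candidates = sorted(name for name, ratio in ratios.items() if ratio < k)
    return ratios, candidates
```

Parameters flagged this way are candidates for removal from the camera model, which is one route to the reduced models discussed above; the final decision should also consider parameter correlations.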
The uniformity and high precision of the focal length and principal point coordinates indicate that our parameters can be transferred to the SfM photogrammetry survey that was subsequently conducted with a similar distance between UAS and object (James et al., 2020, Griffiths, Burningham, 2019). However, maintaining this distance has the drawback that the image format is not always fully covered by the calibration array (Figure 4). To improve coverage towards the sensor edges, the UAS operator must therefore ensure that the array is not always in the center of the frame (Shortis, 2015). In addition, an even distribution of observations must be considered when selecting the calibration images.
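To check how much of the image format the calibration array covers, the format can be divided into a grid and the share of cells containing at least one observation counted. This illustrative sketch uses an arbitrary 4 x 4 grid; applied per image, the same count could help pick images that fill empty cells.

```python
import numpy as np


def format_coverage(cols, rows, width, height, nx=4, ny=4):
    """Fraction of an nx-by-ny grid over the image format that contains at least
    one observation, together with the per-cell counts."""
    counts, _, _ = np.histogram2d(
        np.asarray(cols, dtype=float),
        np.asarray(rows, dtype=float),
        bins=[nx, ny],
        range=[[0, width], [0, height]],
    )
    return float(np.count_nonzero(counts)) / (nx * ny), counts
```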

CONCLUSIONS
This research has demonstrated a field approach to calibrate a thermal and an optical sensor using the same calibration object. Our method allows the time-efficient on-site acquisition of calibration data, which is required when working with UAS-borne sensors of low geometric stability. We are confident that our calibration results are suitable for pre-calibrating the sensors in the subsequent SfM photogrammetry survey. We will compare the performance of self-calibration against pre-calibration for our bathymetric survey dataset. In the scope of this paper, we only analysed the data from one epoch, which will be used for the calibration of the survey that was subsequently performed. Therefore, we cannot yet draw conclusions on the geometric stability of the sensor system or on the magnitude of disturbance caused by, for example, transportation, temperature variation and handling. To investigate this question, we will analyse further datasets that were acquired during the same field campaign.
In addition, we identified several potential strategies to further improve our calibration workflow. First, we assume that by including observations of at most three sides of the calibration building, we would achieve higher overlap with the same number of images while maintaining a stable network. The registration of the TLS point clouds could be improved analogously, but more scan positions would have to be added. Another consideration is the on-board pre-processing in consumer-grade UAS sensors that mitigates radial lens distortion (Cramer et al., 2017). We will test whether the unprocessed RAW images allow us to calculate improved camera parameters and whether proprietary correction affects the tangential parameters (James et al., 2020). Finally, we are confident that we can apply our workflow to calibrate additional sensors: during data acquisition we also collected a multispectral calibration dataset using a MicaSense RedEdge-M sensor mounted on the M600, and further investigations will follow.