PREDICTING THE INFRARED UAV IMAGERY OVER THE COAST

Abstract. Infrared (IR) imagery complements visible (red-green-blue, RGB) imagery with additional information about vegetation, soil, water, minerals and temperature, and has become essential for various disciplines, such as geology, hydrology, ecology, archeology, meteorology and geography. IR sensors, ranging from near-IR (NIR) through mid-IR to thermal-IR, are a baseline for Earth observation satellites but not for unmanned airborne vehicles (UAV). Given the hyperspatial and hypertemporal characteristics of UAV surveys, it is relevant to benefit from the IR waveband in addition to the visible imagery for mapping purposes. This paper proposes to predict the NIR reflectance from RGB digital number predictors collected with a consumer-grade UAV over a structurally and compositionally complex coastal area. An array of 15 000 data points, distributed into calibration, validation and test datasets across 15 representative coastal habitats, was used to build and compare the performance of standard least squares, decision tree, boosted tree, bootstrap forest and fully connected neural network (NN) models. The NN family surpassed the four others, and the best NN model (R² = 0.67) integrated two hidden layers, each provided with five nodes of hyperbolic tangent and five nodes of Gaussian activation functions. This perceptron produced a spatially explicit NIR reflectance model free of the original artifacts due to the flight constraints. At the habitat scale, sedimentary and dry vegetation environments were satisfactorily predicted (R² > 0.6), contrary to healthy vegetation (R² < 0.2). These findings will be useful for scientists and managers tasked with hyperspatial and hypertemporal mapping.



Handborne Infrared Spectrophotometry
The integration of infrared (IR) spectral information has extended the reflectance signature of a variety of objects, making them easier to detect. Since the discovery of the solar thermal IR radiation in 1800 by Herschel (Ring, 2000), IR ground spectroscopy studies attracted early attention from scientists working on algae (Mestre, 1935), leaves (Billings and Morris, 1951), soils and waters (Myers et al., 1966), as well as terrestrial (Adams and Goullaud, 1978) and extra-terrestrial minerals and rocks (Adams, 1974). By augmenting the traditional visible (red-green-blue, RGB) information with longer wavebands, natural and anthropogenic features can indeed be more easily discriminated given their specific spectral signature in those wavebands (Knipling, 1970). These pioneering works were ground-based proof-of-concept studies, whose results provided the rationale for embedding IR sensors into top-view platforms.

Manned Airborne Infrared Imagery
Beyond the 1D spectral signature, photographic sensors capable of capturing 2D imagery in the visible and IR spectrum (Clark, 1946) were mounted in manned airborne vehicles (MAV). IR remote sensing has therefore been successful for studying geology (Laftman, 1963), hydrology (Abdel-Hardy, 1970), ecology (Knipling, 1969), archeology (Estes, 1966) and meteorology (Roads, 1973). Following the declassification of the IR imagery by Defence Ministers or Departments in various countries, a plethora of scientists have used this IR imagery as a stand-alone resource provided with increasingly fine spectral resolution, culminating in the Compact Airborne Hyperspectral Imager (Babey and Anger, 1989). These latter findings at hyperspatial resolution (close to the meter grain size) were nevertheless constrained to a local spatial scene.

Spaceborne Infrared Imagery
The launch of 'Earthward' spaceborne platforms allowed regional scenes to be acquired by visible and IR sensors. The USA Television and InfraRed Observation Satellite (TIROS), launched in 1960, carried what remains, to date, the first IR imager on board a civilian satellite. Then, the seminal Earth observation satellite programs, spearheaded by the Earth Resources Technology Satellite (the future Landsat, launched in 1972) and the Advanced Very High Resolution Radiometer (TIROS-N, launched in 1978), integrated imagers provided with IR bands (Table 1).

Table 1.
Sensors' spectral windows (µm) by band number.

Band number   AVHRR/1 TIROS-N        Landsat-1 MSS
1             0.55-0.9 (Red+NIR)     -
2             0.725-1.1 (NIR)        -
3             3.55-3.93 (MIR)        -
4             10.5-11.5 (TIR)        0.5-0.6 (Green)
5             -                      0.6-0.7 (Red)
6             -                      0.7-0.8 (NIR)
7             -                      0.8-1.1 (NIR)

The TIROS-N and Landsat-1 imagery was dedicated to research on the monitoring and dynamics of the atmosphere and the terrestrial biosphere. The reflectance measured in the visible and IR made it possible to map air temperature and humidity, as well as land moisture and vegetation, using the normalized difference vegetation index (Rouse et al., 1974). The spatial resolution of TIROS-N and Landsat-1 attained 1 km and 79 m (now resampled at 60 m), respectively. Fifty years later, the Landsat-8 and Sentinel-2 multispectral sensors continue delivering visible and NIR imagery, but at 30 m and 10 m pixel size, respectively. Covering the whole Earth, they offer a 16-day and 5-day temporal resolution, respectively. However, imagery built with 10 m spatial resolution cannot reconstruct the fine-scale spatial patterns and related biophysical processes (Collin et al., 2016). Since 2000, hyperspatial satellite sensors have been able to produce multispectral imagery between 0.3 and 1 m pixel size (see Collin et al., 2021), but to the detriment of the temporal resolution (monthly to quarterly). A novel platform is therefore needed to capture multispectral imagery at both hyperspatial and hypertemporal resolution.

Unmanned Airborne Infrared Imagery
Unmanned airborne vehicles (UAV) can collect imagery at the decimeter (Mury et al., 2019) and daily scale, provided that the weather is favourable and settled. Even if the spatial extent remains local, spanning several km², the UAV imagery can be used as a stand-alone resource or as a link between ground-truth and satellite imagery (Collin et al., 2019a). The UAV is highly cost-efficient given its low purchase and maintenance prices, while substituting for dozens of geo-photographers covering hundreds of m². However, most commercial UAVs are only provided with an RGB sensor (e.g., Schiefer et al., 2020). In line with MAV (Collin et al., 2018a) and satellite (Collin et al., 2018b) imagery, the integration of the NIR can nevertheless significantly improve the UAV-based estimation and classification of continuous and discrete environmental variables (Collin et al., 2019b).

Predicting Infrared Imagery
To date, the use of the IR, from NIR to TIR, remains very sporadic in UAV research, insofar as it requires mounting a dedicated sensor onto the platform. This operation could furthermore compromise compliance with flight legislation and therefore reduce the UAV cost-efficiency.
In a scientific era characterized by increasingly massive data and efficient machine learners, we propose a novel approach to predict the NIR reflectance response from RGB digital number (DN) predictors using various state-of-the-art regressors, from linear to non-linear regression methods.
This original experiment will be tested with a consumer-grade UAV, whose transferability is very high. An in-depth statistical modelling, based on calibration, validation and test datasets, will be applied at the overall scale of a complex coastal landscape featuring 15 representative habitats for all regressors, and also at the individual habitat scale for the best prediction (Figure 1). Results will be discussed in the light of temporal, spatial, spectral, radiometric and numerical perspectives.

Study Site
The study site is located on the temperate coastal fringe (48°69'N, 1°95W) of northern Brittany (France). The rectangular test area covers 190 000 m² (500 m × 380 m), and its altitude, referenced to the sea level of the lowest astronomical tide, ranges from 0 to 15 m. Subject to a megatidal range (14 m amplitude during the spring tide) and encompassing the three representative environments of the coast (rocky, sandy, and muddy), the site's ecosystems are highly diversified: a reflective beach, a dune complex with a stratified succession (from the marram grasses to the pine trees), tidal flats, and a salt marsh with a stratified succession (from the pioneer cord grasses to the sea lavenders).

Ground Measurements
A field campaign was carried out just before the UAV flight on July 2, 2020, from 9 to 11 a.m. (UTC+2). Firstly, an array of 13 ground control points (black stars in Figure 1) was evenly distributed over the site and their geolocation was accurately measured using a D-GNSS (Topcon HiPer V). Centimeter accuracy was reached along horizontal (XY) and vertical (Z) coordinates using the post-processing freeware RTKLIB (Takasu and Yasuda, 2009).
Secondly, a series of 30 photoquadrats (coloured spheres in Figure 1), each measuring 0.5 m × 0.5 m, was also geolocated with the D-GNSS and sampled with a 12-megapixel Olympus Tough camera. A statistical hierarchical clustering of the photoquadrats, based on their areal coverage, led to the constitution of 15 habitats (

Unmanned Aerial Vehicle Survey
Following the preceding fieldwork, the UAV flight took place between 11 a.m. and 12 noon (UTC+2). It was planned with the DJI GS Pro application, ensuring a consistent 50 m height, as well as front and side overlap ratios of 80% and 70%, respectively. These flight constraints optimize the subsequent photogrammetric reconstruction (Collin et al., 2019b). The UAV consisted of a DJI Phantom 4 Pro V2 (P4V2) augmented with a Parrot Sequoia+. The P4V2 carries a 4864 × 3648 RGB sensor (Figure 1), and the Sequoia+ includes a 1280 × 960 NIR (centered at 790 nm, 40 nm wide) nadir-viewing sensor (Figure 2) and a zenith-facing irradiance sensor. The P4V2 and Sequoia+ collected 648 DN and 310 reflectance geolocated images, respectively. The orthomosaics were produced using the Pix4Dmapper software and georeferenced in the RGF93 datum with the Lambert 93 conformal conic projection.

Regression
The potential to estimate the Sequoia+ NIR reflectance response from the P4V2 RGB DN predictors was examined using regression-based estimation. Assuming that each of the 15 habitats is representative of an inherent sub-regression, a rigorous statistical stratification was undertaken. For each habitat, the seed pixels, measured in situ by the geolocated photoquadrats, were grown to the neighbouring pixels based on their spectral signature membership. When 1000 pixels were reached, they were divided into 400 calibration, 400 validation and 200 test pixels. A suite of five families of regressors was inspected.
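The stratified 400/400/200 division described above can be sketched as follows. This is only an illustration: the text does not state how pixels were assigned to each subset, so the random shuffle, the seeds, the function name `split_habitat_pixels` and the habitat labels are all assumptions.

```python
import random

def split_habitat_pixels(pixel_ids, n_cal=400, n_val=400, n_test=200, seed=0):
    """Shuffle one habitat's pixels and cut them into calibration, validation and test sets."""
    if len(pixel_ids) < n_cal + n_val + n_test:
        raise ValueError("not enough pixels for the requested split")
    rng = random.Random(seed)          # random assignment is an assumption
    shuffled = list(pixel_ids)
    rng.shuffle(shuffled)
    cal = shuffled[:n_cal]
    val = shuffled[n_cal:n_cal + n_val]
    test = shuffled[n_cal + n_val:n_cal + n_val + n_test]
    return cal, val, test

# Hypothetical habitat labels; the study uses 15 habitats of 1000 pixels each.
habitats = {f"habitat_{k:02d}": list(range(1000)) for k in range(1, 16)}
splits = {name: split_habitat_pixels(ids, seed=k)
          for k, (name, ids) in enumerate(habitats.items())}

# Pooled sizes of the calibration, validation and test datasets.
totals = [sum(len(s[i]) for s in splits.values()) for i in range(3)]
print(totals)  # [6000, 6000, 3000]
```

The pooled sizes add up to the 15 000 data points reported in the abstract.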

Linear Model:
The standard least squares (SLS) regressor fits linear models to the numeric NIR response data with fixed effects by minimizing the sum of squared residuals derived from the numeric RGB DN:

$$\sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$

where $y_i$ = i-th observed NIR value
$f(x_i)$ = i-th modelled NIR value
$n$ = number of observations

Partition Model:
Three kinds of partition models were tested to predict the NIR reflectance. The decision tree (DT) is a method that recursively splits NIR values using a cutting value from RGB predictors, maximizing the difference in the means of the NIR response between the two nodes of the split (Hawkins and Kass, 1982).
The boosted tree (BT) is a method that creates an additive DT, derived from an array of smaller DTs arranged in layers. Each layer is grown through recursive fitting, and produces a DT containing a small number of splits. For every DT, the modelled NIR value in a leaf corresponds to the mean of all observed NIR values in that leaf. The final prediction is the sum of the NIR predictions for a NIR observation over all the layers (Hastie et al., 2009).
The bootstrap forest (BF) models the NIR response by averaging the NIR predictions across many DTs. Every tree stems from a random sample of observed NIR values, drawn with replacement, and the RGB predictors are sampled at every split. The final prediction is here the average of the NIR predictions for a NIR observation over the population of DTs, namely the forest (Hastie et al., 2009).
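The three partition models can be contrasted with a minimal sketch using one-split trees. It follows a literal reading of the descriptions above and is not the implementation used in the study: the split criterion (largest absolute difference in node means), the learning rate, the number of trees, the two-predictor subset per split and the toy data are all assumptions.

```python
import random

def fit_split(rows, y, features=(0, 1, 2)):
    """One decision-tree split: pick the predictor and cutting value that maximize
    the absolute difference between the mean responses of the two child nodes."""
    mean_all = sum(y) / len(y)
    best = (features[0], float("inf"), mean_all, mean_all)  # fallback: no split possible
    best_gap = -1.0
    for j in features:
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):      # midpoints between distinct values
            cut = (lo + hi) / 2
            left = [t for r, t in zip(rows, y) if r[j] < cut]
            right = [t for r, t in zip(rows, y) if r[j] >= cut]
            m_left, m_right = sum(left) / len(left), sum(right) / len(right)
            if abs(m_left - m_right) > best_gap:
                best, best_gap = (j, cut, m_left, m_right), abs(m_left - m_right)
    return best

def predict_split(split, row):
    j, cut, m_left, m_right = split
    return m_left if row[j] < cut else m_right      # leaf value = mean of its observations

def bootstrap_forest(rows, y, n_trees=25, seed=0):
    """Each tree sees a sample drawn with replacement and a random subset of predictors."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(3), 2)             # subset size of 2 is an assumption
        forest.append(fit_split([rows[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    return sum(predict_split(s, row) for s in forest) / len(forest)   # average over trees

def boosted_tree(rows, y, n_layers=25, learning_rate=0.1):
    """Each layer is a small tree fitted to the residuals left by the previous layers."""
    base = sum(y) / len(y)
    residual = [t - base for t in y]
    layers = []
    for _ in range(n_layers):
        split = fit_split(rows, residual)
        layers.append(split)
        residual = [res - learning_rate * predict_split(split, r)
                    for r, res in zip(rows, residual)]
    return base, layers, learning_rate

def predict_boosted(model, row):
    base, layers, learning_rate = model
    return base + learning_rate * sum(predict_split(s, row) for s in layers)  # sum over layers

# Toy data: (R, G, B) digital numbers and NIR reflectance for two contrasting surfaces.
rows = [(10 + i, 50, 30) for i in range(10)] + [(200 + i, 50, 30) for i in range(10)]
y = [0.1] * 10 + [0.5] * 10
bf = bootstrap_forest(rows, y)
bt = boosted_tree(rows, y)
print(predict_forest(bf, (12, 50, 30)), predict_boosted(bt, (12, 50, 30)))
```

The only structural difference between the two ensembles is the aggregation step: the forest averages independently grown trees, whereas the boosted tree sums layers that each correct the residuals of the previous ones.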

Neural Network:
The neural network (NN) constructs a fully connected one- or two-layer perceptron, in which each (hidden) layer includes derived inputs, called (hidden) nodes or neurons (Heermann and Khazenie, 1992). For every node, a transformation function, called the activation function, is applied to a linear combination of the RGB predictors, where $w_j$ is the j-th weighted activation function, $n_j$ the j-th node, and $X$ the RGB predictors.
The first activation function tested was a sigmoid function (fuzzy logic), defined as a hyperbolic tangent function (TanH), that scales values between the lower bound -1 and the upper bound 1:

$$\mathrm{TanH}(z) = \frac{e^{2z} - 1}{e^{2z} + 1}$$

where $z$ = a linear combination of the RGB predictors ($X$)

The second activation function tested was a Gaussian function (Gauss), which is likely to fit the NIR response surface better when it is normal in shape.
Three series of fully connected multi-layer perceptrons were implemented: one hidden layer ranging from one to five nodes for separate TanH and Gauss functions; two hidden layers ranging from one to five nodes for separate TanH and Gauss functions; and two hidden layers ranging from one to five nodes for combined TanH and Gauss functions. For the sake of comparison at the regressor family scale, an average of all NNs was compiled.
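As an illustration of the combined architecture, the forward pass of a two-hidden-layer perceptron with five TanH and five Gauss nodes per layer might look like the sketch below. Several elements are assumptions rather than details from the text: the Gaussian is taken as exp(-z²), the weights are random placeholders instead of trained values, the DN are scaled to [0, 1], and the output node is linear.

```python
import math
import random

def tanh_act(z):
    return math.tanh(z)                 # bounded between -1 and 1

def gauss_act(z):
    return math.exp(-z * z)             # assumed Gaussian form, bounded between 0 and 1

def make_layer(n_inputs, activations, rng):
    """One hidden layer: a (weights, bias, activation) triple per node.
    Weights are hypothetical random values, not fitted ones."""
    return [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1), act)
            for act in activations]

def forward_layer(layer, inputs):
    # Each node applies its activation to a linear combination of its inputs.
    return [act(sum(w * x for w, x in zip(weights, inputs)) + bias)
            for weights, bias, act in layer]

def predict_nir(rgb, hidden1, hidden2, out_weights, out_bias):
    h1 = forward_layer(hidden1, rgb)
    h2 = forward_layer(hidden2, h1)
    # Linear output node for the continuous NIR response (an assumption).
    return sum(w * h for w, h in zip(out_weights, h2)) + out_bias

rng = random.Random(0)
mixed = [tanh_act] * 5 + [gauss_act] * 5     # five TanH and five Gauss nodes per layer
hidden1 = make_layer(3, mixed, rng)          # 3 inputs: R, G, B
hidden2 = make_layer(10, mixed, rng)         # 10 inputs: outputs of the first layer
out_weights = [rng.uniform(-1, 1) for _ in range(10)]
out_bias = 0.0

rgb = [120 / 255, 95 / 255, 60 / 255]        # DN scaled to [0, 1] (an assumption)
nir = predict_nir(rgb, hidden1, hidden2, out_weights, out_bias)
print(nir)
```

Training, that is, fitting the weights on the calibration data and monitoring the validation data, is deliberately left out of this sketch.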

Accuracy Evaluation:
All model predictions (SLS, DT, BT, BF, NN) were evaluated on the independent test dataset using the coefficient of determination (R²) and the root mean square error (RMSE).
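Both metrics can be computed from paired observed and predicted values, as in the short sketch below. It assumes that R² is defined as one minus the ratio of residual to total sum of squares, which the text does not specify; the sample values are invented for illustration.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the units of the response (here reflectance)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Invented test values, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
print(round(r_squared(observed, predicted), 3))  # 0.948
print(round(rmse(observed, predicted), 4))       # 0.0255
```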
The results were further analyzed at the habitat scale for the best prediction. The formula obtained from that prediction was applied to the RGB wavebands to rasterize the NIR reflectance response over the entire study area.
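The rasterization step amounts to applying the fitted per-pixel formula to every cell of the RGB orthomosaic. A minimal sketch follows, assuming the raster is held as nested lists of (R, G, B) tuples; `toy_model` is a hypothetical stand-in for the fitted model, not the formula obtained in the study.

```python
def rasterize_nir(rgb_raster, predict):
    """Apply a per-pixel model to every (R, G, B) cell, preserving the raster shape."""
    return [[predict(pixel) for pixel in row] for row in rgb_raster]

# Hypothetical stand-in for the fitted model: maps (R, G, B) DN to NIR reflectance.
def toy_model(pixel):
    r, g, b = pixel
    return round(0.001 * r + 0.0005 * g, 4)

rgb_raster = [[(100, 80, 60), (200, 150, 90)],
              [(50, 40, 30), (0, 0, 0)]]
nir_raster = rasterize_nir(rgb_raster, toy_model)
print(nir_raster)
```

Because each output pixel depends only on the RGB values of the same pixel, the predicted raster cannot inherit spatial artifacts that are present only in the measured NIR imagery.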

Infrared Regressor Families
Following the building of the five predictions of the NIR reflectance based on the calibration and validation datasets, the results stemming from the test dataset showed increasing performance in the following order (Figure 3): linear SLS (R² = 0.29), then non-linear partition models (DT, R² = 0.57; BT, R² = 0.58; BF, R² = 0.59), and finally the average of all non-linear NNs (R² = 0.62). These findings corroborate the evaluation of these regressors' performance for bathymetry (Collin et al., 2017) and reef virality modelling (Collin et al., 2018a).

Figure 3.
Barplot of the regression models' test results for linear standard least squares, non-linear decision tree, boosted tree, bootstrap forest, and the average of neural networks.

Infrared Neural Network Regressions
The R² results derived from the three NN series ranged from 0.29, for the least efficient single one-node hidden layer (Figure 4), to 0.67, for the most efficient two ten-node hidden layers (Figure 5a and 5b).

One-Layer Neural Network:
The results for the single hidden layer indicated that the addition of nodes progressively improved the NN prediction, with a sharp break at the second node for both activation functions (Figure 4). From the three-node NN onwards, the results obtained with the Gauss function slightly surpassed those with the TanH function.

Two-Layer Neural Network:
The inclusion of nodes logically increased the NN performance, for both the separate and combined activation functions (Figure 5a and 5b). While the second-node break is obvious for the separate activation functions, whose results are dominated by TanH (blue series in Figure 5a), it is tangible but much less dramatic for the combined activation functions, whose scores are led by the crossed combinations (red series in Figure 5b).

Infrared Neural Network Regression at Habitat Scale
The best NN model stemmed from the two ten-node hidden layers provided with crossed TanH-Gauss combinations, resulting in a highly complex architecture (Figure 6). At the habitat scale, the best NN model predicted the NIR reflectance values in four distinct habitat groups, according to their inner regression score (Figure 7). The first group, composed of dry mud and dry salt marsh sand, was very satisfactorily predicted (R² from 0.8 to 0.6). The second group, comprising wet mud, wet salt marsh sand and interdune vegetation, was satisfactorily estimated (R² from 0.6 to 0.4). The third group, containing dry beach sand, primary dune vegetation, wet beach sand, road, low salt marsh vegetation and shrub, was moderately explained (R² from 0.4 to 0.2). The last group, gathering mid salt marsh vegetation, high salt marsh vegetation, tree and rock, was not satisfactorily modelled (R² from 0.2 to 0.0).

Figure 7.
Scatterplot of the observed test versus best predicted infrared reflectance.
A general pattern could be drawn from these outcomes: the NIR reflectance prediction from RGB tends to be more efficient for sedimentary habitats than for vegetated habitats. These findings could be logically explained by the spectral signature of the habitats at stake. Mineral features commonly display a moderate level of reflectance that gradually increases across the visible spectrum and continues to do so in the NIR spectrum (Zhang and Baas, 2012). This spectral continuity is likely to be easily modelled by a well-trained multi-layer NN. Likewise, the dryness and the wetness of the sedimentary habitats were well captured by the variability explained by the modelling. We could therefore argue that the various soil types, whose spectral signatures are relatively linear from the visible to the NIR spectrum (McCarty et al., 2002), whether saturated by water or not, have the potential to be well predicted by this NN approach. Conversely, vegetation habitats, exhibiting a relatively low and sinusoidal reflectance trend in the visible gamut, show a sharp gain in reflectance in the NIR spectrum (Zhang and Baas, 2012). This discontinuity could well be the source of discrepancies in the NN modelling, even though this model handles non-linearity. Healthy vegetation, rich in chlorophyll pigments, typically follows this tendency: mid and high salt marsh vegetation, as well as rocks covered by macroalgae. The tree habitat, embodied by a pinewood (Pinus pinaster) in this study, echoes the same spectral disruption between the visible and NIR, even if the reflectance remains lower than that of green vegetation (Rautiainen et al., 2018).
However, it is worthwhile to underline that the NIR reflectance related to the herbaceous and low shrubby vegetation, such as interdune, primary dune, low salt marsh, and shrub vegetation, was correctly predicted. Those habitats were covered, during the summer flight acquisition, by dry or senescent leaf blades, appearing yellowish in the visible spectrum, which facilitates a spectral continuity (a lower reflectance than that of the vivid vegetation) in the NIR spectrum, hence the better predictions.
Temporal, spatial, spectral and radiometric specificities of the UAV flight constrain the explanatory power of the RGB predictors. Both abiotic and biotic coastal habitats change with the season due to ocean hydrodynamics, watershed hydrology, and plant phenology. The seasonal variability in the spectral signatures of the habitats might have a significant influence on the NN performance, given the changes in sediment grain size and wetness, as well as in photosynthetic pigments. Given its hypertemporal resolution, the UAV is able to capture this spectral variance across time. The spectral signatures of these habitats can also vary with the spatial resolution. The UAV flight was capped here at 50 m height, providing a centimeter-scale pixel size. However, a higher UAV flight or a MAV will collect RGB imagery at the decimeter scale but over regional extents, thus including a wider diversity of land use / land cover and sea use / sea cover types (Collin et al., 2021). Ongoing research is quantifying the impact of the spatial scale on the NN prediction.
The predictive modelling focuses here on the NIR spectrum. Insofar as the obtained results were suitable for features provided with a linear trend across the electromagnetic spectrum, we could assume that the MIR signatures of the sediment and dry vegetation habitats studied here, as well as of various soil (McCarty et al., 2002) and snow (Warren, 2019) types, might also be satisfactorily predicted. The NIR reflectance was predicted from the RGB variables at the raw radiometric level, namely DN. The NN modelling was based on the point sampling of the response and predictors (see the 15 habitats' geolocations in Figure 1). Following the rasterization of the complex formula stemming from the multiple links of the ten-node two-layer architecture (see Figure 6), the spatially explicit model of the NIR reflectance (Figure 8) did not reproduce the reflectance artifacts due to the flight conditions (see vertical bands in Figure 2). The NIR sensor was indeed fixed on the UAV structure, which explains these artifacts: the UAV roll changed to compensate for lateral wind gusts. These acquisition artifacts could have been avoided with a dedicated gimbal. Overall, the improvement of the NIR raster through the NN modelling could be helpful for further studies requiring radiometrically corrected 2D models.
This NIR predictive modelling was built from a fully connected perceptron, limited to two hidden layers of ten nodes each, that is to say 20 nodes and 140 node connections. This number of weighted activation functions requires substantial amounts of memory and computing resources for training the network. This numerical limitation could be overcome by creating a convolutional neural network (CNN) that minimizes the number of node connections by focusing only on the local region of every node. A U-Net CNN could be advised since the output imagery has exactly the same size as the input imagery (Letard et al., 2020).
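The figures of 20 nodes and 140 connections can be checked by counting the links between consecutive layers of a fully connected network. The sketch below assumes three inputs (R, G, B) and a single NIR output node; the output size is not stated in the text but is consistent with the reported total.

```python
def count_connections(layer_sizes):
    """Number of node-to-node links in a fully connected feed-forward network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# RGB inputs, two ten-node hidden layers, one NIR output (output size is an assumption).
layers = [3, 10, 10, 1]
print(sum(layers[1:-1]))          # 20 hidden nodes
print(count_connections(layers))  # 140 connections: 3*10 + 10*10 + 10*1
```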

Figure 8.
Infrared reflectance orthomosaic (7132 × 8974 pixels) modelled by the ten-node two-layer neural network based on red-green-blue digital number predictors.

CONCLUSIONS
Predictive modelling of the NIR reflectance from RGB DN in the context of a hyperspatial UAV survey over a structurally and compositionally complex coastal area has been investigated. Five families of regressors have been tested: linear, partition (regular and boosted tree, bootstrap forest), and neural network (NN) models. Ground-truth data, divided into calibration, validation and test datasets across 15 habitats, have been used to quantify the prediction accuracy at the overall scale. The NN model, whose architecture comprised two hidden layers with five TanH nodes and five Gauss nodes each, yielded the best accuracy (R² = 0.67). At the habitat scale, the NN model satisfactorily predicted sedimentary, dry and senescent vegetated habitats (such as herbaceous and low shrubby vegetation), while being less reliable for healthy vegetation, including macroalgae and mid and high salt marsh vegetation. This trend in the NIR prediction has been discussed in the light of the spectral continuity or discontinuity between the visible (the predictors' window) and NIR (the response's window) spectrum. These original findings hold great promise for the spatially explicit modelling of the NIR, and more broadly of the IR, at various spatial, temporal and spectral scales.