MULTI-TEMPORAL IMAGE CO-REGISTRATION OF UAV BLOCKS: A COMPARISON OF DIFFERENT APPROACHES

Traditionally, data co-registration of survey epochs in photogrammetry relied on Ground Control Points (GCP) to keep the reference system unchanged. In the last years, Unmanned Aerial Systems (UAV) are increasingly used in photogrammetric environmental monitoring. The diffusion of affordable UAV platforms equipped with GNSS (Global Navigation Satellite System) centimetre-grade receivers might reduce, but not eliminate, the need for GCP. Conversely, if GNSS-assisted orientation cannot be used or if additional ground control and reliability checks are required, alternatives to repeated GCP survey have been proposed, taking advantage of Structure from Motion (SfM) photogrammetry. In particular, co-registering different epochs image blocks together, identifying corresponding features, has been demonstrated as a viable and efficient approach. In this paper four different strategies easily implementable in a generic commercial photogrammetric software are presented and compared considering three different test sites in Italy subject to different amounts of environmental changes. The influence of the amount and distribution of inter-epoch corresponding points on the accuracy of the reconstruction is investigated. The results show that some of the tested strategies obtains very good results and can be used (although not needed) also in RTK centimetre-grade UAV surveys, leveraging the additional information coming from previous epochs survey to actually increase the survey accuracy and reliability.


INTRODUCTION
UAV are increasingly used in photogrammetric environmental monitoring, providing still accurate results, fast acquisition operations and almost fully automated processing at low cost. Traditionally, data co-registration of survey epochs relied on GCP to keep the reference system unchanged. GCP setting and maintenance require careful planning and additional efforts in data acquisition and processing (e.g., identifying corresponding points to improve co-registration). The diffusion of affordable UAV platforms equipped with GNSS centimetre-grade receivers might reduce, but not eliminate, the need for GCP for reliability reasons and limits on attainable accuracy. To reduce costs, alternatives to repeated GCP survey have been proposed. In (Peppa et al. 2019), a morphology-based strategy to co-register multi-temporal UAV Digital Elevation Models (DEM) is based on curvature analysis to find stable regions acting as pseudo-GCP. A somehow similar approach is presented in (Colomo-Jiménez et al., 2016) where blocks are registered using as constraint a Lidar DTM. In (Borgogno Mondino, 2015) a Multi-Temporal Bundle Adjustment is proposed to strengthen epochto-epoch registration. In (Wei et al, 2017) tie points are automatically extracted and matched between epochs by feature matching, to estimate an image-to-image registration model. In a slightly different context, (D'Angelo, 2013) runs a single Bundle Block Adjustment (BBA) including multi-temporal satellite images using SIFT extract tie points and RANSAC (Fischler and Bolles, 1981) to remove outliers. A similar approach, exploiting SfM algorithms, can be used. After accurate geo-referencing of the first (reference) survey, subsequent acquisitions are coregistered to the reference one by identifying corresponding image features. Ensure steady transfer of points over the epochs is however challenging. Changes in image scale, imaging sensor, * Corresponding author lighting or scene background (e.g., seasonal vegetation changes, landslides, etc.) can make it hard to find well distributed matches or cause matching errors. In (Wang et al., 2018) the importance of a fast, not too computationally demanding, co-registration procedure for emergency response is stressed. In (Aicardi et al., 2016) an automatic procedure selects a set of anchor (not changing) images that act as reference for epoch-to-epoch registration using SIFT (Lowe, 1999). In (Zhuo et al., 2017) some limits of SIFT detectors are highlighted and a routine more focused on the feature matching scheme is presented. In , a new approach for the multi-epoch BBA stage is presented. (Mc'Okeyo et al., 2020) proposes an original subpixel approach to band-to-band co-registration of multispectral UAV images. In this paper four different strategies of multi-epoch block coregistration, easily implementable in commercial photogrammetric software, are presented and applied in three test sites. The first paper objective is to compare their accuracy and reliability as well as their efficiency in computing time. A second objective is an evaluation of the different procedures performances as a function of the distribution of Stable (unchanged) Tie Points (STP) matched inter-epoch. Indeed, if one or more regions in the survey area do not provide such points, because of terrain creep, seasonal vegetation changes, etc., coregistration accuracy may be severely affected.

MATERIAL AND METHODS
In monitoring applications, from an operational point of view, whenever the surveyed area is interested by movements (e.g., landslides) or changing environmental conditions (for instance different vegetation outcrops due to seasonal changes or snow coverage during winter) the creation/maintenance of GCP can be, in some circumstances, extremely troublesome and implies additional costs and in situ operations. As already stated in the introduction, the use of GNSS-assisted UAV platforms does not represent a definitive solution to the problem, being required at least some GCP to improve image block control and assess the actual level of accuracy of the survey. In recent past, thanks also to an increasing diffusion and improvement of SfM algorithms, this problem has been addressed by several researchers considering the opportunity of co-registering different epochs image block together, identifying corresponding features in the images themselves. Such approach can be applied also in GNSS-assisted photogrammetric blocks obtaining a refined, (most likely) more reliable and accurate coregistration.

Epoch-to-epoch image block registration strategies
Four different strategies (see Table 1) for epoch-to-epoch image block registration have been considered, focusing on their performances in terms of reconstruction accuracy and computational load. The common element of the strategies is finding STP across epochs and run a BBA including the reference block and one (or more) block(s) acquired in later epochs.

BBA Procedure
Method The third method (Method 3) applies a single global SfM orientation procedure, using the images of both the reference and the subsequent epoch datasets together. GCP are measured only in the reference image subset (as in this case they need not to be maintained in the next epochs) and image matching is performed, in a single step, identifying both intra-(between images of the same epoch) and inter-epochs (between images of reference and slave epoch) tie points. The fourth strategy (Method 4) uses a Metashape's proprietary procedure (called "Align Chunk"), which allows two individually oriented photogrammetric blocks to be automatically aligned using a point matching procedure between the two blocks. Even if, due to commercial reasons, very few information about the used algorithms is available, some details on the procedure can be found in the Agisoft User forum (Agisoft, 2021). The procedure tries to match points between the images of different image blocks: all the steps involved in feature points detection and matching are performed from the beginning, unless the keypoints extracted during single block orientation have been kept in memory. In any case, a new image-to-image keypoints matching is performed. Unfortunately, no details are provided, to the best of author's knowledge, about the actual keypoint matching strategy (e.g., geometric or radiometric constraints used to filter out wrong correspondences, etc.). At the end of the matching process, the slave image block is considered rigid, and a conformal transformation (rotation, translation and scale) is applied to align it to the reference block. Consequently, even if apparently similar in point matching stage, Methods 2 and 4 handle the slave block quite differently in image orientation.

Automatic identification of STP (Method 2).
The proposed epoch-to-epoch image block orientation procedure automates the basic workflow implemented in Method 1. The processing pipeline assumes that the reference and slave epoch have been already individually oriented. The reference image block provides the control for georeferencing the subsequent epochs. At the same time, to speed-up and make inter-epoch tie points matching more reliable, the slave image block should be approximately registered in the same reference system. Tests so far demonstrated that even with very low registration accuracy (errors up to 5-10 m) the performances achievable by the procedure are satisfactory. An approximate initial co-registration can be achieved using the navigation solution of the UAV, regardless of the quality of the GNSS equipment used, or by manually selecting three (approximately) corresponding points after the single block orientation stage. The purposedly-developed software acquires all the tie points in each block using the python scripting interface implemented in Metashape. Unfortunately, the corresponding keypoints information, in particular the descriptor data associated to each image point, is not exposed by the Metashape Python API (Application Programming Interface). The software can therefore compute a feature descriptor vector for each image point or can access directly the keypoints data stored in the Metashape project file. In fact, even if not officially documented by Agisoft, its file format and the related data structure can be quite easily inferred. Apparently, if author's understanding is correct, Metashape uses a Local Difference Binary (LDB) (Yang and Cheng, 2013) descriptor represented by a binary string of 61 bytes. In LBD descriptors the data represents intensity and gradient difference tests, whose results are packed in 3 bits chunks in the binary string, computed on a grid (commonly using a multi-gridding strategy) in the local neighbourhood of the selected image (feature) point. The actual feature extraction and description algorithm used by Metashape cannot be inferred but, if the assumption on the descriptor structure is correct (as the results seems to confirm), comparing two binary string for point matching using the Hamming distance (Hamming, 1950) metric seems the most obvious choice. A single reference epoch tie point at a time is considered, comparing the feature descriptors associated to its corresponding image points with the tie points and corresponding image features located inside a pre-defined horizontal search radius in the slave block.
For each tie point correspondence, a similarity score is computed as the average Hamming distance of all the descriptors combinations between the reference and slave image points. The best (lowest score) is selected as indicating a probable match, but the connected tie points correspondence is considered valid only if the similarity score is low enough (indicating a good radiometric match between reference and slave image points) and it is significantly lower than the second-best score (which should indicate a good level of uniqueness of the matching). The search radius can be customized by the user: if the initial co-registration is already good, using a small search radius improves drastically the computational load and the matching identification rate. For approximately co-registered image blocks, as stated before, a bigger search radius (e.g., 10-20 m) can still provide a good level of reliability, at least according to our experiments, but, involving a much higher number of combinations to test, implies longer processing times. The matching procedure is repeated swapping the reference and the slave block i.e., selecting every single tie point in the slave dataset and searching for corresponding reference tie points. Every match not corresponding with the one identified in the first pass is discarded. At the end of this procedure most of the inter-epochs tie point matches should be correct, but still some outliers might be expected, like points extracted in low-contrast or repeated pattern regions, etc. At the same time, being the matching procedure based so far only on radiometric information (i.e., the similarity of corresponding tie points) it cannot be excluded that, for some identified matches, the point changed its absolute position between the two epochs (think for instance of a soil slip). For these reasons, an additional outlier removal procedure is performed. Using a RANSAC robust estimator, a conformal transformation between the matched tie point object coordinates in the reference and slave image block is computed. The acceptance threshold, i.e. the maximum distance allowed between corresponding tie point positions after transformation to be considered inlier, can be set according to the expected ground point accuracy of the two image blocks. At the end of the matching procedure the identified STP are used as GCP in the slave block and a final BBA is computed.

Test sites
Three different test sites in Italy were considered for the experiment. As the sites undergo increasing rates of environmental changes, so are the levels of challenge in the identification of STP through the epochs. The first test site (Site A) is located in the southern part of Parma University Campus (44°45'51'' N; 10°18'33'' E) and refers to a stable (unchanging) area with buildings (from 6 m to 35 m high), roads, car parking lots, sporting facilities and meadows (see Figure 1 Top). Eight photogrammetric blocks have been flown on the same day between 12 AM and 6:15 PM. Consequently, except different distribution of cars in the parking lots, the image blocks depict a very stable, unchanging area. Two more flights took place the next day, after heavy rain during the night. Flights were carried out with a fixed wing SenseFly eBee-RTK equipped with a compact camera Sony Cyber-Shot DSC-WX220 (1/2.3'' sensor with a resolution of 4896x3264 pixels, 25 mm equivalent to 35 mm format focal length). The UAV platform is equipped with an L1/L2 GNSS receiver which acquires the differential corrections coming from the master station through ground radio modem allowing centimetre grade positioning of camera stations. A first series of eight acquisitions, enclosing an area of about 500x330 m 2 was performed. Flight planning was designed with flight lines oriented along the main block side (roughly eastwest) with each block (about 150 images) flown at about 90 m above ground level, with a GSD of about 23 mm. The same dataset has already been used for the experiments reported in (Benassi et al., 2017) and in (Forlani et al., 2018). A second dataset made of two image blocks (ca. 110 images each) enclosing a smaller area (ca. 330x300 m 2 mainly developed along south-north direction) and with a lower flight altitude (about 70 m a.g.l.) has been considered in this experiment. Twenty-six signalized targets were deployed and surveyed with both a Leica 1200 and a Geomax Zenith 20 GNSS receivers, just before the UAV flights. Only fifteen of them are visible in the second dataset. The average standard deviation of the repeated GNSS measurements on each point is 7 mm for the horizontal position and 11 mm for the elevation. The second test site (Site B) considers a construction site, in an area subject to landslides: Plàn-Chécrouit (45°47'11" N; 6°57'17" E) is located in the Valle d'Aosta region (Italy) on a south-facing hillside at the foot of Mont Blanc. The average altitude of the area is 1100 m above sea level. The test area (approximate size 700x300 m 2 ) can ideally be divided into two (Figure 1 middle): the upper part is characterized by the presence of thick vegetation and is partially affected by a landslide movement that also partially involves the lower one. The latter is mostly covered by grass and hosts some touristic facilities of the nearby ski resort.
The acquisition considered a time span of approximately 4 months from September 2020 to the end of November 2020 with a new image block acquisition approximately on a monthly basis (four image blocks). During this period, the site was interested by construction works mainly devoted to securing the area from the landslide (terrain reshaping, dried/fallen vegetation removal, rock fall barriers installation, etc.). This second test site therefore represents a case study where moderate, localized changes occur. For all the four epochs a DJI Phantom 4 PRO UAV has been used for image acquisition. It is equipped with a 1'' CMOS FC6310 camera with a resolution of 5472x3648 pixel and with a 24 mm equivalent focal length optics. Differently from the previous case, the UAV is not equipped with a centimetre-grade GNSS receiver, and its navigation data can be used in a GNSS-assisted BBA only to obtain a first, approximate, image block geo-referencing. All the image blocks were acquired at flight altitude of ca. 90 m above ground level (GSD is ca. 24 mm/pixel), for a total of about 280 images (9 strips roughly east-west oriented) each. Twentyfour signalized targets were surveyed with a Geomax Zenith 20 GNSS receivers, just before the first flight. The third test site (Site C) is located near the village of Cenova (Italy -44°1'54'' N; 7°53'41'' E) and covers an area subject to periodical landslides and debris flows (see Figure 1 Bottom). The debris flows, first occurring in 2016, reactivated in 2019, causing extensive damage to the nearby village. The phenomenon has been aggravated, in the last two years, by the more frequent very rainy periods (during autumn) followed by dry seasons (summer) that affected drastically the stability of the slope. The area is currently monitored by the Cima Research Foundation and the University of Florence. This case study represents, in the context of the current experiment, the most challenging test. The area, quite steep, is subject to two types of changes: the seasonal cycle of the leaves in the wood at the landslide area sides, and the changes in the proper landslide area: boulders fall and debris flow due to rain and gravity. Three different monitoring epochs, each two months apart, have been considered: January, March and May 2020. For all the acquisitions a DJI Phantom 4 PRO UAV has been used. All image blocks were acquired at a flight altitude of ca. 90 m above ground level (GSD is ca. 24 mm/pixel), with approximately 170-200 images (20-22 short strips following the contour line of the slope roughly in north-east/south-west direction). In the first epoch sixteen GCP were surveyed along the edges of the area monitored. During the four months period between the first and last acquisition some GCP were torn away by the landslide. In the end only ten points, common to all the epochs, have been used as check points.

Data processing and results comparison
All the image blocks were processed using Agisoft Metashape v. 1.7.1 running on an Intel I9-10920X PC with 32 GB of RAM and equipped with an NVIDIA GeForce RTX 2060S graphic card. Each single epoch has been oriented (recording the total processing time) individually using the image station coordinates acquired by the UAV GNSS as ground control: the eBee UAV used in site A has RTK centimetre-grade accuracy, while the onboard Phantom 4 receiver used in site B and C has much worse precisions. In the first epoch of each site the GCP collected were used as ground control. The same ground points are used as Check Points (CP) in the subsequent epochs. All these single oriented image blocks provide support for Method 1, 2 and 4. For Method 3, on the contrary, two epochs' datasets (the first always being the first acquisition) are merged together. The ground points are used as GCP on the images of the reference epochs, and as CP on the images of the slave block. Method 1 (manual identification of corresponding points), being extremely time-consuming and (for this reason) unlikely exploitable in real-world scenarios, has been performed only for Site B. It is here considered as a gold-standard (although not usable in practice) for evaluating the actual performance of the automatic procedures. For all the methods the difference between known and estimated coordinates of the CP are computed and stored in a Postgres Database. The RMS (Root Mean Square) of the differences, in the following indicated with RMSE, of all the CP of each epoch is computed as well. The time required to perform the coregistration process for each method is recorded as well and summed (for Method 2 and 4) to the time required to orient the slave image block.   (Benassi et al., 2017)). For instance, the minimum (best) planimetric RMSE obtained in the 2017 tests in a single epoch is 12 mm (using 12 GCP -14 mm if GNSS assisted BBA is used) and is 14 mm for the altimetric component (using a GNSS assisted with an additional GCP). Such results indicate that, even if a centimetre-grade GNSS equipped UAV is used, the co-registration of inter-epoch image blocks can actually improve, although not drastically, the final accuracy of the image block. Method 4 provides the worst results, especially for the vertical component, with RMSE, compared with the other methods, 2 to 4 times higher in the planimetric component and 5 to 8 times higher in Z direction. One, obvious explanation of such suboptimal performance is reported also in the Agisoft user forum (Agisoft, 2021) where it is clearly stated by the software technical support that this method should be considered less accurate than orienting all the epochs images together, since each epoch image block is treated as a solid model to whom an Helmert transformation is applied after inter-epoch image matching. This implies that a possible image block deformation during the slave epoch orientation, due for instance to a not enough strong/rigid image network, is left unchanged after the "align chunk" operation. Figure 3 shows the processing time for each automatic coregistration methods in each test site. Processing time for method 1 is not shown being out-of-scale with the others (approximately one-day work of a skilled operator for co-registering a single block). As shown in Figure 3, Method 2 is incredibly fast if the matching tie point search radius can be set to low values: in Site A, where the two image blocks were already well co-registered from the beginning (leveraging the GNSS-assisted capabilities of the implemented UAV) the search radius was set to 50 cm. On the contrary, if the two epochs image blocks are only approximately co-registered (as in Site B and C), the search radius should be much higher (in these experiments it was set to 7 m) and the entire procedure takes more time (8-10 minutes). Nonetheless, Method 2, even in these sites, is faster than the method 3 procedure. Surprisingly, Method 4 proves to be the most time-consuming, requiring 30 minutes at least in each scenario, with a maximum of almost 1 hour for Site B.

Method 2: influence of STP dataset size
3.2.1 Sensitivity to STP density Additional tests have been carried out for Method 2, considering the influence of STP numerosity on the final orientation solution.
In other words, the STP spatial density has been homogeneously reduced by consecutive steps with a Python script, to find out if this results in a substantial less accurate co-registration. The algorithm implemented for reduce the STP spatial density, subdivide the entire surveyed area in a grid of equally spaced cells. The entire population of STP is then considered, one point at a time, adding only one single point in each cell. The process is iterated, assigning more points to each cell, until the desired total amount of STP have been assigned. The remaining, not assigned, STP are consequently discarded. In this way a basically uniform distribution of STP can be obtained throughout the area. Test sites A and B have been identified as representative to this purpose, the former being a stable site, the latter being a more dynamic environment, surveyed over a larger time span, and a relatively poor initial co-registration (with metre-level camera positions). Figure 4 and 5 Figure 4, the RMSE increases by about 2.5 cm in elevation and by 0.6 cm in planimetric coordinates when the number of STP decreases from 10079 to 17, the latter value conceivably close to the "fair amount" of GCP required for block control in Site A. In XY the growth trend is very smooth. In elevation the RMSE is almost stable until about 400 STP are used, then jumps by about 2 cm. Notice that the planimetric error being overall steadier than the altimetric one could be expected, at least to some extent. Indeed, Method 2 looks for matches in a horizontal window: therefore, discrepancies in elevation are not accounted for. Results for Site B are similar to those of Site A, with RMSE in horizontal coordinates that deteriorates very little with decreasing STP numbers. Likewise, the RMSE in elevation is first steady and then increases quickly. These results highlight that the number of STP needed for a good co-registration is somehow higher than the "fair" number of GCP one would otherwise use.

Sensitivity to STP spatial distribution
Another test carried out for Method 2, considered the sensitivity of the image block orientation as a function of the STP spatial distribution, removing the points in specific areas of the test site to simulate a wider region subject to significant changes. In other word, for the automatic procedure to be effective in real cases, it is important to assess its robustness against the size and the location of the area subject to changes compared to the overall survey area. Indeed, a lack of STP can be expected in the changed areas and this will affect the stability of the block just where its steadiness would be more necessary to assess its ability to detect changes. Should the accuracy loss be significant, the alternative would be to increase the inner block strength (e.g., by adding cross strips) or to add newly measured GCP in the changed areas. A procedure to simulate the above-mentioned changes in STP coverage from one epoch to the other has been setup and carried out in Site A and B. To simulate a landslide or a soil slip affecting only part of the surveyed area, the STP were completely removed from the hypothetically affected sub-area and the RMSE have been computed for CP in such sub-area only.
In Site A, which is approximately rectangular and flown in East-West direction, five sub-areas have been cleared of STP in turn: a Central band and four others in the cardinal directions (Northern, Southern, Eastern and Western areas). As Site A did not undergo changes in the time span the blocks were acquired, in principle no apparent changes should be found on the CP coordinates, if the remaining STP are effective in preventing block deformations.
As this effectiveness may depend on the STP density, the tests have been carried out at full STP density, using the original STP set generated by the software procedure, as well as on a reduced dataset (see previous section 3.2.1). It has been assumed that a STP dataset big enough to ensure the block being successfully oriented prior to clearing the changed area is available.  Figure 6 shows the RMSE on CP in the cleared regions when using the full STP set. The pattern of planimetric RMSE looks reasonable, as the central area has the best results, without discrepancies with respect to the reference line. On the other hand, leaving the border of the subarea without STP, as expected, has consequences: the RMSE is larger and so the ability to detect changes is lower. The asymmetry between the N-S and W-E regions might have to do with the asymmetry of block overlap. In elevation the pattern is not as clear, however broadly similar. Figure 7 shows the CP RMSE in the Site A cleared regions when using the reduced STP set.

Figure 7
Site A: RMSE on CP in the cleared regions when using the reduced STP set Notice that, with the reduced STP dataset of Figure 7, the reference lines are different from those in Figure 6. Comparison of Figure 6 and 7 shows that reducing the size of the STP sample when some areas completely lack STP may or may not have large effects. Indeed, the central, northern and southern areas show more or less the same RMSE as with the full coverage of the whole area, though with reduced STP set. As this is not the case for the East and West bands, this is probably due to a block weakness in that direction. The picture for the altimetry is less coherent, as also the southern area shows a marked increase of RMSE.

Figure 8
Site B: RMSE at CP when using the full and the reduced STP sets for the northern band elimination Figure 8 illustrates the results the elimination of the norther band STPs in Site B. The left bar refers to the full STPs sample planimetric and altimetric RMSE while the right bar to the reduced sample. A noticeable RMSE increase is witnessed over the reduced STPs sample, in both horizontal and altimetric components.

CONCLUSIONS
Multi-temporal image block co-registration is indeed a simple (at least conceptually) but still very effective solution whenever ground control of the photogrammetric block requires a too much demanding effort (e.g., in terms of costs, time of operations or safety requirements). It should be considered that in fast evolving scenarios, for instance an active landslide or soil slip area, if no stable points can be found, the ground control survey should be repeated at every epoch to assess the reference points coordinates at the time of image acquisition. In the last few years, several researchers proposed different approaches to identify points common to different epochs, in order to co-register multitemporal datasets without (or with a limited use of) surveyed ground control points. Some of the proposed solutions are extremely refined and articulated. Others are less complicated but still very effective. In this work three different automatic co-registration methods, all applied using a very popular photogrammetric software package, are compared in terms of achievable accuracy and computational load / processing time. The first (Method 2) proposes an original, yet very simple and straightforward workflow implemented in an in-house developed software that, interacting with the same commercial software, assists the identification of common, stable tie points between two different epochs image block sets. The other two strategies (Method 3 and 4) can be used out of the box. A fourth approach, requiring an operator to manually identify the inter-epoch common (stable) points, has been considered in only one of the three tested case study, as a gold-standard method, being impracticable in today's real-world scenarios. It has been demonstrated that at least two of the three different automatic strategies (Method 2 and 3) can achieve satisfactory results, comparable with the gold standard manual approach. Method 3, consisting in a single global SfM orientation procedure, using the images of both the reference and the subsequent epoch datasets together, provides the best results, except in one single epoch tested (in Site C). Method 2 performances are almost on par, providing in the tested datasets (Site A and B) only 20% higher RMSE but with shorter processing times. Its approach, however, seems more robust and reliable, especially when strong movements occur in just part of the area, as in the more problematic Site C. Finally, it is worth noting that, in Site A, the multi-temporal coregistration procedure can achieve better results if compared with individual registration (georeferencing) of the block at each epoch by GNSS-assisted orientation. are greatly acknowledged for providing the Cenova image dataset.