SEMI-AUTOMATIC APPROACH FOR OPTICAL AND LIDAR DATA INTEGRATION USING PHASE CONGRUENCY MODEL AT MULTIPLE RESOLUTIONS

In light of the ongoing urban sprawl reported in recent studies, accurate urban mapping is essential for assessing current status and evolve new policies, to overcome various social, environmental, and economic consequence. Imagery and LiDAR data integration densifies remotely sensed data with radiometric and geometric characteristics, respectively, for a precise segregation of different urban features. This study integrated aerial and LiDAR images using point primitives, which were obtained from running the Phase Congruency model as an image filter to detect edges and corner. The main objective is to study the effect of applying the filter at different spatial resolutions on the registration accuracy and processing time. The detected edge/corner points that are mutual in both datasets, were identified as candidate points. The Shape Context Descriptor method paired-up candidate points as final points based on a minimum correlation of 95%. Affine, second and third order polynomials, in addition to the Direct Linear Transformation models were applied for the image registration process using the two sets of final points. The models were solved using Least Squares adjustments, and validated by a set of 55 checkpoints. It was observed that with the decrease in spatial resolution, on one hand, the registration accuracy did not significantly vary. However, the consistency of the model development and model validation accuracies were enhanced, especially with the third order polynomial model. On the other hand, the number of candidate points decreased; consequently, the processing time significantly declined. The 3D LiDAR points were visualised based on the Red, Green, and Blue radiometric values that were inherited from the aerial photo. The qualitative inspection was very satisfactory, especially when examining the scene’s tiny details. In spite of the interactivity in determining the candidate points, the proposed procedure overcomes the dissimilarity between datasets in terms of acquisition technique and time, and widens the tolerance of accepting points as candidates by including points that are not traditionally considered (i.e. road intersections).


Overview
Our world is encountering the largest wave of urbanization. A decade ago, only two out of ten settled in urban communities. The estimates of this number are six and seven out of ten in 2030 and 2050, respectively. This represents a threat to the available resources, which require reallocation and development efforts made by policy makers, in order to accommodate their anticipated shortage in the future (Megahed et al., 2015). The United Nations in its 2018's report stated that over 80% of the population in North America lives in urban settlements (UN, 2015). This urban sprawl is a challenge for the concerned authorities to overcome deterioration in available resources and services. Consequently, accurate and precise detection of different urban morphologies is essential for evaluating current situations, and developing effective plans to tackle anticipated urban expansion in the future. This magnifies the necessity of smart cities as a technological way out of this confrontation, where digital remotely sensed data can be collected from different sensors, integrated, processed, and analyzed (Li et al., 2013).
Integration of LiDAR (Light Detection And Ranging) and imagery data contributes to the construction of a complementary * Corresponding author geodatabase, in which geometric and radiometric characteristics are associated for better urban pattern recognition. A successful integration of both data types is conditioned by the accuracy of their registration to each other. Various registration primitives have been used to associate LiDAR and optical imagery data; points, lines, and polygons. (Mitishita et al., 2008) registered photogrammetric and LiDAR data using centers of rectangular roof planes. (Habib et al., 2005) used linear features from planar patches as primitives to also register same data types. (Yan et al., 2017) applied a polygon-based image registration technique, in which corresponding polygons in both datasets were selected, sample vertices along polygons' edges were generated, and then matched. The research work showed that high-resolution digitized historical map can be registered accurately to GIS polygons as a reference data set. (Zhang et al., 2015) registered images acquired by sensors mounted on UAVs (Unmanned Aerial Vehicles) and LiDAR data. They used objects extracted from LiDAR data and their corresponding boundaries from imagery data as control objects, and matched extracted edges and corners, then used the endpoints in coarse registration. All of these studies and similar ones look for an optimal compromise between automating the registration process, minimizing computations, sub-processes, cost of resources, and execution time, maximizing model accuracy, and generalizing registration techniques to fit different land-uses.
The purpose of the presented work is to assess the potentiality of the Phase Congruency model in generating edge/corner points as primitives for the registration of imagery and LiDAR data at different SRs (Spatial Resolutions), using multiple registration models, in terms of registration accuracy and execution time.
1.2 Phase Congruency (PC) Model (Morrone, Owens, 1987) introduced the Local Energy model to detect image features in 1D based on the postulate that edges and corners are located on points, where the Fourier components of a signal are in phase. (Kovesi, 2003) presented two major contributions to the model. First, a normalization to the local energy function known as a PC measure that identifies in-phase points as edge/corner points. Second, using of wavelets instead of Fourier transformation for local frequency analysis at different scales and orientations. The wavelets act as a bank of filters to select particular frequencies of the signal for analysis. This ensures a high spatial localization of frequency change, so that features are more associated to their neighbours in the analysis rather than being separately analyzed (Kovesi et al., 1999).
The PC model has been implemented with promising results in several studies that serve various applications. (Estépar et al., 2006) detected airway wall in tomographic images in an attempt to better understand the genetic pathways for the chronic obstructive pulmonary disease development, (Rijal et al., 2012) localized zones in chest X-ray images infected by pulmonary tuberculosis, (Štruc, Pavešić, 2009) extracted line features from palm-print images for individuals identification in biometric security systems, and (Zhang et al., 2012) identified local features in finger-knuckle-print images as a promising biometric identifiers in personal authentication systems. In remote sensing context, (Ahmed et al., 2009) successfully employed the model in extracting step and line features from satellite image. However, the data required smoothing in prior to accommodate atmospheric noise. (Fan et al., 2014) used the model in registering SAR (Synthetic Aperture Radar) images. They used nonlinear diffusion scale space along with the ratio of exponential weighted average operator to reduce noise and preserve fine details in the image. Then, they calculated PC information at each key point as a cue for rejecting false key points. (Ye, Shen, 2016) achieved satisfactory results in registering opticalto-SAR and optical-to-LiDAR data. They emphasized the significance of filter orientations as complementary feature descriptive information, which the modified PC model does consider. Figure 1 illustrates the workflow applied for a 2D-2D LiDAR and optical data integration. It includes eight steps.

METHODS
Step (1): resampling LiDAR and optical images to larger pixel sizes to build a data pyramid. The remaining steps are applied to each resolution.
Step (2): filtering both images in the spatial domain, by running the PC model as a band-pass filter. It calculates a PC measure for cells' center-points in different directions (Kovesi et al., 1999). The values of the measure and the filter's orientation angle contribute into computing maximum and minimum moments for each point. The higher the moments' values, the more inclination for a point towards representing an edge or a corner, respectively (Kovesi, 2003).
Step (3): adjusting a moment threshold range to isolate edge/corner points as CCPs (Candidate Control Points) on both images. Step (4): visually investigating the likeness of the two sets of CCPs. This step determines whether the process continues automatically or semiautomatically. If an adjusted threshold range is visually found to properly segregate interrelated sets of CCPs, the approach remains automatic, and CCPs are paired-up directly in step (6). Contrarily, the methodology requires some interactivity through step (5), and, thus, turns to a semi-automatic approach.
Step (5): abstracting the two scenes to their main elements, which commonly exist in both, and targeting points overlaid their boundaries as CCPs. This is achieved by clustering the images based on their moment values, refining output patches (i.e. merge, deletion), extracting their outlines, and selecting points located on them.
Step (6): matching CCPs from LiDAR and optical images into corresponding pairs of FCPs (Final Control Points), by applying the SCD (Shape Context Descriptor) method. It is a graph-matching algorithm that correlates a pair of points belonging to two different sets, based on their distances and azimuth angles with respect to remaining points in the same set (Belongie et al., 2006). Pairs with a correlation larger than a predefined threshold value are considered FCPs.
Step (7): applying Least Squares to estimate the transformation parameters of four empirical registration models: first, second, third order polynomials, and DLT (Direct Linear Transformation).
Step (8): validating the registration models based on a set of checkpoints obtained separately from FCPs.

Study Area and Datasets
The developed approach is tested on data acquired for a 33000 m 2 residential urban area in Toronto, Canada ( Figure 2). It is part of West Rouge; a small community neighbourhood within the former suburb of Scarborough, in the south-east corner of Toronto, where Lake Ontario exists. Figure 3 shows the aerial and LiDAR data obtained for the study zone. The image has a 20 cm SR, and four radiometric bands; R, G, B, (Red, Green, Blue) and NIR (Near Infrared). It was downloaded from the Geospatial Map and Data Centre on Ryerson University library website (GMDC-RU, 2014). Airborne LiDAR points were collected by Teledyne Optech, and their properties are given in Table 1. Since both datasets have the same projected coordinate system: NAD 1983, UTM zone 17N, the aerial photo's georeferenced file was deleted to ruin the pixels' geolocation, in order to later be able to test the model registration capabilities.

Model Implementation
Computations were carried out using Python programming language v 3.7.4, on Spyder IDE (Integrated Development Environment) v 3.3.6, embedded in Anaconda Enterprise v 3.0.
ArcScene v 10.5 was used for the visualization of 3D LiDAR points. ArcMap v 10.5 was used for 2D data visualization, in addition to some minor geo-processing tasks (i.e. buffering, clustering). Besides, LAStools were applied for LiDAR data conversion and metadata extraction. Data processing was performed on a workstation with the following specifications: Windows 10 Pro for Workstations OS 64-bit, 3.2 GHz processor (16 CPUs), and a memory of 131072MB RAM.
3.2.1 Image Production: The "lasinfo" tool in LAStools was used to generate the metadata file from the LiDAR LAS file. The latter was converted to a text file by the "las2txt" tool, to be read by Python. The PC model runs on a single 2D image; hence, an individual discriminative radiometric band from each data type had to be selected for edge detection. Since the landscape consists of elements variant in material intensity; building roofs, asphalt roads, driveways, land markings, curbs, sidewalks, and grass, the NIR layer was the selected input imagery band for the model. On the other hand, these elements were found to have a narrow height difference in the LiDAR data; consequently, an intensity-based image (C2, 1064 nm) was the other input for the PC model. This image was generated by the "LAS Dataset to Raster" tool in ArcScene, with the same resolution of the aerial image; 20 cm. The tool applies a binning interpolation, where the value assigned to each cell is the average of all points' within it. Values of empty cells are determined by linear interpolation (Esri, 2016b).

Feature Detection:
The PC model implementation and moment calculations were performed according to the set of equations illustrated in (Kovesi et al., 1999) and (Kovesi, 2003). The bank of filters were Gabor wavelets as described in (MacLennan, 1991). They were designed at five different scales and eight orientations; starting from 0 o and increasing by 45 o . Kparameter; the scaling factor, was altered to 0.01 and 0.05 while processing the aerial and LiDAR images, respectively. These values resulted in distinct moment patterns. The model was applied on five lower SRs: 40, 80, 120, 160, and 200 cm. The resampling was carried out using the cubic technique, which determines the resampled cell value from its 16 nearest input cells (Esri, 2016c).

CCPs Selection:
The scene abstraction was achieved by clustering the moment points -after being converted to an image-to the minimum number of classes that were seen to be identified in both images. The clustering was carried out through "Iso Cluster Unsupervised Classification" tool in Ar-cMap. It functions in an iterative manner after defining the number of desired classes. Pixels are grouped to clusters of a close mean value. The trials terminate when a maximum number of iterations or a convergence is achieved (Esri, 2016a). These clusters were converted to polygons, which were converted to polylines via the "Raster To Polygon" and "Polygon To Line" tools in ArcMap, respectively, to extract the clusters' outlines. Moment points located in a buffer distance (half the SR) around these boundaries were targeted as CCPs.

FCPs Selection:
The SCD method matched the two sets of CCPs; from aerial and LiDAR images, as given in (Belongie et al., 2006). In each set, a log-polar diagram was created for each CCP, based on its 2D geometric relations with the rest of CCPs, in terms of distances and azimuth angles. They are the diagram's X and Y axes, which range from 0 to maximum normalized distance, and from 360 o to 45 o , respectively. The diagram graphically plots these relations by dividing each axis range into bins to form a grid, where each cell contains the number of CCPs that are within the same bin's distance and angle boundaries. The correlation between log-polar diagrams of both CCPs sets were calculated, and pairs were matched on a one-to-one basis, to eventually have a CCP in one set, paired-up with a single CCP of a maximum correlation, in the other set. Number of bins for distances and angles were set to five and eight, respectively, and only pairs with correlation ≥ 95% were selected as FCPs.

Data Registration and Model Validation:
Polynomial models of first, second, and third orders (Nielsen et al., 2004), in addition to the DLT model (Chen et al., 1994) were used in the registration of LiDAR and imagery data. A set of checkpoints that were collected manually on both images was used for model validation. The model development error compares the calculated versus the observed coordinates of the CCPs on the aerial data. Whereas, the model validation error compares the calculated versus the observed coordinates of the checkpoints on the aerial photo. The parameters of the model with the highest and most consistent accuracies, and least processing time were used to associate the 3D LiDAR points to their corresponding cells on the aerial photo. The results show that the PC model did not yield an edgedifferentiating pattern when run on the original SR, at which the filter looks for a change in a slim range of frequencies, where a point appears significant, while it is not, with respect to the entire image. The maximum moment gave more definite feature detection than the values of the minimum moment, since the scene is richer in edges than in corners, as a typical residential urban morphology with abundance of manmade elements. Figure 5 shows the moment values of both data, with high values concentrated around the features' edges. Nevertheless, it was infeasible to detect interrelated edge points by directly adjusting a threshold moment range, because of the PC model noise sensitivity. Despite a noise compensation term has been introduced to the PC's normalized equation (Kovesi, 2003), noise in the context of this study is quite different. It implies the dissimilarity between some features in both datasets in regards to existence (as discussed earlier) and reflectance. For instance, the oval sidewalk at the upper left corner of the area ( Figure  2) is more highlighted in the LiDAR image ( Figure 4); consequently, in its output moment ( Figure 5). (Megahed et al., 2020) collaborated further in addressing other sources of noise, such that aerial and LiDAR data are originally different in terms of collecting sensors, acquisition mechanisms, and data dimensions. Additionally, shadows in aerial photos are usually misinterpreted as ground features that is why they appear in the moment output. Moreover, there are no LiDAR pulses returning from water surfaces, if they exist in a scene, which prevents those bodies from appearing in LiDAR images, while they are presented clearly in corresponding aerial photos. Hence, the model turned to be semi-automatic due to the interactivity required to get over this drawback. Figure 6 illustrates the clustering and edges extraction applied to eventually identify CCPs sets. Final clusters, clusters' outlines in red lines, and CCPs in yellow dots are shown in Figures 6.a, 6.b, and 6.c, respectively.

RESULTS AND DISCUSSIONS
The size of the CCPs data at each SR is given in Figure 7. The higher the SR, the larger the number of points targeted as CCPs are. This is more pointed out at a SR of 40 cm that has a serious size of CCPs compared to the second lower SR; 80 cm. Running the proposed model on fine SRs significantly increases the time required for the SCD method to produce FCPs. The algorithm of the SCD method implicitly builds a correlation matrix that has a number of rows and columns equals to the size of CCPs on imagery and LiDAR data, respectively. Such massive computations consumes hardware and time resources.
The workstation used in this study could not process the size of CCPs at SRs 40 and 80 cm. Consequently, they were downsampled to 1748 and 1987 CCPs at 40 cm, and 1684 and 1754 CCPs at 80 cm, for aerial and LiDAR data, respectively. Figure  8 illustrates the processing time at the five SRs. The workstation is capable of handling a set of CCPs around 2000, which is close to the 120 cm SR case. However, the processing time exceeded one hour and a half, which is too long for a relatively small study zone. It is worth to mention that the processing times at SRs 40 and 80 cm, in Figure 8, are calculated based on the downsampled CCPs sets.
The number of CCPs in this study is directly proportional to the SR. However, downsampling the size of CCPs for more efficient computations in terms of processing times should not impact the registration itself, as long as redundant, well-distributed, and interrelated control points are maintained. To illustrate, a third order polynomial, as registration model example, has 20 unknown parameters and mathematically requires at least ten pairs of control points to solve the system, in general. Nevertheless, this is an overdetermined system of equations that has no exact solution; rather, Least Squares looks for a solution with the smallest error.
First, downsampling does not affect the redundancy essential for obtaining an optimal solution, as the system remains overdetermined after downsampling. Second, downsampling was carried out evenly on the points located along the boundaries. At every two or three points, one point was eliminated, which decreases the crowd of points on one hand, and keeps a non-clustered point distribution on the other hand ( Figure 6.c), which is vital in order to avoid distortion at areas that lack control points. Finally, the SCD ensures a correlated set of FCPs as an input to the registration process. It is worth to mention that downsampling reduces the size of CCPs prior to running the SCD, and the latter is the mechanism that determines the FCPs, which contribute to the registration, and by default have a drop in size in comparison to the CCPs' size, regardless the downsampling step.  The SCD method produced FCPs that are 95% correlated, at least. This threshold was set to provide associated FCPs that are evenly distributed over the area, to ensure decent registration. Figure 9 shows the distribution of the FCPs on the optical imagery and the LiDAR intensity images at a 200cm SR, in yellow dots. Information about FCPs at the remaining SRs is given in Table 2.
The coordinates of 55 checkpoints ( Figure 10) were collected on the original SR version of both datasets, and were used to assess the registration models at each SR. It is worth to mention that the errors mentioned below are RMSE (Root Mean Square Error). The model development error is plotted versus the model validation error for each registration model, at each SR ( Figure 11). Generally, both errors are consistent to each other with the decrease of SR, which is an indication of a representative model. In centimetre units and with respect to the corresponding SR, the model development and validation errors do not change significantly. The value is acceptable for the data at a 200 cm SR registered by the third order polynomial model, since it is equivalent to one pixel. However, the model development errors for the datasets processed at the remaining SRs even with different registration models are not promising. A justification for this is the poor performance of the PC model in filtering images at fine SRs, as the effect of the aforementioned noise that falls off with the decrease in level of details. Indeed, these are vital gains behind applying the proposed methodology in the integration of LiDAR and imagery data. First, the model yields edge/corner patterns on versions of input images resampled at lower SRs, which significantly speeds up the processing times. Second, accuracy results do not differ from their counterparts at higher SRs. Additionally, acquisition of points as registration primitives is no longer limited to conventional locations; such as roof corners and land markings, which are limited to residential urban images obtained at fine SRs. In fact, (Megahed et al., 2020) confirmed the potentiality of the model in identifying control points as primitives for the registration of imagery and LiDAR data obtained for various urban morphologies; industrial, residential, and coastal shore. They suggested that data of large size can be divided and processed into tiles, to accommodate long execution times.
The parameters of the third order polynomial model which registered the data at a 200 cm SR were eventually chosen to register the 3D LiDAR points to the original RGB aerial image (Figure 12), due to the model's lowest processing time and error, beside the consistency between both accuracies; model development and model validation. The qualitative validation of the colored LiDAR points is very satisfactory, as the points inherited their corresponding radiometric characteristics from the aerial image. This is obvious by zooming in and out, scanning the scene, and examining its small details. To visually support this outcome, another registration was performed using the parameters of the poorest findings resulted from an affine polynomial at a SR of 40 cm. Figure 13 shows both registrations versus each other at three different zoomed-in positions, where misalignments of inherited RGB properties take place at roof edges and tree crowns in the second registration.

CONCLUSIONS
Urban sprawl has begun to be a serious challenge for governments. Working out its consequences requires the integration of remotely sensed data obtained by different sensors, for more accurate classification of different urban features. Such classification contributes in the precise evaluation of current situations, and, thus, designing appropriate future plans to deal with the expected shortage in resources and services. This study integrated LiDAR and aerial data obtained at a 20 cm SR for a residential urban land-use in Toronto, Canada, using a semi-automatic approach. The proposed methodology tested the accuracy of the PC model, as an edge/corner filter, against five lower SRs of the data input (40, 80, 120, 160, and 200 cm), registered by various empirical models. The corresponding processing time was recorded as well. The following procedure was applied for each SR. Aerial and LiDAR data were input to the PC model. The filter outputs were clustered into a number of classes that ensures mutual patches on both datasets. Clusters' boundaries were extracted, and points located around them were targeted as CCPs. Afterwards, the SCD algorithm matched the two sets of CCPs; on aerial and LiDAR data. It resulted in pairs of FCPs, each with a correlation higher than 95%. Four registration models were used to register both sets of FCPs; affine, second and third order polynomials, beside the DLT model. The transformation parameters were estimated by Least Squares. Finally, the models were validated by a list of 55 checkpoints collected on both images.
Applying the presented methodology on lower SRs of the original imagery and LiDAR data was found to be beneficial for multiple reasons. The PC model yielded no edge pattern at fine SRs (20 cm). Additionally, running the proposed method on high SRs (120 cm) significantly increased the processing time (102 minutes). Furthermore, registration accuracies at low SRs (200 cm, third order polynomial registration model, RMSE equivalent to one pixel) did not significantly differ from those resulted from higher SRs, which suggests the poor performance of the PC model in feature detection when performed on fine SRs. The third order polynomial model at a 200 cm SR gave the lowest and most consistent model development and validation errors. On the other hand, variations between both datasets in regards to acquisition technique and time did not allow for a direct selection of CCPs by applying a threshold range on the filter outputs. Alternatively, the model switched to be semiautomatic by proposing a scene abstraction solution to overcome this problem. Nevertheless, this bypass when applied at lower SRs does not limit the collection of CPs to the traditional types (i.e. at roof corners and road intersections).