AN AUTOMATIC ICP-BASED 2D-3D REGISTRATION METHOD FOR A HIGH-SPEED BIPLANAR VIDEORADIOGRAPHY IMAGING SYSTEM

High-Speed Biplanar Videoradiography (HSBV) is an X-ray based non-invasive imaging system that can be used to derive dynamic bony translations and rotations. The 2D-3D registration process matches a 3D bone model acquired from magnetic resonance imaging (MRI) or computed tomography (CT) scans with the 2D X-ray image pairs. This study focuses on the registration of MRI data as it can acquire detailed soft tissue contrast that cannot be easily discerned in CT scans. A novel 2D-3D registration method is reported in this paper that is suitable for the MRI-based bone models with high precision and high efficiency. In addition, an automatic initialization procedure with 64 starting poses is established to avoid user intervention in the registration. The method has been tested using the HSBV image sequence of a knee joint during walking. Thirty-five consecutive poses from the sequence were tested for the registration, and 50 non-consecutive poses randomly selected from the sequence were tested for the automatic initialization. The registration precision for each axis was 0.49 to 0.54 mm. For the initialization validation test, 48 over 50 frames were successfully initialized and two failed due to portions of the joint falling outside of the field-of-view of the system. The average time for each initialization is only about 6 min. The improved 2D-3D registration will allow determination of precise 3D kinematic parameters with high efficiency. These kinematic parameters can be used to calculate joint cartilage contact mechanics that provide insight into the mechanical processes and mechanisms of joint degeneration or pathology.


INTRODUCTION
High-speed biplanar videoradiography (HSBV) or dual fluoroscopy (DF) combines two X-ray sources with high-speed video cameras to generate a dynamic view of the object of interest. The term biplanar or dual refers to two plane views of the X-ray images, which is in contrast with single fluoroscopy that can only acquire a single image per frame. By using low dose X-rays, both high spatial and temporal resolution 2D radiograph pairs can be acquired with a HSBV system, which yields accurate 3D measurements of the joint kinematics (Li et al., 2008;Anderst et al., 2009;Miranda et al., 2011). Nowadays, the HSBV technique is widely used in medical studies and practices including, but not limited to, kinematics analysis (Li et al., 2008;Sharma et al., 2012;Thorhauer, Tashman, 2015), orthopedic surgeries (Yamazaki et al., 2004;Bingham, Li, 2006), and imageguided surgeries (Otake et al., 2012). For example, with the registration of 3D bone models for both the tibia (shank bone) and the femur (thigh bone), tibiofemoral soft tissue model overlap can be quantified to estimate cartilage deformation or contact Thorhauer, Tashman, 2015;Yin et al., 2017). An increased cartilage deformation rate under loading has been proposed to potentially be an early sign of osteoarthritis (Frobell et al., 2010;Sharma et al., 2015). After loading, cartilage thickness ranges from 0.3 mm to 1.2 mm for early osteoarthritic or healthy knees . Therefore, the cartilage contact estimation accuracy should be sub-millimetre or higher for early osteoarthritis diagnosis, which requires at this level of accuracy for kinematic parameter estimation. Anderst et al. (2009) reported the root mean square errors of the kinematics * Corresponding author measurements as 0.69 mm to 1.54 mm for translations and 0.54º to 1.75º for rotations in a dynamic trial.
The HSBV system in the current study, shown in Figure 1, is composed of two X-ray sources (time-synchronized, G-1086, Varian, USA), two X-ray image intensifiers (406 mm diameter, E5876SD-P2A, Toshiba, Japan), and two high frame rates video cameras (DIMAX, PCO, Germany). An instrumented treadmill (Bertec, USA) is positioned with the imaging system to allow for dynamic movement such as walking or running. The setup of the imaging system allows any part of the body to be observed. It is most frequently applied to the knee, foot, and cervical spine.
The HSBV procedure requires 3D reconstruction of the bone models, HSBV system self-calibration, radiograph image pair sequence acquisition, and 2D-3D registration. The 3D reconstruction from the computed tomography (CT) or magnetic resonance (MR) scans provides the 3D bone model for the analysis. This study focuses on the registration of MR data for two reasons: 1) MRI scans are free of ionizing radiation. With the radiation from X-ray imaging, CT scans would add another source of radiation to patients; 2) MRI scans can acquire detailed soft tissue contrast that is difficult to acquire from CT scans. Although this information is not necessary for the registration procedure, it is important to generate subject-specific cartilage models for accurate cartilage contact estimation (Thorhauer, Tashman, 2015). The purpose of the photogrammetric HSBV system selfcalibration is to accurately determine the system geometry, which includes the relative orientation parameters (ROPs), the interior orientation parameters (IOPs), and the image distortion additional parameters (APs). The direct linear transformation (DLT) method is commonly used in the HSBV system (Tashman, Anderst, 2003;Bingham, Li, 2006) thanks to its uncomplicated procedure. However, under the DLT method, each image is calibrated and oriented independently, thus the IOPs can vary among images and the common ROPs cannot be easily enforced. Therefore, the DLT method cannot provide the highest possible calibration and reconstruction accuracy. Bundle adjustment is a single-step, self-calibrating method which models the image distortions and the system geometry simultaneously (Lichti et al., 2015;Al-Durgham et al, 2016). A common set of IOPs and ROPs can be acquired in a multiple image calibration process using image pairs taken at different epochs. Because of data redundancy and parameters consistency, the calibration accuracy will be improved. An automatic calibration system was proposed to reduce user intervention during the procedure (Al-Durgham et al., 2016). Preliminary data suggests that the improved bundle adjustment approach may reduce noise in the kinematic outcomes .
Once calibrated, image pair sequences of a patient walking or running are captured by the HSBV system. The frame rate of the system ranges from 6 to 250 Hz depending on the motion to be captured. For example, running requires higher frequency while lower frequency is suitable for squats. The six degrees-offreedom (6DoF) of the femur and tibia, also called the transformation parameters with three rotation and three translation parameters, are determined by estimating the pose of the 3D bone model from the image pair at each frame ( Figure 2). By estimating the 6DoF of the tibia and the femur for each frame, the dynamic 3D motion of the bones can be obtained.
The 3D motion reconstruction can be acquired by marker-based or model-based registration. The marker-based Roentgen stereophotogrammetry analysis (RSA) system is well established and can accurately estimate the accurate 3D position of the markers implanted in bones and the kinematics of the skeletal segments (Kärrholm, 1989). However, RSA is an invasive procedure that requires the implantation of radiopaque markers. The risks associated with the implantation procedure and the adverse effects after the surgery need to be considered. The model-based 2D-3D registration is a modern technique that enables the non-invasive 3D motion capture for the HSBV system. Thus, the current research focuses on the model-based 2D-3D registration. In this paper, the 2D-3D registration is the process that matches a 3D bone model, acquired by MRI, with the 2D radiographs thereby providing estimates of the 6DoF of the 3D bone model. There are two commonly used methods for model-based 2D-3D registration. The first is intensity-based, which matches the intensities of the pixels and voxels and requires digital reconstructed radiographs (DRRs) generated from the 3D CT data. This method is not suitable for the MR data as there is generally no physical correspondence between MR-based DRRs and radiographs (Markelj et al., 2012;Oliveira, Tavares, 2014). The second is the feature-based method that matches the feature locations in 2D and 3D and minimizes the distances between the corresponding points. The iterative closest point (ICP; Besl, McKay, 1992) algorithm is widely used for the 2D-2D and 3D-3D registration of surfaces and objects. But for 2D-3D registration, the ICP cannot be simply applied because of the complex central projection relationship between the 3D bone model and the 2D radiographs. Xin et al. (2006) and Li et al. (2008) applied the ICP method to the 2D-3D registration of the orthogonal radiographs acquired from a C-arm. The orthogonal setup of the C-arm has simpler geometry to work with. However, the fixed configuration limits the movement space for the patient and thus is not suitable for the free-form of the HSBV setup in this study. Additionally, the optimization method used should be considered, as the least squares minimization in Xin et al. (2006) and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton algorithm in Li et al. (2008) both lead to heavy computational cost.
The requirement of user intervention, especially for initial pose estimation (Li et al., 2008;Anderst et al., 2009) is another unsolved problem. Most registration methods require the initialization of the transformation parameters to a close approximation. Otherwise, the registration method cannot converge to the correct result. Varnavas et al. (2013) presented a virtual fiducial marker (VFM) method in which reference points are selected by the user virtually to identify the vertebra, then a bounding box is drawn by the user to obtain the initial pose. Varnavas et al. (2015) proposed a Generalised Hough Transform (GHT) based method to match the radiographs onto precalculated 2D templates to obtain the initial pose of the vertebra.
Other than the efforts made in the automation of the initial pose estimation, manual registration is still in use. Anderst et al.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) (2009) presented the user intervention is required to initialize the registration for the intensity-based registration.
In this paper, the first objective is to develop a 2D-3D registration method that is suitable for the MRI-based bone model with high precision and high efficiency. An ICP-based 2D-3D registration method is developed with 2D matching and 3D updating using the Horn's closed form transformation (Horn, 1987) to achieve fast registration with high precision. The second objective in this research, addressing the need for user intervention, is to automatically determine the initial pose for the registration. The automatic initialization procedure with multi-starting poses is presented and validated in this paper.

METHODS
2.1 Models 2.1.1 Rigid Body Transformation: The rigid body transformation can be described by the 6DoF represented by three translation parameters and three rotation parameters. In the situation that the point cloud of the 3D bone model acquired from MRI is used in the registration, for each model point i, with coordinates , the transformed coordinates are given by (1) where translation parameters rotation parameters transformed model points the rotation matrix that is parameterized by the following Euler angle sequence (2) where are the rotation matrices about the axes respectively With the above equations, the 3D bone model point cloud can be transformed to the actual pose to match with the radiographs. The three translation and three rotation parameters of the 6DoF are the unknowns to be determined by the 2D-3D registration.

Projection with Collinearity Equations:
The collinearity model describes the condition that the perspective centre of the camera, the image point, and its corresponding object point are on the same straight line. Figure 3 shows the geometry of the collinearity condition for an image pair in the HSBV system. The collinearity equations can be used to establish the relationship between the model points in the object space and the image points. The coordinates of the image points and are given by image coordinates of the i th point principal distance coordinates of the principal point image distortion correction terms image space coordinates transformed from the object space, and this rigid body transformation is given by (5) where perspective centre coordinates in the object space transformed model points in the object space acquired from equation (1) object space to image space rotation matrix Figure 3. Collinearity condition illustration for an image pair.
The calibrated system geometry provides the exterior orientation parameters (EOPs; and the IOPs of each video camera. The EOPs come from the system ROPs acquired from all the calibration images at one system setup. The calibration procedure also provides the additional distortion parameters to correct the image distortion. In the registration procedure, the projected model points and the distortioncorrected radiographs can be matched to provide the correspondences for the registration.

ICP-Based 2D-3D Registration
The ICP-based 2D-3D registration is an iterative procedure including multi-threshold Canny edge detection, matching of 2D edges and 2D projected outlines, back-projection of the edge points, and transformation estimation. Figure 4 shows the flow chart of the registration procedure.

Multi-threshold Canny Edge Detector:
The radiographs depict the internal view of an object because the materials comprising the object attenuate X-ray differently depending on the density and the structure. The variation in contrast across the image is typically lower than an optical image, especially at the knee joint comprising bone and cartilage. Canny edge detection (Canny, 1986) is commonly used in medical image analysis. However, detecting edges from the radiographs is still a challenging task. Three parameters are required to perform a good Canny edge detection: the standard deviation of the Gaussian operator, σ, the high threshold, for the strong edges, and the low threshold, for continuity. The Canny The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) detector parameters typically vary for different images. The standard deviation and the thresholding parameters often need to be adjusted carefully. One purpose of this study is to automate the registration process to avoid user intervention. Thus, a multi thresholding strategy that combines multiple Canny edge detectors is adopted. After the empirical testing with a set radiographs, a bank of Canny edge detection with 25 filters is developed. By overlaying the edges detected by the bank of filter, the edges are much stronger than those acquired from any single Canny detector. The overlain edges are further thinned based on their connectivity because there might be multiple responses to a single edge. In addition, gradient thresholding and intensity thresholding strategies are used to remove the outliers. Nevertheless, the above strategies cannot remove all the outliers. The outliers can be further removed in the automatic initialization procedure discussed in Section 2.3.

2D-2D
Matching: The 3D model points can be projected onto the 2D image plane using Equations (3)-(5) so that the 2D projected point cloud of the bone model is obtained. Then, the outline of the 2D model points can be extracted to match with the edges detected from the X-ray image, see Figure 5(a). With the 2D edge points and the 2D outline points, one-to-one correspondence needs to be established based on point-to-point distance. Starting from each edge point, the nearest outline point can be found. However, an outline point may be matched to more than one edge point, see Figure 5(b). Thus, each matched outline point should be checked. If more than one edge point is matched, only the one with the shortest distance will be kept ( Figure 5(c)). By doing this, not only is the one-to-one matching preserved, but also the amount of data is reduced. Figure 5. Projected outline points and edge points matching. (a) Points before matching, (b) matching from each edge points, (c) one-to-one correspondence.

Back-projection:
Although the matching is performed in 2D, the transformation estimation needs to be done in 3D as it involves 3D translations and rotations. In this case, the matched edge points need to be back-projected into 3D for the transformation parameters estimation. One edge point can be determined on a 3D ray back to the principal point of the camera provided the calibrated system geometry is available. Then the assumption can be made that the depth of the edge point is the same as that of its matching outline point because the edge point is considered as the 2D expression of the outline point. Figure 6 illustrates the projection of the outline points and the back projection of the matching edge points. Figure 7 shows the backprojected edge points onto the 3D bone model of the tibia from a select image pair. The scale factor to determine the depth of the outline point is given by + where 3 rd row elements in the object space to image space rotation matrix

Transformation Estimation:
The objective function of the transformation estimation is to minimize the root mean square distance between the 3D bone model points and the backprojected edge points. After the edge points are back-projected into 3D, Horn's closed-form transformation (Horn, 1987) is used to acquire the estimation of the translation and rotation parameters. The bone model points are the slave points, and the back-projected edge points are the master points. In Horn's method, unit quaternions with four algebraic parameters are used to represent the 3D rotation because of their simple algebraic properties and the lack of gimbal lock problem. Thus, in this study, the quaternions are used for the transformation estimation, but the results will be transformed into an Euler angle sequence that has physical meaning and quantifies the rotation about each axis.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

The Iterative Registration:
After the transformation parameters are estimated at the current iteration, the 3D bone model point cloud needs to be updated using the estimated parameters for the next iteration until the termination criteria are reached. This accomplishes the process of the ICP-based 2D-3D registration. The termination criterion is an important factor for the registration process because it limits the computation time while ensuring the accuracy of the registration. Since the accuracy requirement for the registration is at the sub-millimetre level, the change in update from one iteration to the next should be less than 0.01 mm. Thus, the termination criterion is set as 0.01 mm for the translation parameters as well as the objective function value. For the rotation parameters, although the quaternions do not have the physical meaning, they can be considered equivalent to the angle-axes rotations. Therefore, based on the system geometry, the termination criteria for the quaternions are set to 10 -6 .

Automatic Initialization
The ICP is known as a local optimization method whose optimality relies on the initialization of translation and rotation parameters. To avoid a manual initialization procedure, a global optimization strategy can be used. There are many global optimization or heuristic search strategies that have been used in medical image processing, for example, the Monte Carlo random sampling (Dey, Napel, 2006), the multi-start strategy (You et al., 2001), and simulated annealing (Vermandel et al., 2006).
Here, the goal is to determine the approximate translation and rotation parameters that are within the capture range for the registration with reasonable computation time. In the 2D-3D registration, the rotation parameters are sensitive to the initial values because it cannot overcome a rotation more than a certain range. Figure 8 illustrates a failed registration due to the initial pose being outside of the capture range, thus the registered pose was flipped approximately 180º about the vertical axis from the true pose. To address this problem, the search space can be evenly subdivided. The size of the capture range and level of subdivision required for the search space depend on the characteristics of the registration method and the shape of the object to be registered. Thus, testing needs to be performed to determine the division of the search space. The detailed testing results are shown in Section 4. Based on the testing results, a reliable initialization procedure is constructed. Starting from 64 poses, the 2D-3D registration for each starting pose is performed. The one having the smallest root mean square error (RMSE) from registration is considered as the best coarse approximation of the registration. Since the initialization does not require high accuracy, it can be simplified to save the computational cost. Thus, the bone model is down sampled to reduce the data amount based on the testing result in Section 4. Also, the number of iterations for each starting pose is limited to 50 to reduce the time consumption. This limit has been validated in the testing. Another benefit of the automatic initialization is its ability to handle outlier edge points. It rejects matching pairs whose distance is larger than three times the standard deviation of the matching distance. Most of the outliers can be removed from this step, but it cannot guarantee that the edge outliers will be removed completely. Visual inspection and manual outlier removal are still required to yield high precision registration. But The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) the time consumption is largely reduced and only takes about 1 min for each image.

EXPERIMENTS
MRI scans of the right knee of ten healthy participants ages 20-60 were acquired using a high-resolution MR system (3 Tesla Discovery 750, GE, USA). The scan parameters were: repetition time 7.513ms; echo time 2.32ms; imaging frequency 127.7671MHz; flip angle 35º; spacing between slice 0.5 mm; slice thickness 1 mm; slice numbers 204; pixel spacing [0.3516×0.3516 mm]; rows and columns [512×512]. The MR images were manually segmented in Amira software (USA) to acquire the 3D bone models of the tibia and the femur. The cartilage models were segmented as well for soft tissue analysis.
The 2 x 2D radiographs of the right knee of the ten participants were acquired at the Clinical Movement Assessment Laboratory, University of Calgary, Canada. The two video cameras were synchronized by the master camera using continuous timing pulses. BV images were collected at 20 Hz for 5 min for the ten participants while walking. Therefore, there are ten series in total, named as JB001 to JB010. The radiographs were taken at a high resolution [2016×2016 pixels]. The system was calibrated (Al-Durgham et al., 2016) using the bundle adjustment method, and additional distortion corrections were performed for all the radiographs.
The computational time in this paper was recorded on a personal computer with Core i5-6400 processor, 24 GB memory and 64bit operating system. The programming and data processing are implemented in the MATLAB (v2016b, MathWorks, USA).

2D-3D Registration Testing
The registration testing was performed over 35 consecutive poses from the JB001 series. The full bone models include 188560 points for the tibia and 231162 points for the femur. The 3D bone model point clouds were randomly down sampled to 5%, 10%, 50%. The registration validation test is performed using the down sampled datasets as well as the original datasets. Table 1 and  Table 2 shows the registration precision, which is the root mean square (RMS) distance between the registered 3D bone model points and the back-projected edge points.
The 10% and 50% down sampled bone models achieved similar precision as the original data set, while the time consumption of the 50% and the full model size is greater than the 10% model size. Thus, it can be concluded that the bone model down sampled to 10% of the full dataset can be used for the 2D-3D registration without sacrificing precision. The precision of using the 5% down sampled bone model is lower by 0.02~0.04 mm, but its time consumption is reduced by 55% for the tibia and 72% for the femur compared with the 10% model size. This model size can be used for the initialization procedure that requires the faster registration and allows lower precision.
The time consumption for the registration of femur is about 2 times of the registration of tibia because of two reasons. One is that the model size of the femur is larger than the tibia which means there are more points to be processed. Another reason is that the femur requires more iterations to converge to the termination criterion. The average iteration time required over the 35 poses is 86 for the tibia, and 155 for the femur. The differing shapes of the tibia and femur could be the reason why the femur requires more iterations.
From the results of the 10% down sampled bone model in Tables  1 and 2 Figure 9 shows the registration obtained with the newly developed approach at a single frame. For this frame, the root mean square distance between the 3D bone model and the backprojected edge points is 0.89 mm for the tibia and 0.85 mm for the femur.

Initialization Validation Test
The initialization validation test was performed over 50 nonconsecutive poses randomly selected from the JB001 series. The initializations with 27, 64, and 125 starting poses were tested. To validate the initialization results, an evaluation criterion needs to be determined. Since the precision of the registration is around 0.90 mm, two times the registration precision is set as the evaluation criterion, which means any initialization whose 3D reconstruction error is within 1.8 mm is considered successful. The initialization testing is also evaluated by visual inspection, which yielded the same results as the precision-based criterion. Table 3 shows that the initialization with 27 starting poses has 40 successful counts over 50 cases, while both the initialization with 64 and 125 starting poses achieved 48 successful count. The two The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) failed cases were due to portions of the joint falling outside of the field-of-view of the HSBV system, which means that there is insufficient 2D information for the registration. It can be concluded that the initialization with 64 starting poses can determine the initial pose for the registration while only takes approximately six minutes. Based on the subdivision of the search space, the capture range for the angle is 45º and for is 22.5º. The angle has a smaller capture range because it is the rotation about the vertical axis which is more ambiguous.  Table 3. Initialization validation results.

CONCLUSIONS
The significance of this work is that a novel automatic ICP-based 2D-3D registration method with high precision and high efficiency has been developed. This method is suitable for the registration of the 3D bone model constructed from MR images so that the detailed soft tissue contract can be acquired to assist the soft tissue analysis. By using the closed-form Horn's transform, the registration time is greatly reduced. The registration takes about 9.5 s for tibia, and 20.2 s for the femur provided the model size is around 20,000 points. A practical improvement in computational speed also makes possible the automatic initialization with 64 starting poses that only requires 6 min. The registration precision ranges from 0.49 to 0.54 mm for each axis. With the fast and precise registration, dynamic 3D kinematics can be acquired for further kinematics investigation and soft tissue analysis.
Despite the success of this approach, there are many future avenues of work to be done. 1. With this dataset, only registration precision can be quantified. Accuracy cannot be quantified because the ground truth of the transformation parameters was not available. Currently, the common method to acquire the ground truth for 2D-3D registration is to use the marker-based method that requires the beads to be imbedded into to the bone. Since this is an invasive procedure, the validation testing can be performed using a cadaver knee or a 3D printed joint model in the future. 2. The MRI data were segmented manually in third-party software to acquire the 3D bone model. This is both a black box process and a potential error source. Since the errors in the bone model can affect the registration accuracy, future investigation may be required to explore other segmentation techniques, e.g., the automatic segmentation using Convolutional Neural Networks (Thaha et al., 2019;Felfeliyan et. al, 2019). 3. The essential benefit of the HSBV system is that it can acquire image series of the object of interest to provide valuable sequential data to kinematics studies. The image series are not independent of each other, which means the correlated information between the images can be helpful for the registration procedure and improve the optimization accuracy. However, in the current study, each frame of the image series was registered independently, which means the correlation in the sequential data was ignored. In this case, the estimated kinematics could have a lack of consistency and be less accurate. Therefore, a dynamic estimation method, the Kalman Filter, will be investigated in the future in an effort to improve the registration accuracy. 4. In spite of the effort to automate the registration procedure including automatic edge detection, outlier removal, and initialization, this method is still not fully automatic as it requires user intervention to remove the outliers that cannot be removed automatically to yield high precision. Further investigation is required to eliminate the user intervention procedure. The dynamic estimation method stated previously could provide additional information from the adjacent frames. In this way, the real edges can be tracked from frame to frame to fully automize the 2D-3D registration.