GUIDED CALIBRATION OF MEDICAL STEREO ENDOSCOPES

When stereo endoscopes are used for 3D reconstruction in minimally invasive surgery, the stereo system must be calibrated. Since calibration in general, and of medical endoscopes in particular, is a well-discussed topic, this publication focuses on the user experience in the operating room. To enable medical personnel to perform the calibration, a guided process using an augmented camera image has been developed and implemented on the basis of the Robot Operating System (ROS) framework. A similarly guided accuracy check allows the user to determine beforehand whether a calibration is necessary. The method is tested by performing multiple calibrations on two different stereo endoscopes. It is shown that the accuracy check and calibration can be completed in around 2 minutes. The resulting calibration parameters for one endoscope are analyzed in terms of temporal stability. Additionally, the self-heating that occurs after the system starts is examined. It is shown that short- and long-term effects impact the stability of the stereo system and that regular re-calibrations might be required for use in 3D reconstruction.


Motivation
Minimally Invasive Surgery (MIS) using endoscopes has become increasingly popular. To compensate for its disadvantages of limited vision, missing depth perception and lacking haptic feedback, many publications focus on the possibilities of augmented reality (AR) to support the surgeon (Bernhardt et al., 2017). During surgery, stereo endoscopes are often used to provide a 3D reconstruction of the patient's abdomen (Maier-Hein et al., 2013, Wang et al., 2020). This work is part of the ARAILIS¹ project, whose goal is the development of a prototype that allows the AR overlay of preoperative data during surgery, using the liver as an example. This paper focuses on one important requirement: an accurate calibration. Since navigation is accomplished using SLAM (Docea et al., 2021), no external tracking system and no hand-eye calibration are required. Because the goal is an all-in-one system, the calibration is integrated with the other components such as the user interface, SLAM and disparity matching.
Special attention has to be paid in the development to the circumstances under which the calibration has to be performed and to the particular technical aspects of the endoscope. Additionally, it has to be considered that the medical personnel responsible for the calibration have a limited amount of time and no specific knowledge about this topic. The workflow therefore not only has to be reliable, but should also be as simple and fast as possible.
Besides the user experience, one general challenge for stereo endoscopes is the small baseline of around 4 mm and the resulting base-to-height ratio of h/b = 100 mm / 4 mm = 25. Combined with the comparatively high image noise and low resolution, this weak geometry can be especially problematic for the precision of the depth coordinate.
¹ Augmented Reality and Artificial Intelligence supported Laparoscopic Imagery in Surgery (ARAILIS)
When talking with surgeons about camera calibration, the most frequently asked question is: "How often is a calibration necessary?" A final aspect to be considered is therefore the stability of the stereo system, which affects not only the accuracy but also the required frequency of re-calibration. To avoid unexpected errors in a research environment or in the operating room (OR), it can also be critical to provide the user with a fast way of checking the accuracy.
Figure 1. Evaluated endoscopes, aligned with the calibration field for the data capture used to determine the self-heating effects of the stereo system. Einstein on the left and da Vinci on the right.

Related Work
Many researchers have looked at different aspects of the calibration of medical stereo endoscopes. For endoscopes, the use of some form of 2D calibration board is common (Liu et al., 2017, Barreto et al., 2009, Mourgues et al., 2002). In photogrammetry, 3D calibration fields are generally used for high-accuracy applications, although similar results can be achieved with 2D fields when a strong image block configuration is used (Hastedt and Luhmann, 2015). An example of the use of a 3D field can be found in (Conen et al., 2016), where it is used for the calibration of a trinocular camera prototype.
The influence of the user on the success and quality of the calibration, and the duration (Prevost et al., 2019) or guidance of the user (Chen et al., 2017), are rarely reported. Furthermore, descriptions of an independent accuracy check in object space or options for visual real-time feedback to the user could not be found in the literature.
This paper therefore deals with the user's influence on the quality of the calibration of stereo endoscopes and proposes a visual guidance method for the calibration procedure for non-expert (in particular medical) users. Special problems such as the calibration of refocusing endoscopes (Pratt et al., 2014), the rotation of the lens during surgery (Liu et al., 2017) and hand-eye calibration (Thompson et al., 2016, Kalia et al., 2019, Lee et al., 2017) are not considered here. Several methods exist for drawing conclusions about the stability of a camera system by comparing multiple calibrations. Instead of applying a statistical test to the camera parameters, image simulation has been found to allow a better interpretation (Al-Durgham et al., 2018). Image simulation has been used for single-camera systems in aerial photogrammetry (Lichti et al., 2009) and for multi-camera systems (Habib et al., 2014). The method is explained in detail in section 2.4.
The temperature of an optical system often influences its stability and therefore its accuracy. This was examined on smartphones by capturing a fixed calibration field and performing single-image calibrations (Elias et al., 2020), and on a stereo endoscope by observing a known distance after starting the system (self-heating) (Conen and Luhmann, 2015).

Hardware
The tests in the paper have been performed with two different endoscopes. Primarily the Einstein Vision 3.0 (in the following referred to as: Einstein) has been used to test the developed method. Additionally, the da Vinci Xi Plus (referred to as: da Vinci) was included as a comparison. This is part of the da Vinci surgical system, where the endoscope, as well as the surgical instruments, are attached to multiple robot arms that are controlled by a surgeon. For this publication the robot arms are not used to move the endoscope.
Both stereo endoscopes are oblique-viewing (equipped with a 30° angled tip) and use fixed-focus lenses. The stereo baseline is about 4 mm and the resolution of each camera sensor is 1920 × 1080 px. Since the da Vinci only transmits interlaced images at full resolution, a lower resolution of 1280 × 720 px was selected for it. Additionally, due to a border around the image, only 893 × 713 px were actually usable. Another difference between the endoscopes is the method for achieving a sterile environment: plasma sterilization is used for the da Vinci, whereas the Einstein relies on a single-use sterile cover (see Figure 3).
Figure 2. Example endoscope images. Einstein on the left, with a resolution of 1920 × 1080 px. da Vinci on the right, whose resolution is 1280 × 720 px, but with a usable portion of only 893 × 713 px.
A data sheet was only available for the Einstein. According to it, the operating distance is 2 cm to 20 cm and the focal length is 4.62 mm, with two 1/3" CMOS sensors. One additional feature is the integrated heating to avoid fogging of the lenses (Aesculap AG, 2019). The given sensor format results in a pixel size of around 3 µm for the Einstein, while no information about the sensor is available for the da Vinci. Both of these particular endoscopes are only used in a research environment and not in surgery. Using a video capture card, the endoscopes are connected to a PC running ROS. The calibration process was implemented in C++ as a ROS application for convenient integration into the existing system and parameter control.
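The effect of the small baseline can be made concrete with the standard normal-case precision estimate σ_Z = (h/b) · (h/c) · σ_x', where c is the principal distance and σ_x' the precision of the measured x-parallax in image units. The following Python sketch evaluates it with the data-sheet values above. The disparity precision of 0.5 px is an illustrative assumption, not a result of this paper, and the ~3 µm pixel size is the approximate value derived above.

```python
def depth_precision_mm(distance_mm, baseline_mm, focal_mm, pixel_mm, sigma_px):
    """Normal-case stereo estimate: sigma_Z = (h/b) * (h/c) * sigma_x'.

    distance_mm: object distance h
    baseline_mm: stereo baseline b
    focal_mm:    principal distance c
    pixel_mm:    pixel size (approx. 3 µm for the Einstein)
    sigma_px:    precision of the measured x-parallax in pixels (assumed)
    """
    h_over_b = distance_mm / baseline_mm      # 100 mm / 4 mm = 25
    image_scale = distance_mm / focal_mm      # image scale number h/c
    sigma_image_mm = sigma_px * pixel_mm      # parallax precision in mm
    return h_over_b * image_scale * sigma_image_mm


if __name__ == "__main__":
    # Illustrative values: h = 100 mm, b = 4 mm, c = 4.62 mm, 3 µm pixels, 0.5 px
    sigma_z = depth_precision_mm(100.0, 4.0, 4.62, 0.003, 0.5)
    print(f"expected depth precision: {sigma_z:.2f} mm")
```

Under these assumptions σ_Z comes out at roughly 0.8 mm. Because σ_Z grows with the square of the distance, doubling h to 200 mm would roughly quadruple it.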
The 3D calibration field is made of aluminium and consists of uncoded and coded circular markers with a diameter of 2 mm. The precise coordinates of the field were determined in a self-calibrating bundle adjustment using a Nikon Z6 with a 35 mm lens. For the scale definition, three scale bars of known length were included in the bundle adjustment. The maximum standard deviation of the object point coordinates was 9 µm, and the depth of the calibration field is 3.8 cm. The sterility of the calibration field was not considered in the development process, since the project currently only works with a phantom.

Calibration
For compatibility with ROS, the OpenCV camera model is used (detailed below), and a bundle adjustment has been implemented using the Ceres Solver (Agarwal et al., 2019). The parameters of the relative orientation (ROP) between the cameras are part of the system model and are adjusted directly together with the interior orientation parameters (IOP). To reduce the number of necessary calibration images, the object points are not treated as unknowns. A self-calibration that includes the adjustment of the object coordinates would require more images and would increase the calibration effort. Considering the limited image quality of the endoscopes, adjusting the coordinates might also decrease robustness. For the developed method, it is assumed that the 3D coordinates of the calibration field are stable and do not change over time.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
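For the following calibrations, the principal distance is always equal in the x- and y-directions ($f_x = f_y$), and the third radial distortion coefficient is set to zero ($k_3 = 0$). The distortion is applied to the normalised image coordinates:

$$x' = \frac{X_c}{Z_c}, \qquad y' = \frac{Y_c}{Z_c}, \qquad r^2 = x'^2 + y'^2$$

$$x'' = x'\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x' y' + p_2\left(r^2 + 2 x'^2\right)$$

$$y'' = y'\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2 y'^2\right) + 2 p_2 x' y'$$

$$u = f_x\, x'' + x_0, \qquad v = f_y\, y'' + y_0$$

where $k_1, k_2, k_3$ = radial distortion coefficients, $p_1, p_2$ = tangential distortion coefficients, $x_0, y_0$ = principal point and $u, v$ = image coordinates in pixels. The camera coordinates are obtained from the world coordinates by

$$\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = \mathbf{R}_i \begin{pmatrix} X_w \\ Y_w \\ Z_w \end{pmatrix} + \mathbf{t}_i$$

where $X_c, Y_c, Z_c$ = point in the camera coordinate system and $X_w, Y_w, Z_w$ = point in the world coordinate system. For the left camera:

$$\mathbf{R}_l = \mathbf{R}, \qquad \mathbf{t}_l = \mathbf{t}$$

and for the right camera:

$$\mathbf{R}_r = \mathbf{R}_{rel}\,\mathbf{R}, \qquad \mathbf{t}_r = \mathbf{R}_{rel}\,\mathbf{t} + \mathbf{t}_{rel}$$

where $\mathbf{R}, \mathbf{t}$ = pose of the left camera and $\mathbf{R}_{rel}, \mathbf{t}_{rel}$ = relative orientation.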
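To make the model concrete, the following Python sketch evaluates these equations for one object point and forms the reprojection residuals that a bundle adjustment with fixed object points would minimise. It is illustrative only. The dictionary keys (f, x0, y0, k1, k2, k3, p1, p2) and function names are hypothetical, and according to the text the actual implementation is written in C++ with the Ceres Solver.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(point_c, cam):
    """Project a point given in camera coordinates to pixel coordinates."""
    xc, yc, zc = point_c
    x, y = xc / zc, yc / zc
    r2 = x * x + y * y
    radial = 1 + cam["k1"] * r2 + cam["k2"] * r2**2 + cam.get("k3", 0.0) * r2**3
    xd = x * radial + 2 * cam["p1"] * x * y + cam["p2"] * (r2 + 2 * x * x)
    yd = y * radial + cam["p1"] * (r2 + 2 * y * y) + 2 * cam["p2"] * x * y
    # f_x = f_y = f in the calibrations described above
    return cam["f"] * xd + cam["x0"], cam["f"] * yd + cam["y0"]


def project_stereo(point_w, R, t, R_rel, t_rel, left, right):
    """Project a world point into the left and the right image."""
    p_left = [a + b for a, b in zip(mat_vec(R, point_w), t)]
    # R_r = R_rel R and t_r = R_rel t + t_rel, i.e. right = R_rel * left + t_rel
    p_right = [a + b for a, b in zip(mat_vec(R_rel, p_left), t_rel)]
    return project(p_left, left), project(p_right, right)


def reprojection_residuals(obs_left, obs_right, point_w, R, t, R_rel, t_rel, left, right):
    """Observed minus projected image coordinates for one fixed object point."""
    (ul, vl), (ur, vr) = project_stereo(point_w, R, t, R_rel, t_rel, left, right)
    return [obs_left[0] - ul, obs_left[1] - vl, obs_right[0] - ur, obs_right[1] - vr]
```

Because the object points are held fixed, only the pose, the IOPs and the ROPs would be unknowns in such residuals. This keeps the number of required images low.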

Accuracy
To determine the accuracy of the stereo system, the calibration field is used as a test object. One or more stereo image pairs of the field are captured at a defined distance. Using forward intersection, new 3D coordinates are calculated and compared to the known points of the calibration field.
The process consists of the following steps:
1. Capture and rectification of a stereo image pair.
2. Detection of the markers and forward intersection to determine their 3D coordinates.
3. Least-squares estimation of the transformation between the new 3D points and the known coordinates, which is then applied to the new points. The scale is not adjusted in this step, so only rotation and translation are estimated.
4. Calculation of the object coordinate deviations between the true and the new 3D coordinates.
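After rectification, step 2 reduces to a simple relation: both images share the principal distance and the image rows, so depth follows from the x-parallax. A minimal Python sketch, assuming both rectified cameras share f and the principal point (x0, y0) and the right camera is shifted by the baseline b along the X-axis. With OpenCV's rectification the principal points can differ, which would add a constant offset to the disparity. The function name is hypothetical.

```python
def intersect_rectified(left_px, right_px, f, x0, y0, baseline_mm):
    """Forward intersection for a rectified normal-case stereo pair.

    left_px, right_px: (x, y) pixel coordinates of the same marker
    Returns (X, Y, Z) in the left camera system, in the unit of baseline_mm.
    """
    (xl, yl), (xr, yr) = left_px, right_px
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    z = f * baseline_mm / disparity
    x = (xl - x0) * z / f
    # after rectification yl and yr should agree; average any residual y-parallax
    y = (0.5 * (yl + yr) - y0) * z / f
    return x, y, z


if __name__ == "__main__":
    # f = 1000 px and a 4 mm baseline: a 40 px disparity corresponds to 100 mm depth
    print(intersect_rectified((740.0, 360.0), (700.0, 360.0), 1000.0, 640.0, 360.0, 4.0))
```

The example prints (10.0, 0.0, 100.0). Since Z is inversely proportional to the disparity, sub-pixel accuracy of the marker detection directly drives the depth accuracy.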
These steps can be applied to one or multiple stereo image pairs for more reliable results. For all the object coordinate deviations the 95th percentile (P95) is calculated as the accuracy value. It is less influenced by outliers than the maximum error and easy to interpret for the user. An advantage of this process is that it will report an accuracy over the full measurement area.
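Steps 3 and 4, together with the P95 value, can be sketched in Python as below. Several assumptions apply. The least-squares rigid-body transformation is solved with Horn's closed-form quaternion method, which is one possible solver since the paper does not name one. Each deviation is taken as the 3D Euclidean distance of a point after alignment. The percentile uses linear interpolation. All function names are hypothetical.

```python
import math


def _jacobi_eigen(a, sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        diag = sum(a[i][i] ** 2 for i in range(n))
        if off <= 1e-30 * diag:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def rigid_transform(src, dst):
    """Rotation R and translation t minimising |R src + t - dst|^2 (no scale)."""
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]
    cd = [sum(p[i] for p in dst) / n for i in range(3)]
    m = [[0.0] * 3 for _ in range(3)]  # cross-covariance of centred points
    for p, q in zip(src, dst):
        for i in range(3):
            for j in range(3):
                m[i][j] += (p[i] - cs[i]) * (q[j] - cd[j])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = m
    big_n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(big_n)
    k = max(range(4), key=lambda i: vals[i])
    w, x, y, z = (vecs[i][k] for i in range(4))  # unit quaternion of the rotation
    r = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]
    t = [cd[i] - sum(r[i][j] * cs[j] for j in range(3)) for i in range(3)]
    return r, t


def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def accuracy_check(measured, reference, pct=95.0):
    """Align measured points to the reference and return (P95, deviations)."""
    r, t = rigid_transform(measured, reference)
    devs = []
    for p, q in zip(measured, reference):
        moved = [sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        devs.append(math.dist(moved, q))
    return percentile(devs, pct), devs
```

Fixing the scale means that a wrongly calibrated baseline or principal distance still shows up as a scale error in the deviations instead of being absorbed by the alignment.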

Workflow and user guidance
The goal of the developed workflow is to allow medical personnel or other users with no experience in camera calibration to calibrate a stereo endoscope. The process is directly integrated into the system. It is designed to require as few inputs as possible and to give clear feedback to the user. Note that the developed method is a first prototype. In particular, the interface and some parts of the data processing have not yet been extensively tested and might react sensitively to handling errors. A video² showing the complete calibration process has been published.
Figure 4. Augmented image of the left camera during the data capture for the calibration. AR elements show the deviation from the optimal distance, the rotation of the camera and the detected markers, and warn if the endoscope is moved too fast. To enable real-time processing, only part of the points is detected at first.
When the system is started, the last set of camera parameters is loaded automatically. This is directly followed by an accuracy check using the method defined in section 2.2, which checks whether the system meets the previously defined requirements. Depending on its result, the user receives a recommendation on whether a new calibration of the system is necessary. For both the accuracy check and the calibration, the user is guided to position the endoscope at predefined positions relative to the calibration field. This is achieved through real-time marker detection and pose estimation. To support the user in finding the correct position and viewing direction of the endoscope, an augmented camera image (Figure 4) and a 3D view are generated.
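The guidance overlay reports how far the current endoscope pose is from the predefined target. A minimal Python sketch of two such quantities follows. It assumes the current pose (R, t mapping field to camera coordinates) comes from marker-based pose estimation, for example OpenCV's solvePnP. It also assumes that the target distance is measured to a reference point on the field, placed at the field origin here. Function names and conventions are hypothetical.

```python
import math


def rotation_angle_deg(r_a, r_b):
    """Angle in degrees of the relative rotation r_a^T r_b (3x3 rotation matrices)."""
    trace = sum(r_a[k][i] * r_b[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def guidance_offsets(r_cur, t_cur, r_target, target_distance_mm, field_center=(0.0, 0.0, 0.0)):
    """Deviation from the optimal distance (mm) and from the target orientation (deg).

    r_cur, t_cur: current pose (field -> camera), e.g. from marker-based pose estimation
    r_target:     orientation of the predefined target pose (field -> camera)
    """
    # camera centre in field coordinates: C = -R^T t
    centre = [-sum(r_cur[k][i] * t_cur[k] for k in range(3)) for i in range(3)]
    distance_dev = math.dist(centre, field_center) - target_distance_mm
    return distance_dev, rotation_angle_deg(r_target, r_cur)
```

A positive distance deviation means the endoscope is too far from the field. The angle tells the user how much to rotate, independent of the rotation axis.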
For the accuracy check, a single predefined camera position is used, where the endoscope is pointed directly at the calibration field. For the calibration, multiple camera positions in a standard image configuration are defined (see Figure 5). If the calibration field is visible in the left camera, several parameters are continuously checked, and an image pair is saved for later processing only if all of them are within an acceptable range (see Table 1 for the full list). For redundancy, multiple image pairs are captured at each predefined camera position. When all necessary images have been captured, the image processing automatically starts in the background. Marker detection is performed on all images, and either the accuracy check or the bundle adjustment is calculated automatically. Once the calibration is finished, the accuracy is calculated for both the new and the currently used system parameters. This helps to determine whether the calibration has improved the system accuracy and whether the new parameters should replace those loaded at the beginning.
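The capture logic described above can be sketched in Python as three small checks. The monitored quantities and all threshold values in CaptureLimits are hypothetical placeholders, since the actual criteria are those listed in Table 1. The final comparison mirrors the keep-or-replace decision based on the accuracy of the new and the current parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureLimits:
    """Hypothetical acceptance limits; the real criteria are listed in Table 1."""
    max_distance_dev_mm: float = 10.0
    max_angle_dev_deg: float = 5.0
    min_markers: int = 20
    max_speed_mm_s: float = 5.0


def accept_pair(distance_dev_mm, angle_dev_deg, n_markers, speed_mm_s, limits=CaptureLimits()):
    """Save an image pair only if every monitored quantity is within its limit."""
    return (
        abs(distance_dev_mm) <= limits.max_distance_dev_mm
        and abs(angle_dev_deg) <= limits.max_angle_dev_deg
        and n_markers >= limits.min_markers
        and speed_mm_s <= limits.max_speed_mm_s
    )


def capture_complete(saved_per_position, n_positions, pairs_per_position=10):
    """True once every predefined camera position has enough saved pairs."""
    return all(saved_per_position.get(i, 0) >= pairs_per_position for i in range(n_positions))


def keep_new_calibration(p95_new_mm, p95_current_mm):
    """Replace the loaded parameters only if the new calibration is more accurate."""
    return p95_new_mm < p95_current_mm
```

Requiring all conditions at once is what makes the saved images consistent. A fast-moving endoscope, for instance, is rejected even if its pose is otherwise perfect.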

Temporal Stability
Camera parameters can be affected by external influences, leading to increasing measurement errors. By studying the stability of the system, the magnitude of a possible error can be determined.

Short-Term
To investigate the short-term stability of the optical system, both endoscopes are fixed above the calibration field (see Figure 1). After the endoscope had been allowed to cool down, it was started and image pairs were recorded twice per second for one hour. The data is used to determine how long the system needs to reach a stable state. Using the method defined in section 2.2, the accuracy is calculated over four consecutive image pairs.
To investigate which system parameters change over time, multiple calibrations can be performed using the short-term data. A single-image calibration uses just one image pair to estimate the change of the camera parameters (compare Elias et al., 2020). Similarly, calibrations using three consecutive image pairs were calculated. This results in a more stable calibration, which reduces the noise in the estimated parameters.
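The time resolution of this analysis follows from simple arithmetic. One hour at two image pairs per second gives 7200 pairs. The Python sketch below groups them into consecutive windows and assigns each window its mid-time. Whether the windows overlap is not stated in the text; non-overlapping windows are assumed here, and the function name is hypothetical.

```python
def consecutive_windows(n_pairs, size, rate_hz=2.0):
    """Split n_pairs image pairs into non-overlapping groups of `size`
    consecutive pairs; returns (indices, mid-time in seconds) per group."""
    windows = []
    for start in range(0, n_pairs - size + 1, size):
        indices = list(range(start, start + size))
        mid_time_s = (start + (size - 1) / 2.0) / rate_hz
        windows.append((indices, mid_time_s))
    return windows


if __name__ == "__main__":
    n = 2 * 60 * 60  # one hour at two image pairs per second = 7200 pairs
    print(len(consecutive_windows(n, 4)), "accuracy values,",
          len(consecutive_windows(n, 3)), "calibrations")
```

Under this assumption, one hour yields 1800 accuracy values (one every 2 s) and 2400 calibrations (one every 1.5 s). Both spacings are short compared with effects that evolve over tens of seconds.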

Long-Term
For a longer-term analysis, the system parameters acquired during the interactive calibrations can be studied. The first method relies on the images of the accuracy checks. To analyze stability, one set of camera parameters is used to calculate the error on every available accuracy check data set. This shows what error would occur if a calibration were used at a different point in time. A disadvantage of this method is that the images of the accuracy checks are required.
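The crosswise evaluation can be organised as an error matrix in which entry (i, j) is the accuracy-check error obtained with calibration i on the images of check j. In this Python sketch, the evaluate callback is a hypothetical stand-in for the accuracy check of section 2.2. Calibration i and check i are assumed to come from the same session, so a single list of day labels serves both.

```python
def cross_application_matrix(calibrations, accuracy_checks, evaluate):
    """errors[i][j]: error when calibration i is applied to accuracy check j.

    `evaluate(calibration, check)` is a placeholder for the accuracy check.
    """
    return [[evaluate(cal, chk) for chk in accuracy_checks] for cal in calibrations]


def same_vs_cross_day(errors, days):
    """Largest error for parameters and check from the same day vs. different days."""
    same, cross = [], []
    for i, row in enumerate(errors):
        for j, err in enumerate(row):
            (same if days[i] == days[j] else cross).append(err)
    return max(same), max(cross)


if __name__ == "__main__":
    # Toy stand-in: each "calibration" is a single parameter value and the error
    # is its absolute difference to the value valid at the time of the check.
    values = [0.0, 0.1, 5.0, 5.1]
    errors = cross_application_matrix(values, values, lambda c, a: abs(c - a))
    print(same_vs_cross_day(errors, days=[1, 1, 2, 2]))
```

Splitting the matrix by day separates short-term repeatability from long-term drift.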
Alternatively, the difference between two calibrations in object space can be estimated using image simulation (Al-Durgham et al., 2018). Simulated 3D points are projected into the images using one set of camera parameters. From these image points, new 3D points are calculated using a different set of camera parameters. The measure of stability between the two calibrations is obtained by comparing the object points. In this paper, one set of camera parameters consists of the IOPs of both cameras and the ROPs of the right camera. The following method is used:
1. Simulation of a regular grid of 3D points in the camera coordinate system at a distance of 100 mm.
2. Projection of the points into a stereo image pair, including distortion, using Parameter Set A.
3. Undistortion of the image points and forward intersection to object points using Parameter Set B.
4. Computation of the deviations of the new coordinates from the simulated 3D points in X, Y and Z.
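The four simulation steps can be sketched in Python as follows. Parameter sets are plain dictionaries with hypothetical keys: f, x0, y0, k1, k2, p1 and p2 per camera, plus R_rel and t_rel for the stereo set. As in the calibrations, k3 = 0 and f_x = f_y. The grid extent of ±20 mm with 9 × 9 points is illustrative. Undistortion uses a simple fixed-point iteration, and forward intersection uses the midpoint of the two viewing rays, which may differ from the procedure used in the paper.

```python
import math


def distort(x, y, c):
    """Radial (k1, k2) and tangential (p1, p2) distortion of normalised coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + c["k1"] * r2 + c["k2"] * r2 * r2
    xd = x * radial + 2.0 * c["p1"] * x * y + c["p2"] * (r2 + 2.0 * x * x)
    yd = y * radial + c["p1"] * (r2 + 2.0 * y * y) + 2.0 * c["p2"] * x * y
    return xd, yd


def project(p, c):
    xd, yd = distort(p[0] / p[2], p[1] / p[2], c)
    return c["f"] * xd + c["x0"], c["f"] * yd + c["y0"]


def undistort(u, v, c, iterations=20):
    """Invert the distortion by fixed-point iteration."""
    xd, yd = (u - c["x0"]) / c["f"], (v - c["y0"]) / c["f"]
    x, y = xd, yd
    for _ in range(iterations):
        dx, dy = distort(x, y, c)
        x, y = x + (xd - dx), y + (yd - dy)
    return x, y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def to_right(p, pset):
    """Left-camera coordinates -> right-camera coordinates: R_rel p + t_rel."""
    r, t = pset["R_rel"], pset["t_rel"]
    return [sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def intersect(ray_l, ray_r, pset):
    """Midpoint of the shortest segment between both viewing rays (left system)."""
    r, t = pset["R_rel"], pset["t_rel"]
    d1 = [ray_l[0], ray_l[1], 1.0]
    local = [ray_r[0], ray_r[1], 1.0]
    d2 = [sum(r[j][i] * local[j] for j in range(3)) for i in range(3)]  # R_rel^T d
    o2 = [-sum(r[j][i] * t[j] for j in range(3)) for i in range(3)]     # right centre
    w0 = [-o for o in o2]  # left centre (origin) minus right centre
    a, b, cc = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    den = a * cc - b * b
    s1, s2 = (b * e - cc * d) / den, (a * e - b * d) / den
    return [0.5 * (s1 * d1[i] + o2[i] + s2 * d2[i]) for i in range(3)]


def simulate_difference(set_a, set_b, distance_mm=100.0, half_width_mm=20.0, steps=9):
    """RMSE (mm) in X, Y and Z between a simulated grid and its reconstruction."""
    sq = [0.0, 0.0, 0.0]
    count = 0
    for i in range(steps):
        for j in range(steps):
            # 1. regular grid in the (left) camera system at the given distance
            p = [-half_width_mm + 2.0 * half_width_mm * i / (steps - 1),
                 -half_width_mm + 2.0 * half_width_mm * j / (steps - 1),
                 distance_mm]
            # 2. projection including distortion with parameter set A
            uv_l = project(p, set_a["left"])
            uv_r = project(to_right(p, set_a), set_a["right"])
            # 3. undistortion and forward intersection with parameter set B
            q = intersect(undistort(*uv_l, set_b["left"]),
                          undistort(*uv_r, set_b["right"]), set_b)
            # 4. deviations in X, Y and Z
            for k in range(3):
                sq[k] += (q[k] - p[k]) ** 2
            count += 1
    return [math.sqrt(v / count) for v in sq]
```

Identical parameter sets reproduce the grid exactly. Shifting, for example, the right principal point by one pixel mainly shows up as an error in Z, consistent with the weak depth geometry of the short baseline.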

RESULTS AND DISCUSSION
To evaluate the method, more than 40 calibrations were performed with the Einstein and 15 with the da Vinci. In the following results section, 12 calibrations per endoscope, performed on two different days, are considered.

Dates and times of the considered calibrations:

ID      Einstein                  da Vinci
1-6     18.02.2022 11:20-11:32    22.11.2021 13:48-14:03
7-12    11.03.2022 10:11-10:28    10.03.2022 13:13-13:26

When 10 image pairs are captured at each predefined camera position, the entire calibration procedure can be completed in two to three minutes. This includes image capture and processing as well as the bundle adjustment. If only 5 stereo image pairs are taken at each target camera position, the time necessary for calibration is reduced to 90 seconds. Positioning the endoscope correctly takes longer when the user is unfamiliar with the process.

Calibration and Accuracy Check
Since the goal of this paper is not the determination of the best camera model, the results of the bundle adjustment itself are not analysed in detail. These values are also less significant than the results of an independent accuracy check using additional images. One example of the calibration results for each endoscope can be found in Table 2. Plotting the residuals after calibration does not reveal large remaining systematic errors for either endoscope (Figure 6).
Table 2. Resulting camera parameters and ROPs with standard deviations for one exemplary calibration. Note that since the image resolutions of the two endoscopes differ, the parameters are not directly comparable. The rotation of the relative orientation is given in the axis-angle representation (r1, r2, r3).
The guided accuracy check is an integral part of the developed method, as it informs the user about the need for a new calibration. The object coordinate deviations are therefore investigated further. Figure 7 shows the deviations of the accuracy checks for each calibration and the resulting P95 error reported to the user. For the da Vinci, the deviations are relatively consistent, with a median of around 0.26 mm and a 95th percentile of 0.86 mm. The reason for the outlier of the 9th calibration could not be determined. For the Einstein, the median is around 0.47 mm and P95 = 1.08 mm. A possible reason for the larger variations visible for this endoscope is given in section 3.2.1. As mentioned in section 2.3, multiple images are captured at each predefined camera position for redundancy. Since one goal is to reduce calibration time, all 12 calibrations of each endoscope were repeated with 1 to 10 image pairs per position. Figure 8 shows the resulting accuracy depending on the number of images. For the da Vinci, increasing the number of images does not improve the accuracy.
In comparison, a slight improvement can be observed for the Einstein endoscope.

Short-Term
The data for the short-term analysis was captured as described in section 2.4.1. The resulting P95 error, calculated over four consecutive image pairs, is shown in Figure 9 for both endoscopes. As expected, the error of the da Vinci decreases after system start until it stabilizes. After around 15 min, the system reaches a stable state with a variation of around 0.1 mm. In contrast, the Einstein shows an unexpected pattern that repeats every 75 s, consisting of a peak followed by a slow decline. The median span between the peaks and the valleys is 0.73 mm.
Figure 9. Accuracy of the used endoscopes over the minutes after system start. The distance to the calibration field was 112 mm for the da Vinci and 123 mm for the Einstein. Note: for the first 20 s, the error for the Einstein is up to 5 mm, which is cut off in this plot.
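The text does not state how the 75 s repetition was determined. One common approach, sketched here in Python under that caveat, uses the autocorrelation of the P95 time series. The lag of its strongest peak after the first negative value gives the period. rate_hz must match how often a P95 value is produced. The example assumes one value per image pair at 2 Hz (a sliding window), which is an assumption.

```python
def dominant_period_s(series, rate_hz=2.0, max_lag=None):
    """Estimate the repetition period of a signal from its autocorrelation.

    The period is taken as the lag of the highest autocorrelation value after
    the autocorrelation first becomes negative (i.e. after the zero-lag peak).
    """
    n = len(series)
    max_lag = min(max_lag or n // 2, n - 1)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    var = sum(d * d for d in dev)
    acf = [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
           for lag in range(max_lag + 1)]
    first_negative = next((lag for lag, r in enumerate(acf) if r < 0), None)
    if first_negative is None:
        raise ValueError("no periodic structure found within max_lag")
    best = max(range(first_negative, max_lag + 1), key=lambda lag: acf[lag])
    return best / rate_hz


if __name__ == "__main__":
    # Synthetic series at 2 Hz: a peak followed by a slow decline every 150 samples
    p95 = [1.0 - (t % 150) / 150 for t in range(1500)]
    print(dominant_period_s(p95, max_lag=400))
```

For the synthetic sawtooth, the estimate is 75.0 s. On real data, a slow trend such as the warm-up phase should be removed first, since it would otherwise dominate the autocorrelation.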
A possible cause for this error would be a regular change of the camera parameters. This can be checked by performing several calibrations, each using three consecutive image pairs, over the short-term data. A selection of the parameters of these calibrations over time is shown in Figure 10. The pattern is visible in the y-component of the principal point (y0), with a span between peaks and valleys of 0.5 px. It can also be observed in the first tangential distortion coefficient of both cameras. The other camera parameters are not as clearly influenced by the effect. A possible cause could be the integrated heating of the endoscope tip mentioned in the data sheet.

Long-Term
The stability over time is only investigated for the Einstein, because more data is available for this endoscope. The parameters from all 12 calibrations were applied crosswise to all accuracy checks to investigate the reliability of the calibration method and the accuracy check, as well as the geometric stability of the endoscope. Figure 11 shows the accuracy in object space (95th percentile) of the Einstein for all calibrations. Within the 15 min of each day in which the calibrations were conducted, the accuracy was always below 2 mm. However, applying the camera parameters of calibrations 1-6 to the accuracy checks of 7-12, or vice versa, results in errors above 5 mm. It can be concluded that the camera parameters changed significantly in the 3 weeks between the two days. Beyond that, the other calibrations conducted also show rare indications of large parameter changes within one hour.
Using image simulation, the difference between the camera parameters in object space can also be calculated without additional image data from the accuracy check. This allows the direct difference between calibrations to be analyzed without the influence of the image measurements of the accuracy check. The resulting plot shows a similar but cleaner result (Figure 12). This confirms that the accuracy check can indicate a change of the camera parameters.

CONCLUSIONS
Conducting multiple calibrations over a 3-month period with the developed method has shown that the interface helps to guide the user during the calibration. The method is simple to use, but still requires an introduction and some training. In its current state, the system is helpful for calibration and for verifying the accuracy during experiments in a research environment.
Currently, the decision on whether a calibration is necessary is based on a predefined threshold. Using the stability data, the recommendation to the user could also provide information about the possible improvement from a new calibration. One limitation is that in some cases the accuracy check is still sensitive to handling errors (e.g. the endoscope is moved too fast).
The first look at the stability of the Einstein endoscope has shown that medical stereo endoscopes could need more frequent calibrations than expected. Additional influences, such as camera parameters changing at regular intervals, show that there are still unanswered questions. Overall, it has been shown that further investigations to assess the stability over time are required. This is true not just for the Einstein, but in general for all medical stereo endoscopes that are to be used for 3D reconstruction and augmented reality. This investigation should also include endoscopes used in surgery and not just in a research environment.
The question asked at the beginning, how often a calibration is necessary, cannot be answered conclusively. However, the guided accuracy check at least provides a method for checking before surgery whether a calibration is required. What has not been considered so far is the accuracy actually required for an application such as augmented reality. The real accuracy requirement for the intended applications has to be determined in further research.
Figure 12. Simulation of differences between calibrations for the Einstein. RMSE of the z-coordinate in the camera coordinate system.