ROLL-SENSITIVE ONLINE CAMERA ORIENTATION DETERMINATION ON THE STRUCTURED ROAD

Online camera calibration technology can estimate the pose of an onboard camera in real time, playing an important role in fields such as HD map production and autonomous driving. Some researchers use one vanishing point (VP) to calculate the pitch and yaw angles of the onboard camera. However, this method assumes that the roll angle is zero, which is impractical because of inevitable installation errors. This paper proposes a novel online camera orientation determination method based on a longitudinal vanishing point without the zero-roll hypothesis. The orientation of the camera is determined in two steps: calculating the pitch and yaw angles according to vanishing point theory, and then obtaining the roll angle with a lane-width constraint, which is modeled as an optimization problem. To verify the effectiveness of our algorithm, we evaluated it on the nuScenes dataset: the rotation errors of the roll and pitch angles reach 0.154° and 0.116°, respectively. We also deployed our method on "Tuyou", an autonomous vehicle developed by Wuhan University, and tested it on urban structured roads. The proposed method reconstructs the ground space more accurately than previous methods that rely on the zero-roll hypothesis.


INTRODUCTION
As cameras are widely equipped on mobile platforms such as autonomous vehicles and robots, accurate extrinsic calibration becomes increasingly important for vision-based algorithms. Traditional manual calibration methods tend to use control points with special targets in a calibration field. However, these methods are impractical in outdoor scenes, where cameras should be calibrated online and automatically. Due to the ups and downs of the road, the pose of an onboard camera changes continuously during driving, which introduces pose-related errors and degrades the robustness of vision-based applications such as lane keeping assist and lane departure warning.
Some researchers utilize visual odometry (VO) to measure the relative orientation change of the camera (Jeong and Kim, 2016). Based on VO, others obtain the relative pose between the camera and other onboard sensors whose poses are known via hand-eye calibration (Tsai and Lenz, 1989; Wang et al., 2019). These methods require a large amount of computation and rely heavily on texture information. Their main disadvantage is that they can only obtain relative poses between adjacent frames: if one frame is lost, the calibration system risks failure.
Three orthogonal vanishing points in the image can be used to calculate the intrinsic and extrinsic parameters of the camera (Hartley and Zisserman, 2003; Orghidan et al., 2012). However, for a camera mounted at the front of the vehicle, it is difficult to obtain three stable orthogonal vanishing points. On a structured road, the intersection of the lanes in image space provides a stable longitudinal vanishing point, whereas the other two orthogonal VPs, namely the horizontal and vertical vanishing points, often lie near infinity in the image, which causes numerical instability. Moreover, on a structured road it is impractical to find three orthogonal VPs all the time. Therefore, many researchers use only the longitudinal VP to calculate the camera's pitch and yaw angles and assume that the roll angle is zero (Zhao et al., 2014; Lee and Zhou, 2016; Yang et al., 2016). However, this zero-roll hypothesis is inappropriate. On the one hand, the roll angle is not strictly zero due to installation error; on the other hand, it inevitably changes continuously during driving. This directly affects the stability of autonomous-vehicle applications such as lane keeping assist.
Researchers usually use inverse perspective mapping (IPM) to map image space into ground space (Bertozzi and Broggi, 1998; Aly, 2008; Oliveira et al., 2015) and obtain a bird's eye view (BEV) image to measure the distance to adjacent lane lines in applications such as lane keeping assist. In BEV, targets on the road surface have the same scale as in the real world. Generally, the widths of lanes on a structured road are equal, so lanes in BEV usually have the same width. Besides, if the actual lane widths are known, the widths measured in the BEV should equal their actual values. However, the inappropriate zero-roll hypothesis introduces errors: the roll-angle error deforms the lane widths in BEV, as shown in Figure 1. When the camera rotates anticlockwise around its optical axis, the lanes on the left become narrower and those on the right become wider. The zero-roll hypothesis therefore brings additional errors to vehicle lateral positioning.
This paper first derives a formula for calculating the pitch and yaw angles based on the VP without the zero-roll hypothesis. Then, a pipeline is proposed to estimate the VP and measure lane widths. Finally, fixing the pitch and yaw angles, we model the solution of the roll angle as an optimization problem with a lane-width constraint in BEV.

Problem Statement
As shown in Figure 3, there are mainly three coordinate systems in this paper: the road coordinate system (X_r, Y_r, Z_r), the camera coordinate system (X_c, Y_c, Z_c) and the image coordinate system (u, v). The origin of the camera coordinate system coincides with the origin of the road coordinate system. The camera is mounted at a height h above the ground. The rotation matrix is represented by R and the translation vector by T. Since the origins of the road and camera coordinate systems overlap, T = 0. R can be expressed as a function of the roll angle γ, pitch angle θ and yaw angle ψ. In photogrammetry, the Euler angular order of the rotation matrix is often roll-pitch-yaw. In this paper, the Euler angular order is adjusted to pitch-yaw-roll to avoid the influence of the roll angle on the pitch and yaw angles. The matrix R is defined as follows.

R = R_θ · R_ψ · R_γ (1)

where R_θ, R_ψ, R_γ are the rotation matrices corresponding to each angle. To be consistent with the definition in most IPM literature, this paper defines the pitch angle in the horizontal direction as 0, which differs from the value defined by the Euler angle by π/2, so R_θ has a different form.
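The pitch-yaw-roll composition described above can be sketched in code. The axis assignments below (pitch about X_r, yaw about Z_r, roll about the road-forward Y_r axis) are illustrative assumptions, since the text itself fixes them in Figure 3:

```python
import numpy as np

def Rx(a):  # rotation about the X axis (pitch)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):  # rotation about the Y (road-forward) axis (roll)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(a):  # rotation about the Z (up) axis (yaw)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rotation(pitch, yaw, roll):
    # pitch-yaw-roll order: roll sits innermost, so it cannot disturb
    # the pitch/yaw estimate later derived from the vanishing point
    return Rx(pitch) @ Rz(yaw) @ Ry(roll)

R = rotation(0.05, 0.02, 0.03)
```

Any such composition of elementary rotations is orthonormal with determinant 1, which is easy to verify numerically.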
For any given point P = (X_r, Y_r, Z_r) in space, the relationship between the road coordinates and the image coordinates of P can be described by the pinhole camera model.
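A minimal sketch of this pinhole projection, with T = 0 because the road and camera origins coincide; the intrinsics (fx, fy, cx, cy) and the road-to-camera axis permutation are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative intrinsics (assumed, not from the paper)
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])

# Axis permutation: road (X right, Y forward, Z up) -> camera (x right, y down, z forward)
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def project(P_road, R=A, K=K):
    """Pinhole model with T = 0 (road and camera origins coincide)."""
    p = K @ (R @ np.asarray(P_road, float))
    return p[:2] / p[2]  # dehomogenize to pixel (u, v)

u, v = project([0.0, 10.0, 0.0])  # a point 10 m straight ahead -> principal point
```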

Definition of Vanishing Point
As shown in Figure 4, a point on the line passing through point A with direction D can be expressed by the following formula.

P(λ) = A + λ·D (3)

The vanishing point is the image of the point on the line at infinity (Hartley and Zisserman, 2003).
The image coordinate of the VP is related only to the direction D. Since the Y_r axis is parallel to the direction of the road, let D = (0, 1, 0)^T. Combining equations (3) and (4), the VP can be expressed as in equation (5).
In equation (8), the pitch and yaw angles are independent of the roll angle. During the derivation, only the order of the Euler angles is specified, and no assumption is made about the roll angle. In other words, the pitch and yaw angles can be obtained from equation (8) regardless of changes in the roll angle, which is essential for the subsequent calculation of the roll angle.
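This roll-independence can be checked numerically: with roll innermost in the pitch-yaw-roll order, a roll rotation about the road-forward axis leaves the road direction D = (0, 1, 0)^T fixed, so the longitudinal VP does not move. The intrinsics and axis conventions below are illustrative assumptions:

```python
import numpy as np

def Rx(a): c, s = np.cos(a), np.sin(a); return np.array([[1,0,0],[0,c,-s],[0,s,c]])
def Ry(a): c, s = np.cos(a), np.sin(a); return np.array([[c,0,s],[0,1,0],[-s,0,c]])
def Rz(a): c, s = np.cos(a), np.sin(a); return np.array([[c,-s,0],[s,c,0],[0,0,1]])

K = np.array([[1000., 0., 960.], [0., 1000., 540.], [0., 0., 1.]])
A = np.array([[1., 0., 0.], [0., 0., -1.], [0., 1., 0.]])  # road axes -> camera axes

def vp_pixel(pitch, yaw, roll):
    D = np.array([0., 1., 0.])                 # road direction
    d_cam = A @ Rx(pitch) @ Rz(yaw) @ Ry(roll) @ D
    p = K @ d_cam
    return p[:2] / p[2]

vp_no_roll = vp_pixel(0.05, 0.02, 0.0)
vp_rolled  = vp_pixel(0.05, 0.02, np.deg2rad(5.0))
# Ry(roll) @ D == D, so the two VPs coincide, while changing the pitch moves the VP
```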

VANISHING POINT ESTIMATION AND LANE WIDTH MEASUREMENT
The pitch and yaw angles of the camera can now be calculated from the VP of the lane lines. Besides, as mentioned before, to further estimate the roll angle, the lane width is used as a constraint in an optimization problem. Therefore, it is necessary to detect the lane lines in the image, estimate the vanishing point, and measure the lane widths.

Lane Detection and Vanishing Point Estimation
Recently, lane detection methods based on convolutional neural networks (CNN) have become mainstream and achieve state-of-the-art performance (Chen et al., 2017; Zou et al., 2019; Qin et al., 2020). As shown in Figure 5, a self-developed CNN is used to obtain the segmentation mask of the input image. Then, following a series of post-processing steps including denoising (Xiao et al., 2018), contour detection (Suzuki, 1985), Douglas-Peucker sampling, and centre-point extraction, we finally obtain the lanes in the image and describe them with a piecewise linear model.
Then, we estimate the VP based on the Gaussian sphere (Collins and Weiss, 1990; Rother, 2002) using the lanes detected before. As shown in Figure 6, the Gaussian sphere is a unit sphere centred at the camera's optical centre. Each line in the image corresponds to a great circle on the Gaussian sphere, and the great circles of two parallel lines in space intersect at a point on the sphere. The ray from the sphere's centre to this intersection is the vanishing direction (VD) (Barnard, 1983; Lee and Yoon, 2019). Stacking the resulting constraints on the VD yields equation (9), which can be solved by singular value decomposition (SVD). Then the VP's image coordinate v = (u_v, v_v) can be calculated.

v ≃ K·D (10)
Since the lane lines are represented with a piecewise linear model, this method can also work on unstructured roads. The time complexity of this algorithm is O( ).
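The SVD step can be sketched end-to-end on synthetic data. Each image line l back-projects to a plane through the optical centre with normal K^T·l; the vanishing direction D is the null vector of the stacked normals, and K maps it back to the VP pixel. The intrinsics, camera pose, and lane geometry below are illustrative assumptions:

```python
import numpy as np

K = np.array([[1000., 0., 960.], [0., 1000., 540.], [0., 0., 1.]])
A = np.array([[1., 0., 0.], [0., 0., -1.], [0., 1., 0.]])  # level camera facing down the road

def to_pixel_h(P_road):
    return K @ (A @ np.asarray(P_road, float))   # homogeneous pixel coordinates

# Two parallel lane lines on the ground, camera assumed 1.5 m above the road
lines = []
for x in (-1.8, 1.8):
    p1 = to_pixel_h([x, 5.0, -1.5])
    p2 = to_pixel_h([x, 50.0, -1.5])
    lines.append(np.cross(p1, p2))               # image line through both points

# Each image line constrains the vanishing direction D: (K^T l)^T D = 0
N = np.stack([K.T @ l for l in lines])
_, _, Vt = np.linalg.svd(N)
D = Vt[-1]
D = D if D[2] > 0 else -D                        # the direction must point forward

p = K @ D
vp = p[:2] / p[2]                                # -> (960, 540) for this geometry
```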

Roll-Sensitive Inverse Perspective Mapping
IPM is a technique that maps image coordinates (u, v) to road coordinates (X_r, Y_r). For simplicity, previous studies (Bertozzi and Broggi, 1998; Aly, 2008) usually assume that the roll angle is zero. However, such an assumption may bring non-negligible errors to the subsequent lane width calculation when the roll angle is large.
According to equation (2), the relationship between (u, v) and (X_r, Y_r) can be expressed by the following equation. Substituting the ground-plane constraint into equation (11) yields the roll-sensitive IPM.
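A minimal roll-sensitive IPM sketch: back-project the pixel to a viewing ray, rotate the ray into road axes with R^T, and intersect it with the ground plane a height h below the camera. The intrinsics, axis conventions, and mounting height are illustrative assumptions:

```python
import numpy as np

K = np.array([[1000., 0., 960.], [0., 1000., 540.], [0., 0., 1.]])
A = np.array([[1., 0., 0.], [0., 0., -1.], [0., 1., 0.]])  # road -> camera axes
h = 1.5                                                    # camera height (assumed)

def ipm(u, v, R=A, K=K, h=h):
    """Map pixel (u, v) to road-plane coordinates (X_r, Y_r)."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_road = R.T @ ray_cam            # rotate the viewing ray into road axes
    s = -h / ray_road[2]                # intersect the ground plane Z_r = -h
    return s * ray_road[0], s * ray_road[1]

X, Y = ipm(600.0, 840.0)  # pixel of a ground point 5 m ahead, 1.8 m to the left
```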

Lane Width Measurement
Here, d(P, e) is the distance from point P to the line on which edge e lies.
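As a sketch of this measurement (assuming, as the notation d(P, e) suggests, that a lane width is the mean perpendicular distance from sample points on one lane line to the adjacent line):

```python
import math

def point_line_dist(p, a, b):
    """d(P, e): distance from point p to the line through points a and b."""
    px, py = p; ax, ay = a; bx, by = b
    tx, ty = bx - ax, by - ay
    # perpendicular distance via the 2D cross product
    return abs(tx * (py - ay) - ty * (px - ax)) / math.hypot(tx, ty)

def lane_width(samples, a, b):
    """Mean distance from sample points on one lane line to the adjacent line (a, b)."""
    return sum(point_line_dist(p, a, b) for p in samples) / len(samples)

w = lane_width([(3.5, 0.0), (3.5, 20.0)], (0.0, 0.0), (0.0, 20.0))  # -> 3.5
```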

CALCULATE ROLL USING LANE WIDTH CONSTRAINT
In Section 2, the pitch and yaw angles are calibrated based on the VP. We then need to estimate the roll angle. In BEV, the lane widths are influenced by the roll angle: when the camera rotates clockwise around its optical axis, the lanes on the left become wider and those on the right become narrower. When the camera calibration parameters are accurate and the ground is approximately flat, the lanes in the BEV obtained by IPM match their real-world scale. This property can be used to calibrate the roll angle.

Roll Calibration Without Prior Lane Width
On structured roads, the widths of adjacent lanes are usually equal. This means that the widths of different lanes in the BEV should be equal, and the variance of the widths of multiple lanes should be zero. Utilizing this property, an optimization problem can be built.

γ* = argmin_γ Σ_i (w_i(γ) − w̄)² (16)

With the pitch and yaw angles fixed, the lane width w_i is a function of the roll angle γ.
Since the relationship between w_i and γ is difficult to express explicitly, the above optimization problem is hard to solve directly in continuous numerical space. Therefore, we use a search-based algorithm to solve for γ. Assume that γ obeys a normal distribution, i.e., γ ~ N(μ, σ²). By the three-sigma rule, 99.7% of the values lie within three standard deviations of the mean, so we let γ ∈ [μ − 3σ, μ + 3σ]. After limiting the upper and lower bounds of γ, if the resolution of γ is further specified, γ can only take values in a finite set of real numbers, and brute-force search can then be used to solve the problem.
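The brute-force search can be sketched end-to-end on synthetic data: lanes imaged by a camera with a true roll of 2° are reprojected to BEV under each candidate roll, and the candidate minimizing the width variance is kept. The intrinsics, camera pose, lane layout, and search bounds (μ = 0, σ = 1°, 0.1° resolution) are illustrative assumptions:

```python
import numpy as np

def Rx(a): c, s = np.cos(a), np.sin(a); return np.array([[1,0,0],[0,c,-s],[0,s,c]])
def Ry(a): c, s = np.cos(a), np.sin(a); return np.array([[c,0,s],[0,1,0],[-s,0,c]])
def Rz(a): c, s = np.cos(a), np.sin(a); return np.array([[c,-s,0],[s,c,0],[0,0,1]])

K = np.array([[1000., 0., 960.], [0., 1000., 540.], [0., 0., 1.]])
A = np.array([[1., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
h, pitch, yaw = 1.5, 0.1, 0.0
true_roll = np.deg2rad(2.0)

def R_cam(roll):
    return A @ Rx(pitch) @ Rz(yaw) @ Ry(roll)

def project(P, roll):
    p = K @ (R_cam(roll) @ np.asarray(P, float))
    return p[:2] / p[2]

def ipm(uv, roll):
    ray = R_cam(roll).T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    s = -h / ray[2]
    return s * ray[:2]

# Three lane lines 3.5 m apart, each sampled at two ranges, imaged with the true roll
pixels = [[project([x, y, -h], true_roll) for y in (5.0, 30.0)]
          for x in (-3.5, 0.0, 3.5)]

def width_var(roll):
    lanes = [np.mean([ipm(uv, roll)[0] for uv in lane]) for lane in pixels]
    widths = np.diff(lanes)          # widths between adjacent lanes in BEV
    return float(np.var(widths))

grid = np.deg2rad(np.arange(-3.0, 3.0001, 0.1))   # mu - 3*sigma .. mu + 3*sigma
best = grid[int(np.argmin([width_var(r) for r in grid]))]
```

A wrong roll candidate makes the left widths grow and the right widths shrink (or vice versa), so the variance is positive everywhere except near the true roll.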
The pipeline for solving the roll angle is shown in Figure 2. First, using the CNN and a series of post-processing steps, sorted lane lines ℒ = {l_1, l_2, …, l_n} are obtained. According to equations (9) and (10), the image coordinate of the VP (u_v, v_v) is estimated. From the VP, the pitch and yaw angles (θ, ψ) of the camera are calculated according to equation (8). Fixing θ and ψ, the optimal roll angle γ* is determined by searching for the value that minimizes the variance of the lane widths in BEV. To calculate the lane widths, we first project the lanes ℒ from image space to ground space according to equation (13), then calculate the distances between adjacent lanes and their variance from equation (15). The proposed method does not require scale information and only requires at least two lanes in the image; the actual lane widths and the mounting height of the camera do not affect the solution of the roll angle.

Roll Calibration with Prior Lane Width
When the widths of the lanes are known, the optimization model can be obtained by simply modifying equation (16): instead of minimizing the variance of the measured widths, we minimize their squared deviations from the known widths.
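The two objectives can be contrasted in a small sketch (the exact form of the modified objective is an assumption based on the description above):

```python
def objective_no_prior(widths):
    """Eq. (16): variance-style objective; needs no scale information."""
    mean = sum(widths) / len(widths)
    return sum((w - mean) ** 2 for w in widths)

def objective_with_prior(widths, w_known):
    """Modified objective: squared deviation from the known lane width."""
    return sum((w - w_known) ** 2 for w in widths)

# Uniform but wrongly scaled widths satisfy the prior-free objective,
# while the prior-width version still penalizes them.
```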

EXPERIMENT
We quantitatively evaluated our algorithm on the nuScenes dataset (Caesar et al., 2020), which contains a large amount of synchronized camera and pose data. The onboard front camera collects images in urban scenarios including structured roads. The pose data is obtained from an accurate localization system that fuses IMU, GPS, and HD LiDAR maps, and we take it as ground truth. The proposed method estimates the absolute orientation in the road coordinate system, whereas the orientation provided by the dataset is relative to a global coordinate system. As a result, the yaw angle differs most of the time, and the pitch and roll angles differ when driving along a slope or incline. Therefore, we take the relative orientation error (Geiger et al., 2012) as the metric to describe the accuracy of the calibration algorithm.
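A common form of this relative-orientation metric is the geodesic angle of the residual rotation between the estimated and ground-truth relative rotations; the sketch below assumes that convention:

```python
import numpy as np

def rotation_error(R_est, R_gt):
    """Geodesic angle (radians) between two rotations: arccos((tr(R_gt^T R_est) - 1) / 2)."""
    R_err = R_gt.T @ R_est
    cos_angle = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_angle))

a = 0.01  # a 0.01 rad disagreement about one axis
Rz = np.array([[np.cos(a), -np.sin(a), 0.],
               [np.sin(a),  np.cos(a), 0.],
               [0., 0., 1.]])
err = rotation_error(Rz, np.eye(3))  # -> 0.01
```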
The rotation errors are shown in Figure 7 and their mean values in Table 1. The error of the yaw angle is relatively larger than the other two, because the horizontal location of the vanishing point is relatively sensitive to errors in lane centreline extraction. The non-zero curvature of the lane lines also introduces errors to the VP estimation. We also tested the proposed method on "Tuyou", an autonomous vehicle equipped with cameras mounted on the roof (see Figure 8). The focal length of the camera is 3.6 mm and the image size is 1920 × 1080. We tested our algorithms in real urban road scenarios, including main roads, highways, and intersections.
It is difficult to obtain the ground truth of the camera's orientation, especially in outdoor scenes without special targets. We therefore use the proposed roll-sensitive IPM to reconstruct the ground and compare the result with online calibration methods that adopt the zero-roll hypothesis. Applying IPM, the bird's eye view can be easily generated. Figure 9(b) shows BEVs using the camera parameters calibrated by Yang's method (Yang et al., 2016), which assumes the roll angle is zero, and Figure 9(c) shows the result using the parameters calculated by our proposed method. Because of installation error, our onboard camera was not mounted strictly horizontally and the roll angle was about 2°. As shown in the figure, due to the inappropriate zero-roll assumption, the BEV generated by Yang's method is systematically deformed. By contrast, by calibrating the roll angle, our method successfully reconstructs the ground space, indicating that the proposed method can rectify the installation error.
The total running time of the proposed algorithm is about 75 ms, of which lane detection and the corresponding post-processing pipeline take 40 ms and camera calibration takes about 35 ms. Within camera calibration, the computation of the roll angle takes up most of the running time, about 34.5 ms. Here we choose μ = 0 and σ = 1°, and set the search resolution of the roll angle to 0.1°. This speed allows the proposed algorithm to run onboard in real time.

CONCLUSION
In this paper, we proposed a roll-sensitive online camera calibration method without the zero-roll hypothesis. Previous studies assume that the roll angle is zero and calibrate the pitch and yaw angles using a longitudinal vanishing point. With the lane-width constraint, our proposed method can calculate the three Euler angles of the camera based on only one VP in the image. This method can be used to rectify the camera's installation angular errors and determine the orientation of the camera online.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France