TRIMMING AND ROAD ORTHO IMAGING FOR NIGHT IMAGES BY ONBOARD HIGH SENSITIVITY CONSUMER GRADE DIGITAL CAMERAS

Various kinds of cameras have been utilized as onboard cameras in the construction of Intelligent Transport Systems. In recent years, the use of high sensitivity consumer grade digital cameras at night has attracted attention because night imaging avoids the effects of sunlight and of congestion of people and cars. However, because an image taken by an onboard camera is a perspective projection image, objects far from the car are projected small, and the effect of lens distortion is greater at points far from the image center. To avoid these issues, the lower part of the projection image or a bird's-eye view image is used, but imaging of the bonnet, which depends on the car model and the tilt of the camera, becomes a new issue. Furthermore, a bird's-eye view image at night has to be trimmed to coincide with the irradiation range because the irradiation distance and range of the headlights are limited. On the other hand, feature quantities such as vanishing points and feature points on the lane have been used for projective transformation from a perspective projection image to a bird's-eye view image, but projective transformation based on feature quantities is an ill-posed problem. Therefore, this paper discusses a quantitative trimming method based on projective transformation that does not depend on feature quantities and coincides with the irradiation range of the headlights.


INTRODUCTION
Image information plays an important role in the construction of Intelligent Transport Systems (ITS) (Aoki, 1999). Omnidirectional cameras (Sato et al., 2007), drive recorders (e.g., Noda et al., 2011), smartphones (Suga et al., 2017), and action cameras (Kameyama et al., 2019) are commonly used as imaging devices. Among these, drive recorders are widely used for understanding traffic environments, including road markings, and for estimating car positions during autonomous driving, which is one of the goals of ITS. In addition, the creation of road surface ortho images from night images, which avoid the effects of sunlight and of congestion of people and cars, has been studied (Yamamoto et al., 2019), because high sensitivity consumer grade digital cameras with wide dynamic ranges have made it possible to take high-definition images and 4K video at night.
However, when these cameras (excluding omnidirectional cameras) are used as onboard cameras, the images are perspective projection images. That is, objects far from the car are projected small, and the influence of lens distortion is greater at points far from the image center. To avoid these issues, the lower part of the projection image or a bird's-eye view image is used in ITS (e.g., Geiger, 2009). Furthermore, a bird's-eye view image is used to recognize road signs (e.g., Kemuriyama et al., 2013) and to estimate car positions (Ziegler et al., 2014), since transforming the perspective projection image into a bird's-eye view image, as if viewed from directly above the road, makes the sizes and directions of road signs almost constant. However, the bird's-eye view image is also created from an image trimmed from the center to the bottom of the perspective projection image. Thus, it is necessary to avoid imaging the bonnet, which depends on the kind of car and the tilt of the camera. On the other hand, the primary light sources for night imaging are the car headlights, whose irradiation distance is limited and whose irradiation range on a vertical screen is limited to the center of the screen (Tsukada et al., 2015). Therefore, bird's-eye view images must be trimmed in accordance with the area illuminated by the headlights if night onboard images are used.
Incidentally, feature quantities such as vanishing points and characteristic points on the lane (Kemuriyama et al., 2013) and the parallelism of lanes (Suga et al., 2017) have been used for projective transformation from a perspective projection image to a bird's-eye view image. However, projective transformation based on feature quantities is an ill-posed problem. Therefore, a projective transformation method that is independent of feature quantities would be advantageous for the efficient construction of ITS. This paper discusses a quantitative trimming method for creating road ortho images at night based on a projective transformation that does not depend on feature quantities. The proposed method is not significantly affected by lens distortion, has no risk of road surface obscuration, and coincides with the irradiation range of the headlights.

High-sensitivity General-purpose Cameras
As shown in Table 1, the high sensitivity consumer grade digital camera used in this study has a normal ISO sensitivity range of ISO 100 to 102400, which can be expanded to ISO 50 to 80 and ISO 128000 to 409600. It can also capture video at 4K resolution. We verified the accuracy of this camera from a photogrammetric point of view, assuming that it would be used as an onboard camera for imaging at night. As a result, ISO 102400, the highest value of the normal ISO sensitivity range, was determined to be the optimum ISO sensitivity, and it was confirmed that accuracy better than the theoretical values could be achieved with an F-value of 8 and a shutter speed of 1/500 s (Sugimori et al., 2021). The source images used in this study were taken with the parameters described above.

Night Images
In this study, the camera was mounted on the car at a height of approximately 2.04 m and a depression angle of approximately 11.4°, and 4K (3840 × 2160 pixels) video of the road surface was captured at a frame rate of 30 fps and a bit rate of 100 Mbps while driving at approximately 30 km/h with low beams. The headlights were high-intensity discharge lights. Fig. 1 shows a high-sensitivity image of the road surface taken at night by the onboard camera (a frame cut from the video). Fig. 2 is a trivalent image of Fig. 1, in which, based on NB values, which represent the scale of the light rating (Nakamura et al., 2004), and on visual characteristics (Takeuchi, 1997), the brightness distribution of the space illuminated by the headlights is divided into three areas: a dark area at the edge of the road, a dim area with weak light, and a light area in the center of the road. Areas such as the sky, where no light reaches, and strongly reflective areas such as signs are excluded. Fig. 2 shows that the headlight illumination distance is limited and that the illumination range, while limited to the center of the image, is wider on the left side than on the right side.
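The division into three brightness areas can be illustrated with a simple two-threshold quantization. This is only a minimal sketch: the function name `to_trivalent` and both threshold values are hypothetical, the mapping from NB values to thresholds is not given in this text, and the exclusion of sky and strongly reflective areas is not handled.

```python
def to_trivalent(gray, dim_threshold, light_threshold):
    """Label each brightness value as 0 (dark), 1 (dim) or 2 (light).

    gray is a list of rows of brightness values. Both thresholds are
    hypothetical inputs; the paper derives its classes from NB values
    and visual characteristics, which are not reproduced here.
    """
    if dim_threshold >= light_threshold:
        raise ValueError("dim_threshold must be smaller than light_threshold")
    labels = []
    for row in gray:
        labels.append([
            0 if value < dim_threshold else 1 if value < light_threshold else 2
            for value in row
        ])
    return labels


# Toy 2 x 3 brightness patch with assumed thresholds 50 and 150.
patch = [[10, 100, 200], [49, 50, 150]]
trivalent = to_trivalent(patch, dim_threshold=50, light_threshold=150)
```

Keeping the thresholds as parameters leaves room to substitute whatever values the NB-based rating would give for a particular headlight and camera setting.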

Oblique Images
As shown in Fig. 3, the perspective projection image taken from the onboard camera is an oblique image taken at camera center O2, obtained by rotating the camera by an angle ω around the x-axis while maintaining a constant distance L between the camera center (O1) and a point P in the vertical image. In this case, because the imaging range of the vertical image is a narrow area near the center of the oblique image, and the bright area in the high-sensitivity image is near the center of the image as shown in Fig. 2, the effective range of the high-sensitivity image is assumed to be equivalent to the imaging area of the vertical image. Therefore, assuming that the oblique image is obtained through the projective transformation of the vertical image, the trimming range for the oblique image is the band of the oblique image bounded by the y-coordinates, after projective transformation, that correspond to the top and bottom edges of the vertical image.

Setting Key Points
Common corresponding points are necessary for projective transformation between vertical and oblique images. To simplify the calculation, this study proposes a quantitative trimming method based on the theory of photogrammetry by setting four key points in the oblique image, as shown in Fig. 4. First, the y-axis direction in Fig. 4 is the driving direction of the car, and the car is travelling in the left lane (lane width is approximately 3 m). The image size is assumed to be 2Sx × 2Sy. If the camera is calibrated in advance, aij, X0, Y0, Z0, and f in Equation (1) are known values. However, even if the camera is not calibrated, the proposed method can be implemented under the assumption that the other parameters are 0, provided that the camera parameters (focal length, image size, and sensor size), the approximate tilt of the camera (ω), and the height of the imaging point (Z0) can be estimated.
Next, the selection of the key points and the calculation of their coordinates are presented. It is assumed that the camera has been calibrated in advance and that the Z-coordinate of each key point is 0.
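Equation (1) is not reproduced in this text. As a rough illustration of the calculations used for the key points, the sketch below implements the standard collinearity condition and its inverse for a point on the plane Z = 0. It assumes a rotation about the x-axis only (φ = κ = 0), with the matrix elements aij stated later in the paper. The function names, the sign convention of the photo coordinates, and the sample values are assumptions made for illustration and may differ from the paper's Equation (1).

```python
import math


def rotation_omega(omega_rad):
    """Matrix a[i][j] for a rotation omega about the x-axis (phi = kappa = 0)."""
    c, s = math.cos(omega_rad), math.sin(omega_rad)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def ground_to_photo(ground, camera, f, a):
    """Standard collinearity condition: ground (X, Y, Z) -> photo (x, y)."""
    dX, dY, dZ = (g - c for g, c in zip(ground, camera))
    u = a[0][0] * dX + a[0][1] * dY + a[0][2] * dZ
    v = a[1][0] * dX + a[1][1] * dY + a[1][2] * dZ
    w = a[2][0] * dX + a[2][1] * dY + a[2][2] * dZ
    return (-f * u / w, -f * v / w)


def photo_to_ground(photo, camera, f, a, Z=0.0):
    """Inverse collinearity: photo (x, y) -> ground (X, Y) on the plane Z."""
    x, y = photo
    # Ray direction in ground coordinates: transpose(a) applied to (x, y, -f).
    dx = a[0][0] * x + a[1][0] * y - a[2][0] * f
    dy = a[0][1] * x + a[1][1] * y - a[2][1] * f
    dz = a[0][2] * x + a[1][2] * y - a[2][2] * f
    scale = (Z - camera[2]) / dz
    return (camera[0] + scale * dx, camera[1] + scale * dy)


# Assumed sample values: omega = 12 degrees, f = 24 mm, camera height 2.04 m.
a = rotation_omega(math.radians(12.0))
camera = (0.0, 0.0, 2.04)
X, Y = photo_to_ground((5.0, -3.0), camera, 24.0, a)
x, y = ground_to_photo((X, Y, 0.0), camera, 24.0, a)  # round trip back to (5, -3)
```

The focal length and photo coordinates share one unit (mm here) and the ground coordinates another (m); only their ratios enter the equations, so the units do not need to match.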

Key Point A:
The X-coordinate (XA) of point A in Euclidean coordinates is XA = X0 − TX, assuming that point A is located TX to the left of the car (measured from the camera position). Next, if points A and B are defined on the same line parallel to the x-axis, the y-coordinates of points A and B are equal. If this value is denoted ya, the Y-coordinate of point A can be calculated from Equation (2), which is derived from the second part of Equation (1). Furthermore, the x-coordinate of point A can be calculated from the collinearity condition equation with ZA = 0 by using this value (YA).
Note that TX need not be an exact value; a rough value that fits within the imaging space is sufficient. The positions of points A and B in the image (i.e., the y-coordinates of these points) are also arbitrary. For example, these points can be placed on the x-axis, i.e., ya (= yb) = 0.0 mm.

Key Point C:
Assume that point C is located on the vanishing line connecting the vanishing point and point A. Then, according to the rules of perspective projection, the X-coordinate of point C is equal to the X-coordinate of point A (XC = XA). If point C is at the bottom of the image, the y-coordinate of point C is −Sy. Therefore, the Y-coordinate of point C can be calculated by using −Sy instead of ya in Equation (2). The x-coordinate of point C is calculated, in the same way as the x-coordinate of point A, from the collinearity condition equation with ZC = 0, using the Y-coordinate (YC) calculated as described above.

Key Point D:
Next, point D is defined at the bottom-right corner of the image, so its photo coordinates (xd, yd) are (Sx, −Sy). The Y-coordinates of points C and D are equal (YD = YC), as C and D are on the same line parallel to the x-axis (at the bottom of the image). The X-coordinate of point D is calculated from the following equation, obtained by using the photo coordinates (xd, yd) in the collinearity condition equation and assuming ZD = 0.

Key Point B:
Finally, we consider point B. First, the y-coordinates and Y-coordinates of points A and B are equal (yb = ya and YB = YA) because points A and B are on the same line parallel to the x-axis. Next, assume that point B is on the vanishing line connecting the vanishing point and point D. Then, following the rules of perspective projection, the X-coordinate of point B is equal to the X-coordinate of point D (XB = XD), as in the case of point C. Furthermore, the x-coordinate of point B is calculated from the collinearity condition equation using (XB, YB) obtained as described above, with ZB = 0. In this manner, the ground coordinates and photo coordinates of key points A to D can be calculated.

Projective Transformation and Trimming Range
As shown in Fig. 6, it is necessary to know the photo coordinates of points C' and D' in the vertical image to transform the vertical image into the oblique image. Based on Fig. 6, the x-coordinates have the following relationship.
The hatched area in Fig. 6 represents the area where images are missing when converted to a vertical image.
Next, consider the y-coordinates of points C' and D'. Let LX be the real distance between points A and B and lx the corresponding distance in the vertical image; similarly, let LY be the real distance between points A and C and ly the corresponding distance in the vertical image. Then the y-coordinate of points C' and D' is calculated using Equation (6) based on ly calculated in Equation (5). The y-coordinates of points C' and D' are the same, as points C and D are on the same line parallel to the x-axis.
Here, the projective transformation coefficients (a1 to a8) are obtained from the following two-dimensional projective transformation formula, based on the photo coordinates of each of the four points before the projective transformation (points A, B, C', D') and after the projective transformation (A, B, C, D).
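Determining eight coefficients from four point pairs amounts to solving an 8 × 8 linear system. The sketch below assumes the common eight-parameter form x = (a1 x' + a2 y' + a3) / (a7 x' + a8 y' + 1), y = (a4 x' + a5 y' + a6) / (a7 x' + a8 y' + 1). Equation (7) is not reproduced in this text, so the coefficient ordering, the function names, and the sample points are all assumptions.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def projective_coefficients(src, dst):
    """Return [a1, ..., a8] mapping each src point (x', y') to its dst point (x, y)."""
    A, b = [], []
    for (xs, ys), (xd, yd) in zip(src, dst):
        # xd * (a7*xs + a8*ys + 1) = a1*xs + a2*ys + a3, and likewise for yd.
        A.append([xs, ys, 1.0, 0.0, 0.0, 0.0, -xd * xs, -xd * ys])
        b.append(xd)
        A.append([0.0, 0.0, 0.0, xs, ys, 1.0, -yd * xs, -yd * ys])
        b.append(yd)
    return solve_linear(A, b)


def apply_projective(coeffs, point):
    """Apply the eight-parameter projective transformation to one point."""
    a1, a2, a3, a4, a5, a6, a7, a8 = coeffs
    x, y = point
    d = a7 * x + a8 * y + 1.0
    return ((a1 * x + a2 * y + a3) / d, (a4 * x + a5 * y + a6) / d)


# Synthetic check: recover known (made-up) coefficients from four point pairs.
true_coeffs = [1.1, 0.05, 2.0, -0.02, 0.9, -1.0, 0.001, 0.002]
src = [(-10.0, 0.0), (10.0, 0.0), (-8.0, -6.0), (8.0, -6.0)]
dst = [apply_projective(true_coeffs, p) for p in src]
estimated = projective_coefficients(src, dst)
```

In the setting of the paper, `src` would hold the photo coordinates of A, B, C', D' and `dst` those of A, B, C, D. Four pairs determine the coefficients uniquely as long as no three of the points are collinear.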
The trimming range in the proposed method is the band of the oblique image bounded by the y-coordinates, after the projective transformation, that correspond to the upper (y = Sy) and lower (y = −Sy) edges of the image before the projective transformation (vertical image). These coordinates are calculated from the second part of Equation (7) with x' = 0 and with the y'-coordinate set to Sy and −Sy, respectively. The white band near the center of Fig. 7 is the trimming range calculated in this way.
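As a small illustration of this step, the sketch below evaluates the y-part of an eight-parameter projective transformation at x' = 0 for y' = Sy and y' = −Sy, and converts the result to pixel rows. The form y = (a4 x' + a5 y' + a6) / (a7 x' + a8 y' + 1) is assumed, since Equation (7) is not reproduced in this text. The row convention (row 0 at y = +Sy), the function names, and the coefficient values are also assumptions.

```python
def trimming_range(coeffs, Sy):
    """Return (y_top, y_bottom) in the oblique image for the vertical-image edges."""
    a1, a2, a3, a4, a5, a6, a7, a8 = coeffs

    def y_oblique(y_vertical):
        # y-part of the transformation evaluated on the image axis x' = 0.
        return (a5 * y_vertical + a6) / (a8 * y_vertical + 1.0)

    return y_oblique(Sy), y_oblique(-Sy)


def y_to_row(y, Sy, n_rows):
    """Convert a photo y-coordinate to a pixel row (row 0 is the top edge)."""
    return round((Sy - y) / (2.0 * Sy) * (n_rows - 1))


# Made-up coefficients for illustration only; Sy = 12 mm, 2160 pixel rows.
coeffs = [1.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.0, 0.01]
y_top, y_bottom = trimming_range(coeffs, Sy=12.0)
row_top = y_to_row(y_top, 12.0, 2160)
row_bottom = y_to_row(y_bottom, 12.0, 2160)
```

The two rows delimit the horizontal band of the oblique image that would be kept.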
The inverse transformation of Equation (7) converts an oblique image into a vertical image. To make this relationship intuitive, Fig. 8 shows the inverse transformation of the oblique image in Fig. 7, which includes a pedestrian crossing. In Fig. 8, both ends are cut so as not to include the hatched area in Fig. 6, and the image dimensions are made to correspond to those of the original image. These figures confirm the validity of the trimming method presented in this study.
Incidentally, if point C is defined at the lower edge of the image, its y-coordinate (yc) is −Sy. Therefore, if ya is replaced with −Sy in Equation (2) and φ = κ = 0, a11 = 1, a12 = a13 = 0, a21 = 0, a22 = cosω, a23 = −sinω, a31 = 0, a32 = sinω, a33 = cosω, and ZC = 0, the result is not affected by X0 or Y0. If X0 = Y0 = 0, then the Y-coordinate (YC) of point C can be defined as follows.
Here, Z0 is the camera position (height), 2Sx × 2Sy is the image size, f is the focal length, and ω is the depression angle.
The ground coordinate YA for point A is calculated using the following equation, obtained by setting Sy = 0 in Equation (8).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
In contrast, the X-coordinates for points A and B can be obtained from Equation (3) as follows.
Furthermore, assuming ya = yb = 0 in Equation (10), and rearranging Equation (5) using Equations (8) to (10), the following equation can be obtained.
It is understood from Equation (11) that the position (yc' = ly) of point C in the vertical image is not affected by the height of the camera, but is a function of the depression angle (ω) and focal length (f).

Trimming Range for Night Onboard Image
The trimming range for night onboard images should correspond to the irradiation range of the headlights. The initial inputs for determining the trimming range are the y-coordinate (ya) of line AB and the TX value of point A. The TX value can be a rough value and does not affect the results significantly. The trimming range, however, is affected by the y-coordinate of line AB. This is because, owing to the depression angle, the y-coordinates of line AB in the two images do not match when line AB is not on the x-axis. Because the y-coordinate of line AB is nevertheless adopted as a common value for both types of images in this study, the center of the oblique image is shifted according to this y-coordinate, which shifts the trimming range correspondingly. The proposed method uses this property: the input y-coordinate of point A is shifted in 0.5 mm steps from the center of the trivalent image of the original image, and the y-coordinate at which the number of pixels belonging to the bright area within the resulting trimming range reaches its maximum is adopted. The white band in Fig. 9 shows the trimming range for ya = −2.0 mm, and the dashed line shows the trimming range for ya = 0 mm. Fig. 9 shows that the bright area is trimmed by the proposed method.
Incidentally, Fig. 9 shows results with camera calibration (f = 24.558 mm, Z0 = 2.370 m, ω = 12°28′19″), whereas Fig. 10 shows the trimming range without camera calibration, using a nominal focal length and the results of a quick measurement of camera height and tilt (f = 24.0 mm, Z0 = 2.04 m, ω = 11.4°). Figs. 9 and 10 show that the trimming range is not influenced significantly by camera calibration. In other words, even if the camera is not calibrated, the proposed method is feasible if the camera specifications (focal length, image size, sensor size) and the approximate depression angle (ω) of the camera can be estimated.
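The search over ya can be sketched as a simple loop that counts bright pixels inside each candidate trimming band. In this sketch, `band_for_ya` is a hypothetical callable standing in for the whole key-point and projective-transformation calculation; it returns the first and last pixel row of the band for a given ya. The label 2 for the bright area and the toy data are also assumptions.

```python
def count_bright_pixels(trivalent, row_top, row_bottom, bright_label=2):
    """Count pixels labelled as bright in rows row_top..row_bottom (inclusive)."""
    return sum(row.count(bright_label) for row in trivalent[row_top:row_bottom + 1])


def best_ya(trivalent, band_for_ya, ya_candidates):
    """Return (ya, count) for the candidate whose band holds the most bright pixels."""
    best = None
    for ya in ya_candidates:
        row_top, row_bottom = band_for_ya(ya)
        count = count_bright_pixels(trivalent, row_top, row_bottom)
        if best is None or count > best[1]:
            best = (ya, count)
    return best


# Toy trivalent image (0 = dark, 1 = dim, 2 = bright), six rows of four pixels.
toy = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [2, 2, 2, 2],
    [1, 2, 1, 1],
]


def toy_band(ya):
    """Hypothetical band model: each 0.5 mm step moves a two-row band down one row."""
    top = int(-2 * ya)
    return top, top + 1


candidates = [0.0, -0.5, -1.0, -1.5, -2.0]  # 0.5 mm steps from the image center
chosen_ya, bright_count = best_ya(toy, toy_band, candidates)
```

With real data, `band_for_ya` would recompute the key points and the projective coefficients for each candidate, and the trivalent image would have the full 3840 × 2160 size.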

Figure 11. Trimming range. (Labels recovered from the figure: depression angle; top edge; bottom edge; center of the image.)
In contrast, Equation (11) shows that the trimming range is affected by the focal length and depression angle. Fig. 11 shows the relationship between the focal length (wide angle: f = 24 mm, standard: f = 50 mm) and the depression angle in the trimming range when calibration is not performed. In Fig. 11, the area between the top and bottom edges at each depression angle is the trimming range. The trimming range increases with the angle of view. In addition, the smaller the depression angle, the narrower the trimming range, because the camera axis is closer to horizontal; the larger the depression angle, the wider the trimming range.
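For reference, the angle of view of the two focal lengths can be compared with the usual pinhole relation, angle = 2 atan(s / (2f)), where s is the sensor dimension. The sensor height of 24 mm used below is an assumption (a full-frame sensor); the sensor size of the camera is not stated in this text, and the effective height in 4K video mode may be smaller.

```python
import math


def angle_of_view_deg(f_mm, sensor_mm):
    """Angle of view in degrees for focal length f_mm and sensor dimension sensor_mm."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * f_mm)))


SENSOR_HEIGHT_MM = 24.0  # assumed full-frame sensor height, not given in the paper

wide_angle = angle_of_view_deg(24.0, SENSOR_HEIGHT_MM)  # wide-angle lens, f = 24 mm
standard = angle_of_view_deg(50.0, SENSOR_HEIGHT_MM)    # standard lens, f = 50 mm
```

Under this assumption the 24 mm lens covers roughly twice the vertical angle of the 50 mm lens, which is consistent with the wider trimming range reported for the wide-angle case.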

ROAD ORTHO IMAGE AT NIGHT
The xy-coordinates corresponding to each laser point are calculated from the point cloud data obtained from the laser scanner and the collinearity condition equation in Equation (1), since the Mobile Mapping System (MMS) used in this study synchronises the onboard laser scanner and camera. The exterior orientation parameters (position and attitude) of the camera for each image are obtained from the position and orientation system (POS) of the MMS, and the interior orientation parameters are assumed to be calibrated in advance.
A grid DEM is generated from the obtained point cloud to match the resolution of the final ortho product (pixel size: 1 cm), and the road ortho image is created by assigning to each grid cell the brightness value of the image point whose xy-coordinates are calculated from the collinearity condition equation. The images used for orthorectification are cut out of the video at regular time intervals, so the distance between cut-out images varies when the driving speed is not constant. Therefore, the integrated distance is calculated from the car speed pulse or car position, and the images are thinned out until the set distance is reached, so that the image mosaic is created at equal distance intervals. Furthermore, luminance values from images close to the car are given priority, so that images with high resolution are used preferentially. Fig. 12 shows a road ortho image at night created following the above procedure. Although the road ortho image was created for a single lane, because this study aimed to develop a trimming method for the creation of road ortho images at night, the efficiency of trimming in creating ortho images was confirmed. Fig. 13 shows the white line of a crosswalk and a manhole in the ortho image at night; it also confirms that the quality of the images taken by the high sensitivity consumer grade digital camera is excellent.
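The thinning of video frames to equal travelled distances can be sketched as follows. This is a minimal illustration that integrates a per-frame speed signal; the function name, the input format (one speed value in m/s per frame), and the sample values are assumptions, and the paper's own implementation based on the car speed pulse or car position may differ.

```python
def select_frames_by_distance(speeds_mps, fps, interval_m):
    """Return indices of frames spaced about interval_m apart along the track.

    speeds_mps[i] is the car speed (m/s) at frame i. The distance covered
    between consecutive frames is approximated as speed * (1 / fps).
    """
    if not speeds_mps:
        return []
    selected = [0]
    travelled = 0.0
    dt = 1.0 / fps
    for i in range(1, len(speeds_mps)):
        travelled += speeds_mps[i - 1] * dt  # distance from frame i-1 to frame i
        if travelled >= interval_m:
            selected.append(i)
            travelled -= interval_m  # keep the remainder for the next interval
    return selected


# Toy example: the car doubles its speed after two frames, so later frames
# are selected more densely in time to keep the spacing at about 2 m.
frames = select_frames_by_distance([2.0, 2.0, 4.0, 4.0, 4.0], fps=2, interval_m=2.0)
```

Carrying the remainder over to the next interval keeps the selected frames from drifting away from the nominal spacing when the per-frame distance does not divide the interval evenly.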

CONCLUSION
This paper presented a quantitative trimming approach for creating road ortho images at night that is not affected much by lens distortion, has no risk of road surface obstruction, does not depend on feature quantities, and coincides with the irradiation range of the headlights. The first notable point of this paper is the setting of key points. Specifically, points A and B, and points C and D, were each placed on a line parallel to the x-axis, and these points were arranged in a characteristic manner on the image so that the ground coordinates and photo coordinates of each key point could be calculated based on the theory of photogrammetry. The second point is the use of geometric features of the perspective projection image: if point C is assumed to lie on the vanishing line connecting the vanishing point and point A, its X-coordinate has the same value as the X-coordinate of point A. The same applies to point B. Therefore, the proposed method does not depend on features such as the vanishing point position or white line information, which enables matching between the oblique image (onboard image) and the bird's-eye view image in the projective transformation, even for images without texture or feature points. Furthermore, the quantitative trimming of oblique images was confirmed.
Consequently, it was concluded that the proposed trimming method is efficient for creating road ortho images from night onboard images and that high sensitivity images are effective. Furthermore, high sensitivity cameras are expected to be used as onboard cameras at night, to avoid the effects of sunlight and of congestion of people and vehicles, during the construction of ITS. However, issues remain to be resolved, including the positions of headlights, varying road conditions such as uphill and downhill driving, and the creation of road ortho images at night for round-trip observations.