SYNTHETIC VISION SYSTEM CALIBRATION FOR CONFORM PROJECTION ON THE PILOT’S HEAD-UP DISPLAY

The situational awareness of the crew is critical for flight safety. A head-up display presents all required flight information in front of the pilot, overlaid on the view through the cockpit’s front window. The device was created to solve the problem of information overload during piloting. While computer graphics such as scales and a digital terrain model can be presented on such a display easily, errors in the head-up display alignment make the correct presentation of sensor data challenging. The main problem arises from the parallax between the pilot’s eyes and the position of the camera. This paper focuses on the development of an online calibration algorithm for conform projection of the 3D terrain and runway models on the pilot’s head-up display. The aim of our algorithm is to align the objects visible through the cockpit glass with their projections on the head-up display. To improve the projection accuracy, we use an additional optical sensor installed on the aircraft. We combine classical photogrammetric techniques with modern deep learning approaches. Firstly, we use an object detection neural network model to find the runway area and align the runway projection with its actual location. Secondly, we re-project the sensor’s image onto the 3D model of the terrain to eliminate errors caused by the parallax. We developed an environment simulator to evaluate our algorithm. Using the simulator, we prepared a large training dataset. The dataset includes 2000 images of video sequences representing the aircraft’s motion during takeoff, landing and taxi. The results of the evaluation are encouraging and demonstrate both qualitatively and quantitatively that the proposed algorithm is capable of precise alignment of the 3D models projected on a head-up display.


INTRODUCTION
The situational awareness of the crew is critical for flight safety. A Head-up Display (HUD) presents all required flight information in front of the pilot, overlaid on the view through the cockpit's front window. The device was created to solve the problem of information overload during piloting. A head-up display eliminates the need for pilots to monitor both the surrounding area and numerous instruments in the cockpit. Modern head-up displays are capable of projecting both instrument scales and video sequences. The video sequence can be generated either by an external sensor, such as an RGB or infra-red camera located below the cockpit (Enhanced Vision System, EVS), or by 3D graphics software simulating the underlying relief (Synthetic Vision System, SVS). The location of the sensor is presented in Figure 1. Processing of the EVS frame for effective presentation on the HUD has received a lot of scholarly attention recently (Howells, Brown, 2007, Mohideen et al., 2013, Kramer et al., 2011). Still, to the best of our knowledge, there is no research on eliminating the parallax between the pilot's eyebox and the EVS sensor pose. While the position of the virtual camera of the synthetic vision system can be calibrated to the pose of the pilot's head (the pilot's eyebox), there is a significant parallax between the pilot's eyebox and the external thermal camera. The parallax effect is caused by the displacement of the pilot's head with respect to the camera's pose. It leads to discrepancies between the actual position of objects and the image projected on the HUD (Figure 1, right top).
Discrepancies resulting from even a small parallax become a real problem during take-off and landing, when the observed objects (runway, buildings, trees) are in close proximity to the aircraft. Therefore, re-projection of the EVS frame is required to avoid misplacement of object contours on the head-up display.
To the best of our knowledge, there are no published studies on re-projecting the EVS frame for conform presentation on a head-up display. This paper focuses on the development of a calibration algorithm for conform projection of images from the enhanced vision sensor on the head-up display. We hypothesize that the parallax could be eliminated by projecting the EVS frame as a texture onto the 3D model of the scene available from the synthetic vision system.
Figure 2. The proposed pipeline for re-projecting the EVS frame.
However, such re-projection can be used only if the position of the virtual camera is precisely aligned with the pose of the EVS sensor. In practice this is not always true, owing to errors in the position of the virtual camera caused by the inertial measurement unit. To avoid flutter of the re-projected image, we add an alignment step based on a deep neural network. The proposed pipeline is presented in Figure 2.
The proposed calibration algorithm was implemented in prototype software and evaluated using an environment simulator. The results of the evaluation are encouraging and demonstrate that the proposed pipeline improves the alignment of object contours on the head-up display. The average contour distance after the alignment is 2.3 pixels, which provides comfortable operation during all stages of the flight. The proposed algorithm could be implemented using modern integrated modular avionics. Installing it as an additional module for the HUD would improve the crew's situational awareness.
The rest of the paper is organized as follows. Firstly, we discuss modern research on the pilot's situational awareness and head-up displays in Section 2. After that, we present details of the proposed framework, the YOLO-HUD network and the virtual camera pose estimation algorithm in Section 3. The qualitative and quantitative evaluation of the proposed algorithm is presented in Section 4.

Contributions
We present three key technical contributions: (1) a calibration algorithm for alignment of the contours of objects projected on the head-up display with the corresponding contours of real objects viewed by a pilot, (2) a YOLO-HUD neural network architecture for detection of the runway and horizon tilt estimation, (3) an evaluation of the proposed algorithm using an environment simulator.

Computer Vision for pilot's situational awareness
The processing of the EVS frame for effective presentation on the head-up display has received a lot of scholarly attention. In (Prinzel III et al., 2004) two different tunnel and guidance symbology concepts were compared. The results of the experiments suggested that considerations of tunnel format and guidance symbology interact with the type of SVS display. It was shown that the dynamic crow's feet tunnel concept was preferred for both the primary flight display and the head-up display. Another promising area in the field of SVS / EVS is the development of spatially-integrated display systems (head-worn or helmet-mounted display systems). The effect of system delays on the utility, usability and acceptability of such systems was investigated. The results demonstrated that the system latency must be less than 20 ms.
In further work on this topic, two piloted simulation studies were conducted to evaluate the head-worn system performance. The use of a head-worn display was evaluated for equivalent visual operations and compared to a visual concept and a head-down display concept. Then, symbology variations under different visibility conditions were evaluated. In (Arthur III et al., 2014) the possibility of using a head-worn display instead of a head-up display was investigated. The results showed no statistical differences between these systems, which suggests that a head-worn display can provide the same safety and operational benefits as a current HUD.
Another important task is the development of optical flow estimation methods. In (Baker et al., 2007) a collection of datasets for the evaluation of optical flow algorithms was presented. Applications that support the pilot in difficult situations have also developed significantly. Flight Deck Interval Management (FIM) seeks to enhance airport efficiency through the scheduling and management of aircraft-to-aircraft spacing at the runway threshold through precision spacing and onboard speed guidance. Experiments confirm that FIM can improve runway throughput by spacing aircraft more precisely and that SVS/EVS, coupled with FIM, may allow reduced aircraft separation. Pilot workload was reduced, and situational awareness and safety increased.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
An algorithm based on EVS frame sequence analysis (Kniaz, 2014) was proposed for the automatic recognition of foreign objects on a runway. The difference of orthophotos generated using the external orientation of the camera relative to the runway is used for foreign object recognition. In (Mohideen et al., 2013) a system was proposed that uses the EVS to estimate the aircraft position and orientation relative to taxiway markings as a lateral guidance aid. The aircraft yaw angle and lateral offset are estimated from the slope of the taxiway center line and the horizontal position of the vanishing line. The output of this system can be used by the pilot as guidance commands for surface operation or as a means for automatic nose wheel steering.

Head-up Projection Systems
Today more and more aircraft have a head-up display included in the flight deck display suite. Displaying video on the HUD is challenging because the display has to operate in a wide range of lighting conditions while performing its primary function of displaying flight information (Howells, Brown, 2007).
Currently, it is possible to use elements of artificial intelligence and deep learning to ensure flight safety. Such technologies are used not only in flight. For example, in (Abdi, Meddeb, 2017) fast deep learning-based object detection approaches for identifying and recognizing road obstacle types and predicting complex traffic situations were presented. This method is based on an Augmented Reality Head-Up Display (AR-HUD) that can be used in aircraft. In (Langner et al., 2016) a system was presented that constantly monitors the level of attention of the driver of a vehicle. If the driver is inattentive and fails to recognize a threat, the assistance system produces a warning. The use of augmented reality technologies, including the HUD, requires accurate registration of real and added elements. To achieve this, the complete system must be properly calibrated.
In (Ballestin et al., 2019) the calibration process of an optical see-through device, based on a visual alignment method, was described. Recently, one such augmented reality system has been developed for helicopter pilot assistance (Walko, Peinecke, 2020).

Camera Pose Estimation
Camera pose estimation is of great importance for scene understanding and augmented reality. The work (Kehl et al., 2016) presents a 3D object detection method that uses regressed descriptors of locally-sampled RGB-D patches for 6D pose vote casting. Deep learning algorithms for 6D pose estimation with RGB-D data have shown state-of-the-art results on various datasets.
A computationally efficient and robust algorithm for external orientation based on the positions of two known reference points and a gravity vector (Kniaz, 2016) allows external orientation to be performed in limited visibility conditions. The algorithm based on optical flow estimation (Kniaz, 2018) provides high-quality presentation of the EVS video on a head-up display. The optical flow is estimated using ray tracing and a convolutional neural network. The method provides a significant increase in the brightness of obstacles and reduces the intensity of non-informative areas.

Framework Overview
The aim of the proposed algorithm for enhanced vision system calibration is a conform projection of the EVS frame on the pilot's head-up display. The calibration process is twofold. Firstly, an off-line calibration is performed to estimate the external orientation of three sensors: (1) the enhanced vision system sensor, with exterior orientation $P_{EVS}$, (2) a virtual camera corresponding to the position of the pilot's eye-box, (3) the synthetic vision system virtual camera, with exterior orientation $P_{SVS}$ (Figure 3).
The second step is the re-projection of an image captured by the EVS sensor so that it conforms with the geometry visible from the position of the pilot's eye-box. To perform the re-projection, we follow Algorithm 1.
Algorithm 1:
[...]
3 find the corresponding pixel $p_{evs}$ in the current EVS frame $I_{EVS}$ using $X_{EVS}$
4 store $p_{evs}$ as the texture $T_{SVS}$ pixel for $P \in R$
5 create the current SVS frame $I_{SVS}$ using $X_{SVS}$ and $R$
6 draw the current SVS frame $I_{SVS}$ with $R$ textured with $T_{SVS}$
7 end
This procedure is performed using the OpenGL graphics library. The OpenGL shader pseudo-code of the re-projection procedure is presented as Algorithm 2.
Algorithm 2: HUD Reprojection Shader Pseudo-code
Input: the camera positions represented as the $View_{evs}$ and $View_{svs}$ matrices, the camera parameters as the $Projection_{evs}$ and $Projection_{svs}$ matrices, the relief geometry $Vertices$, the texture coordinates $TexCoords$ for the relief geometry, and the current EVS frame as a 2D texture
Output:
1 Get $ClipSpace_{xyzw}$ by multiplying $Vertices$ by the $View_{evs}$ and $Projection_{evs}$ matrices;
2 Find $NDC_{xy}$ by dividing $ClipSpace_{xy}$ by $ClipSpace_{w}$;
3 Calculate $UV_{evs} = (NDC_{xy} + 1) \cdot 0.5$;
4 Get the current fragment color from the EVS texture using $UV_{evs}$ as texture coordinates;
To perform the re-projection, we calculate the $UV_{evs}$ coordinates for every fragment of the relief geometry (Figure 3). After that, we render the textured SVS relief for the virtual camera positioned at the pilot's eye-box.
When comparing SVS and EVS frames on the HUD, it is assumed that the virtual cameras have different vertical and horizontal positions. If the location of the two cameras relative to each other and their parameters are known, then the view and projection matrices make it possible to project an EVS frame. Assuming that the relief geometry is the same in the SVS frame rendered from the EVS camera pose and in the EVS frame, finding corresponding points in the two frames reduces to converting coordinates from world space to $NDC_{evs}$ using the view and projection matrices of the EVS camera. After transforming $NDC_{evs}$ to $UV_{evs}$ coordinates, the HUD frame can be rendered from the relief geometry with the EVS frame as a texture. The $UV_{evs}$ coordinates are computed in the fragment shader using the fragment position.
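The coordinate conversion described in this section can be illustrated on the CPU. The following is a minimal sketch in plain Python, not the shader used in the prototype: the function names `mat_vec` and `evs_uv`, the row-major matrix layout, and the toy matrices `IDENTITY` and `SIMPLE_PERSPECTIVE` are assumptions introduced only for illustration. It follows the steps of Algorithm 2: world space to clip space, perspective division to normalized device coordinates, and the mapping from the range [-1, 1] to texture coordinates in [0, 1].

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def evs_uv(vertex, view_evs, projection_evs):
    """Return EVS texture coordinates (u, v) for a world-space vertex.

    Returns None when the vertex lies behind the EVS camera (w <= 0),
    because the perspective division is not meaningful there.
    """
    world = [vertex[0], vertex[1], vertex[2], 1.0]
    clip = mat_vec(projection_evs, mat_vec(view_evs, world))
    if clip[3] <= 0.0:
        return None
    ndc_x = clip[0] / clip[3]
    ndc_y = clip[1] / clip[3]
    # Map normalized device coordinates [-1, 1] to texture space [0, 1].
    return ((ndc_x + 1.0) * 0.5, (ndc_y + 1.0) * 0.5)


# Hypothetical toy matrices: the camera sits at the origin and looks
# along -z (OpenGL convention); the depth row is a placeholder.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
SIMPLE_PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],  # w_clip = -z_eye
]

if __name__ == "__main__":
    print(evs_uv((0.0, 0.0, -5.0), IDENTITY, SIMPLE_PERSPECTIVE))
```

A point on the optical axis maps to the center of the EVS texture, while a returned coordinate outside [0, 1] would indicate that the relief point is not covered by the EVS frame.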

YOLO-HUD Convolutional Network
To eliminate errors in the pose estimation of the SVS virtual camera, we align the SVS and EVS frames using a deep convolutional network. Our network is based on the YOLOv3 model and is termed YOLO-HUD. The YOLO-HUD algorithm solves two tasks. Firstly, it detects the runway bounding box. Secondly, it estimates the tilt of the horizon. The details of our YOLO-HUD architecture are presented in Table 1.

Virtual Camera Pose Estimation
Our re-projection algorithm is threefold. Firstly, an off-line camera calibration is performed. During this step, we estimate the positions of the virtual camera of the SVS, the EVS sensor and the pilot's eye-box with respect to the aircraft reference frame (see Figure 3 for details). The off-line step is performed while the aircraft is on the surface of the runway. We use the runway markings as a system of Ground Control Points (GCPs). The locations of the GCPs are extracted from the EVS frame automatically. To estimate the locations of the GCPs on the SVS frame projected to the head-up display, the pilot is asked to move a virtual cursor on the HUD while looking through it.
Using all collected observations, we perform a bundle adjustment to estimate the locations of three cameras: the SVS camera $P_{SVS}$, the EVS sensor $P_{EVS}$, and the pilot's eye-box $P_{eye}$.
The rest of the algorithm is performed online in real time. For each frame $I_{EVS}$ acquired by the EVS, we estimate the center of the runway bounding box $c_{evs} \in \mathbb{R}^2$ using our YOLO-HUD model. After that, we estimate the location of the runway center on the SVS frame $c_{svs} \in \mathbb{R}^2$. The difference of the two points in the image space is given by: [...] We linearize the local angular errors in the camera alignment to estimate the errors in the external orientation of the SVS camera. Specifically, we assume that: [...] where $\Delta\psi$ is the difference in the yaw angle between the estimated orientation of the SVS virtual camera and the ground truth orientation, and $\Delta\theta$ is the difference in the pitch angle. The difference in the roll angle $\Delta\varphi$ is estimated by our YOLO-HUD model.
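The linearization step can be illustrated with a small numerical sketch. The equations themselves are not reproduced in this text, so the code below is an assumption rather than the authors' formulation: it uses a pinhole camera with a focal length `focal_px` expressed in pixels, takes the offset as $c_{evs} - c_{svs}$, and applies the small-angle approximation in which an image offset divided by the focal length approximates the angular error. The function names, the sign conventions and the sample values are hypothetical.

```python
import math


def angular_correction(c_evs, c_svs, focal_px):
    """Approximate yaw and pitch errors (radians) from a pixel offset.

    Assumption: pinhole camera and small angles, so that
    delta_yaw ~ dx / f and delta_pitch ~ dy / f.
    """
    dx = c_evs[0] - c_svs[0]
    dy = c_evs[1] - c_svs[1]
    return dx / focal_px, dy / focal_px


def exact_angle(offset_px, focal_px):
    """Exact angle subtended by a pixel offset for a pinhole camera."""
    return math.atan2(offset_px, focal_px)


if __name__ == "__main__":
    # Hypothetical runway centers in the EVS and SVS frames, in pixels.
    d_yaw, d_pitch = angular_correction((330.0, 250.0), (320.0, 240.0), 1000.0)
    print(math.degrees(d_yaw), math.degrees(d_pitch))
    # The gap between the linearized and the exact angle stays small
    # as long as the offset is small relative to the focal length.
    print(abs(d_yaw - exact_angle(10.0, 1000.0)))
```

Under these assumptions, the approximation error grows with the cube of the ratio between the offset and the focal length, which is why a linear model is reasonable only for small residual misalignments.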
After the cameras are aligned, we perform the re-projection of the EVS frame onto the SVS 3D model using Algorithm 1.

Environment Simulator and Dataset Generation
An environment simulator was created to generate image sequences for training the developed neural network model. The dataset includes color image sequences, the ground truth runway bounding boxes and the ground truth roll values for each frame. Three stages of the flight were simulated: landing, taxi, and takeoff. For each position of the virtual camera, three types of images were generated: the cockpit view of the scene (color) used as the background for the HUD modeling, the EVS sensor frame, and the ground-truth segmentation. Examples from the dataset are presented in Figure 5.

EXPERIMENTS
The algorithm was evaluated using the generated dataset. The following section presents details of the training of YOLO-HUD and of the perceptual and quantitative evaluation of our algorithm.

Network Training
The YOLOv3 neural network (Redmon, Farhadi, 2018) was trained on synthetic data with manual annotation. The synthetic images were generated using the digital terrain model and objects corresponding to the Sochi airport.
To train the neural network, we used pre-trained weights (Jocher, et al., 2019) without changing the network hyperparameters. The training was performed on a PC with an NVIDIA GeForce GTX 1080 graphics card with 8 GB of memory and took about two hours.

Qualitative Evaluation
A perceptual evaluation of the quality of the EVS video was performed. A group of twelve volunteers took part in the perceptual evaluation. Each volunteer was provided with eight pairs of video sequences, each 5 seconds long. Each pair included the original EVS video sequence and the YOLO-HUD processed video sequence presented on the head-up display. Examples of the video pairs used for the validation are presented in Figure 4.
The volunteers were asked to indicate which type of video they preferred or to label a video as inappropriate for visual navigation. The results of the perceptual validation are presented in Table 2. They show that the processed video sequences have better quality (95% of the volunteers' votes), with no YOLO-HUD processed sequences rated as "unusable".

Quantitative Evaluation
We evaluate our algorithm quantitatively in terms of Intersection over Union (IoU) metric for runway detection and average runway contour distance. The independent test split of the dataset was used during the evaluation. The evaluation results are presented in Table 3.
The evaluation results in Table 3 confirm the outcome of the perceptual assessment by the volunteers and indicate the high performance of the developed algorithm.
Table 3. IoU metric and object contour alignment accuracy for different distances to the runway.
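For reference, the two metrics can be sketched as follows. The IoU computation for axis-aligned bounding boxes is standard. The exact definition of the average contour distance is not given in the text, so the second function is only one plausible reading: the mean distance from each point of one contour to the nearest point of the other. The function names and the box format `(x_min, y_min, x_max, y_max)` are assumptions made for this illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def mean_contour_distance(contour_a, contour_b):
    """Mean distance (pixels) from each point of contour_a to contour_b.

    Assumed definition: for every point of contour_a take the distance
    to the nearest point of contour_b, then average over contour_a.
    """
    nearest = [min(math.dist(p, q) for q in contour_b) for p in contour_a]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical detected and ground-truth runway boxes, in pixels.
    print(iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
    print(mean_contour_distance([(0.0, 0.0), (4.0, 0.0)],
                                [(0.0, 3.0), (4.0, 3.0)]))
```

Note that the assumed contour distance is not symmetric; averaging it in both directions would be an equally reasonable choice.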

CONCLUSION
An online calibration algorithm for conform projection of the 3D terrain and runway models on the pilot's head-up display was developed. It accurately aligns the objects visible through the cockpit glass with their projections on the head-up display. The alignment relies on deep learning based recognition of the observed 3D scene and on re-projecting the sensor's frame onto the digital terrain 3D model from the aircraft's synthetic vision system.
To train the developed YOLO-HUD neural network, a dedicated dataset was created with the aid of an environment simulator. The dataset includes 2000 images of video sequences representing the aircraft's motion during takeoff, landing and taxi. The evaluation of the developed technique demonstrated its high performance both qualitatively, as assessed by a group of volunteers, and quantitatively, in terms of the Intersection over Union metric for runway detection and the average runway contour distance.