EYE RECOGNITION SYSTEM TO PREVENT ACCIDENTS ON THE ROAD

Today, advances in artificial intelligence are making autonomous cars possible. However, many problems remain in this area. Such vehicles are often "too slow to think" and cannot reliably process video camera data in the presence of reflections and glare; there are also questions about the safety of autonomous driving in difficult weather conditions or heavy traffic. Meanwhile, the human factor plays a major role in accidents involving human-driven vehicles: many accidents involve driver fatigue, distraction, or even falling asleep. It is potentially possible to monitor the state of the person behind the wheel using a video sequence from a camera installed in the car's interior that records the driver's face. In this paper, existing databases of face and eye images are reviewed, and an algorithm is presented that detects closed eyes based on Haar detectors and convolutional neural networks.


INTRODUCTION
Nowadays, one of the main applications of artificial intelligence, along with natural language processing and reinforcement learning, is computer vision. Indeed, a person receives about 70% of information through vision, so the emergence of many computer vision applications is not surprising.
One area of application of computer vision algorithms is the detection, recognition, and analysis of people. Most often, this task comes down to detecting a face in an image and comparing it with faces in an existing database. However, some tasks do not require recognizing a specific person. Because the faces of different people share many features, a face image can be used to detect parts of the face, such as the nose, mouth, and eyes, and to determine the emotions expressed on the face.
The task of locating the eyes, in turn, is relevant not only in various monitoring systems (distraction during classes, looking away during an online exam, etc.) but also in the analysis of driver states. According to statistics, the majority of road traffic accidents are related to the human factor, and about 20% of all cases are associated with falling asleep while driving (Kamberova, 2017).
Modern pattern recognition algorithms (Andriyanov, 2020; Choy, 2020) based on convolutional neural networks (CNNs) and deep learning (DL) achieve sufficiently high recognition accuracy. Combined with object detection methods (Andriyanov, 2019; Beal, 2020; Yu, 2021), they make it possible to construct an effective algorithm that recognizes states of fatigue and sleep from a video sequence of the driver's face. Based on the recognition results, a countermeasure can be applied to the driver, for example, a loud sound signal.

RELATED WORKS
Driver fatigue and falling asleep at the wheel are a major cause of road accidents. The topic is so relevant that car companies, together with scientists, are creating various systems that analyze driving habits, and some even analyze the driver's brain waves and vital functions. Most of these algorithms are developed using machine learning. All of them can be classified into three categories.
Algorithms of the first category aim to find changes in the behavior of the vehicle. An example of such an algorithm is described in (McDonald, 2018). The authors obtained a model that reveals anomalies in steering wheel deflection, vehicle speed, and gas pedal position. These anomalies are analyzed using a Bayesian network, whose output decides whether the driver is sleepy or tired. The algorithm showed a low false alarm probability compared to methods (Johnson, 1998) that predict fatigue by assessing eyelid movements. It should be noted that the algorithm bases its decision on the broad context of the driving situation, which reduced the number of false positives.
The second category of algorithms processes the driver's vital signs, brain waves, and electroencephalogram (EEG) readings. Based on the analysis of such time series, a forecast of the future state is made. The work (Wei, 2018) compares various EEG devices; its main result is the conclusion that new, convenient devices can be used. However, EEG alone does not provide an accurate identification of the stages of drowsiness. The work (Kartsch, 2018) supplements EEG signals with data from inertial sensors. This aggregation yields very high metrics, up to 95% accuracy in sleep detection. The authors of (Tateno, 2018) instead proposed monitoring a person's heart rate and respiration. However, the use of additional devices while driving is not currently seen as a promising direction.
Finally, the last group consists of computer vision-based methods. It is important to note that the driver's facial features change considerably with fatigue, which has made it possible to develop convolutional neural networks for analyzing the driver's condition. In (Tayab, 2019), it was proposed to determine eye closure by measuring the angle of curvature of the eyelid. The achieved accuracy is up to 95%, but high-quality shooting conditions and high image resolution are required. In (Shakeel, 2019), the MobileNet-SSD architecture trained on 350 images (custom approach) was investigated. The main advantage was the ability to run the system on Android; on the other hand, the accuracy was about 80%. Other studies (Celona, 2018) significantly complicate the processing device, since they analyze the eyes, mouth, and head posture.
Thus, in our opinion, the computer vision approach is an interesting one, although it requires improvements in both recognition metrics and performance.

METHODS AND DATASETS FOR FACE AND EYE DETECTION AND RECOGNITION
When creating a face recognition system, a general structure of the process can be singled out among the set of algorithms, as shown in Figure 1 (Kolomiets, 2014).
At the final stage, the extracted feature vectors are compared with the features available in the database to find the most likely match for subsequent identification of a person. Correlation filters (CF), convolutional neural networks (CNN), and the k-nearest neighbors method (K-NN) are known to solve this problem effectively.
Compared to other biometric identification systems, such as eye, iris, or fingerprint recognition, facial recognition is not the most efficient or reliable. Moreover, identifying a face in a natural environment raises many problems associated with varying lighting, facial expressions, age, or a dynamic background. Therefore, the review focuses on more advanced face recognition methods that use various databases. All considered methods can be classified into three groups according to the object features they use (Kuznetsov, 2020):
1. Structural approach: describes objects as a system consisting of many interconnected elements.
2. Holistic approach.
3. Hybrid (cluster) approach: combines the structural and holistic approaches to improve recognition accuracy.
Among the hybrid methods, the hybrid CNN-LSTM-ELM system developed by a group of Chinese scientists (Sun, 2018) deserves particular mention; it performs well in human activity recognition (HAR). Figure 2 shows a diagram of this process.
Thus, it was decided to use the Viola-Jones algorithm with Haar detectors to extract faces from images in the face databases, followed by geometric selection of the eyes within the face area, as well as to use images that contain eyes directly.
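The geometric selection of eyes inside a detected face box can be illustrated with a short sketch. It assumes that a Haar detector has already returned a face rectangle (x, y, w, h); the `Box` type, the `eye_regions` function, and all proportions (eye band starting at 20% of the face height, 30% tall, with a 10% outer margin) are hypothetical values chosen for illustration and are not taken from the paper.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def eye_regions(face: Box,
                top_pct: int = 20,
                height_pct: int = 30,
                margin_pct: int = 10) -> Tuple[Box, Box]:
    """Split the upper band of a face box into two eye regions.

    Returns (image_left, image_right): the regions on the left and right
    side of the image. In an unmirrored frame the image-left region holds
    the driver's right eye. All percentages are illustrative assumptions.
    """
    eye_y = face.y + face.h * top_pct // 100   # top of the eye band
    eye_h = face.h * height_pct // 100         # height of the eye band
    half_w = face.w // 2                       # each eye gets half the face
    pad = face.w * margin_pct // 100           # skip the outer face margin
    eye_w = half_w - pad
    image_left = Box(face.x + pad, eye_y, eye_w, eye_h)
    image_right = Box(face.x + half_w, eye_y, eye_w, eye_h)
    return image_left, image_right


if __name__ == "__main__":
    # Hypothetical face rectangle, as a Haar face detector might return it.
    print(eye_regions(Box(x=100, y=50, w=200, h=200)))
```

Integer percentages keep the arithmetic exact; in practice the proportions would need tuning for the camera position and the face detector in use.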

TRAINING A NEURAL NETWORK USING READY-MADE DATASETS
Since the face recognition task has been reduced to the problem of recognizing the state of the eyes, the scheme shown in Figure 1 is logically modified into the diagram shown in Figure 5.

Figure 5. Scheme of the eye condition analysis
It can be seen from the presented diagram that it processes one eye. In fact, two pre-trained models are used, one for the left eye and one for the right eye, and the CNNs for these models were trained separately. From the available databases, a sample of 10,000 images was created for training and testing: 2,000 closed right eyes, 2,000 closed left eyes, 2,000 open right eyes, and 2,000 open left eyes for training, plus 500 images of each of these four states for testing. The databases considered earlier were used, and the resulting eye images were resized to 28x28 pixels. In both cases, two options were trained: a custom neural network with 5 layers of 128 neurons each, and a network obtained by transfer learning from VGG-19 (Koonce, 2021). Figure 6 shows the architecture of the VGG-19 neural network.
The analysis of the presented dependencies shows that it is advisable to use models trained with transfer learning, since the chosen five-layer architecture is significantly inferior to the VGG-19 architecture. At the same time, on the test sample, the share of correct recognitions decreased somewhat. The results are presented in Table 1.
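The composition of the sample described above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the names `sample_composition` and `totals_for_eye` are hypothetical helpers introduced for illustration.

```python
from itertools import product
from typing import Dict, Tuple

EYES = ("left", "right")
STATES = ("open", "closed")
TRAIN_PER_CLASS = 2000  # from the text: 2,000 training images per eye/state
TEST_PER_CLASS = 500    # from the text: 500 test images per eye/state

Key = Tuple[str, str, str]  # (eye, state, split)


def sample_composition() -> Dict[Key, int]:
    """Return the number of images for every (eye, state, split) combination."""
    counts: Dict[Key, int] = {}
    for eye, state in product(EYES, STATES):
        counts[(eye, state, "train")] = TRAIN_PER_CLASS
        counts[(eye, state, "test")] = TEST_PER_CLASS
    return counts


def totals_for_eye(counts: Dict[Key, int], eye: str) -> Tuple[int, int]:
    """Return (train, test) image totals available to the model of one eye."""
    train = sum(n for (e, _, s), n in counts.items() if e == eye and s == "train")
    test = sum(n for (e, _, s), n in counts.items() if e == eye and s == "test")
    return train, test


if __name__ == "__main__":
    counts = sample_composition()
    print("total images:", sum(counts.values()))
    for eye in EYES:
        print(eye, totals_for_eye(counts, eye))
```

Since the left-eye and right-eye models are trained separately, each model sees 4,000 training and 1,000 test images, balanced between the open and closed states.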
In the estimate of the skip-target error, TP is the number of correctly recognized images of closed eyes, and FN is the number of closed-eye images recognized as open-eye images.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-2/W1-2021 4th Int. Worksh. on "Photogrammetric & computer vision techniques for video surveillance, biometrics and biomedicine", 26-28 April 2021, Moscow, Russia
As can be seen from Table 2, the probability of a skip-target error for networks with the VGG-19 architecture is approximately 2-2.5 times lower than for a network trained from scratch. However, this indicator needs to be reduced by an order of magnitude or more.
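Given the quantities TP and FN, a skip-target (miss) probability is commonly estimated as FN / (TP + FN). The sketch below assumes this standard definition, since the formula itself is not reproduced in the text; the function name `miss_rate` and the counts in the example are hypothetical and are not values from Table 2.

```python
def miss_rate(tp: int, fn: int) -> float:
    """Estimate the skip-target probability as FN / (TP + FN).

    tp: closed-eye images correctly recognized as closed.
    fn: closed-eye images wrongly recognized as open.
    Assumes the standard miss-rate definition (an assumption, not the
    paper's confirmed formula).
    """
    if tp < 0 or fn < 0:
        raise ValueError("counts must be non-negative")
    total = tp + fn
    if total == 0:
        raise ValueError("at least one closed-eye image is required")
    return fn / total


if __name__ == "__main__":
    # Hypothetical counts for 500 closed-eye test images of one eye.
    print(f"miss rate: {miss_rate(tp=455, fn=45):.2%}")
```

Under this definition, a miss rate is determined only by the closed-eye images, so it can stay high even when the overall share of correct recognitions looks acceptable.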

RESULTS OF A VIDEO STREAM PROCESSING
The algorithm shown in Figure 5 was tested with a Bluesonic webcam (Full HD, IR illumination) in daytime conditions. The video sequence was processed with a slight delay. Figures 9-11 show examples of determining different eye states based on the VGG-19 model. The results on images from this camera are generally comparable to the results on the test sample using VGG-19 and are presented in Table 3.
Both 0.84
Table 3. Proportion of correct recognition based on data from a video camera
The VGG-19 model is evidently more robust and generalizes better. Further improvement in performance can be achieved by using data from the specific video camera during training.
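When the per-frame eye states are used to warn the driver, single-frame decisions are usually aggregated over time so that an ordinary blink does not trigger the signal. The text does not specify such a rule, so the sketch below is an assumption: it raises an alarm only after both eyes have been classified as closed for a given number of consecutive frames. The names `drowsiness_alarm` and `min_closed_frames` and the threshold values are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def drowsiness_alarm(eye_states: Iterable[Tuple[bool, bool]],
                     min_closed_frames: int = 15) -> Iterator[bool]:
    """Yield an alarm flag for every frame of a video sequence.

    eye_states: per-frame pairs (left_closed, right_closed), for example
    the outputs of the two per-eye classifiers.
    min_closed_frames: assumed number of consecutive frames with both eyes
    closed that must be reached before the alarm is raised.
    """
    if min_closed_frames < 1:
        raise ValueError("min_closed_frames must be at least 1")
    run = 0  # length of the current run of closed-eye frames
    for left_closed, right_closed in eye_states:
        if left_closed and right_closed:
            run += 1
        else:
            run = 0  # any open eye resets the run
        yield run >= min_closed_frames


if __name__ == "__main__":
    # Hypothetical sequence: a 3-frame blink, one open frame, then 5 closed frames.
    frames = [(True, True)] * 3 + [(False, False)] + [(True, True)] * 5
    print(list(drowsiness_alarm(frames, min_closed_frames=4)))
```

A suitable threshold depends on the camera frame rate and on how long the eyes may stay closed before a warning is justified, so it would have to be chosen experimentally.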

CONCLUSION
In this paper, the main computer vision algorithms for working with faces were considered, and several datasets of such images were presented. A comparative study of recognizing open and closed eyes on the face was carried out, which showed the need to use transfer learning for this problem. The characteristics obtained for a network based on the VGG-19 architecture are significantly superior to those of a simple five-layer convolutional neural network trained from scratch. The obtained share of errors of the first kind (skip target) does not yet allow the algorithm to be unambiguously recommended for practical application; however, by expanding the training sample and studying other architectures, it seems possible to reduce these errors several-fold compared to the 9% reported in this work. Video processing also showed that the model based on the VGG-19 architecture remains stable, with only a slight deterioration in recognition accuracy.