PUPIL VISUAL TRACKING ALGORITHMS FOR AUTOMATED STATIC PERIMETRY SYSTEMS

Some diseases, for instance glaucoma, cause visual field defects. Various methods are used for the timely diagnosis of such defects. One of the state-of-the-art diagnostic methods is automated static perimetry. Static perimetry determines light sensitivity in different parts of the visual field using stationary objects of variable luminosity. When the visual field is scanned in this way, an important factor is the control of gaze fixation at the fixation point. The greatest accuracy in determining the gaze fixation position is achieved by visual tracking of the pupil using a video camera. In this paper, four groups of visual tracking algorithms are considered: segmentation-based methods, correlation methods, methods based on optical flow, and methods based on the weighted average. An experimental comparison of these methods was carried out using a set of video recordings obtained in an automated static perimetry apparatus. Ground truth pupil tracks were marked on these videos. The comparison was conducted according to two criteria: center location error and tracking length. It is shown that only the weighted average method has an acceptable tracking length.


INTRODUCTION
One of the state-of-the-art methods of medical examination of visual fields is automated static perimetry (Delgado et al, 2002). The essence of this examination is that the patient is presented with luminous point stimuli in different zones of the visual field, and it is evaluated whether the patient sees the stimuli in all areas of the visual field. In order to know at which position of the visual field a stimulus is located, it is necessary to have information about the gaze direction and to establish that the gaze is fixed at the fixation point. Gaze fixation can be controlled through video recording of the eye using a video camera installed inside the automated static perimetry apparatus. Typical images of the human eye captured by the automated static perimetry apparatus used in this work are shown in Figure 1. To determine the gaze direction, methods of pupil tracking on video are used. At present, various tracking algorithms are known, but their use in automated static perimetry has been little studied. Hence, the selection of a pupil tracking algorithm for automated static perimetry systems is a relevant issue. In this paper, four groups of visual tracking algorithms are considered: segmentation-based methods, correlation methods, methods based on optical flow, and methods based on the weighted average.

Segmentation-based methods
Segmentation-based methods generate a label matrix from the eye image (Gonzalez and Woods, 2008). A large number of segmentation methods are known; a survey of image segmentation techniques is given by Zaitoun and Aqel (2015). In this paper, the main ideas of multi-threshold segmentation described by Babayan, Skonnikov and Trofimov (2020) are implemented using the Otsu method (Otsu, 1979). The label matrix obtained as a result of segmentation is divided into binary masks. These binary images are processed using morphological methods based on erosion and dilation: cleaning and closing (Gonzalez, Woods and Eddins, 2020). Connected regions are selected on the processed binary masks. A set of parameters is calculated for each region: centroid coordinates, area, width, height, perimeter length. Then the regions are filtered: regions with atypical parameter values are discarded. Among the remaining regions, the one with the smallest perimeter-to-area ratio, i.e., the one most similar to a circle by this criterion, is selected. The centroid coordinates of the selected region are taken as the coordinates of the pupil. Figure 2 shows, as an example, the main stages of processing by the segmentation method. The figure shows that the object corresponding to the pupil is correctly identified as a result of processing. In the example shown, the pupil is not fully segmented because of a flare, so the selected object has a nonelliptical shape. In addition, other segments of similar area are also present in the label matrix.
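The region filtering and selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the `Region` record, the function name `select_pupil_region`, and the area and aspect-ratio limits are hypothetical and are not taken from the paper, which does not list its filter thresholds.

```python
from dataclasses import dataclass


@dataclass
class Region:
    cx: float         # centroid x, pixels
    cy: float         # centroid y, pixels
    area: float       # number of pixels in the region
    width: float      # bounding-box width, pixels
    height: float     # bounding-box height, pixels
    perimeter: float  # perimeter length, pixels


def select_pupil_region(regions, min_area=500, max_area=20000, max_aspect=2.0):
    """Discard atypical regions, then return the one with the smallest
    perimeter-to-area ratio (or None if nothing is left).
    The limits are illustrative assumptions, not values from the paper."""
    candidates = []
    for r in regions:
        if not (min_area <= r.area <= max_area):
            continue  # atypical area
        aspect = max(r.width, r.height) / max(min(r.width, r.height), 1e-9)
        if aspect > max_aspect:
            continue  # too elongated to be a pupil
        candidates.append(r)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.perimeter / r.area)


regions = [
    Region(cx=320, cy=240, area=7850, width=100, height=100, perimeter=314),  # round blob
    Region(cx=100, cy=50, area=6000, width=300, height=20, perimeter=640),    # elongated shadow
    Region(cx=10, cy=10, area=40, width=8, height=5, perimeter=26),           # small speck
]
pupil = select_pupil_region(regions)
print(pupil.cx, pupil.cy)
```

Note that the perimeter-to-area ratio depends on region size as well as shape, which is one reason the preceding filtering by typical parameter values matters.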

Correlation methods
Correlation methods are based on comparing different parts of the source image with a predefined reference. The general principles of correlation algorithms can be described as follows (Yilmaz, Javed and Shah, 2006). A criterion function is evaluated for each position (x, y) of the reference within the search area, where I and R are the elements of the search area and the reference, respectively. The object position is taken at the minimum of the criterion function (4). In correlation tracking methods, the reference can be captured manually in the first frame or set a priori. In order to automate the tracking process, several references were set a priori. The pupil position was determined using the reference whose criterion function minimum had the lowest value. Figure 3 shows a criterion function typical for pupil tracking in the automated perimetry apparatus.
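A minimal sketch of tracking with several a priori references is given below. The criterion function used here is the sum of absolute differences (SAD); this is an assumption made for illustration, since the exact criterion of the paper is not reproduced in the text. The function names and the toy image are hypothetical.

```python
def sad_map(image, ref):
    """SAD criterion for every offset (x, y) at which ref fits inside image."""
    H, W = len(image), len(image[0])
    h, w = len(ref), len(ref[0])
    scores = {}
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            scores[(x, y)] = sum(
                abs(image[y + j][x + i] - ref[j][i])
                for j in range(h)
                for i in range(w)
            )
    return scores


def track_with_references(image, references):
    """Pick the reference whose criterion minimum is lowest and return
    (reference index, center of the matched window, criterion value)."""
    best = None
    for idx, ref in enumerate(references):
        scores = sad_map(image, ref)
        (x, y), value = min(scores.items(), key=lambda kv: kv[1])
        if best is None or value < best[0]:
            best = (value, idx, x, y)
    value, idx, x, y = best
    h, w = len(references[idx]), len(references[idx][0])
    return idx, (x + (w - 1) / 2, y + (h - 1) / 2), value


# Toy frame: bright background with a dark 2x2 "pupil" at columns 3-4, rows 2-3.
image = [[200] * 6 for _ in range(6)]
for yy in (2, 3):
    for xx in (3, 4):
        image[yy][xx] = 10

references = [
    [[10, 10], [10, 10]],      # dark reference
    [[120, 120], [120, 120]],  # mid-gray reference
]
idx, center, value = track_with_references(image, references)
print(idx, center, value)
```

The exhaustive search is repeated for every reference, so the cost grows linearly with the number of references.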

Optical flow
One frequently used group of tracking methods is based on the calculation of optical flow (Zhao, Shi, Chen, Li and Wang, 2015). The method based on optical flow was implemented as follows. First, the optical flow was calculated for the current frame. In this paper, three methods for calculating the dense optical flow were tested: the Lucas-Kanade derivative of Gaussian method, the Horn-Schunck method (Barron, Fleet, Beauchemin, Burkit, 1992) and the Farneback method (Farnebäck, 2003). Preliminary studies have shown that the results obtained by the Farneback method are well grouped into two separate clusters, the object and the background, while the other methods provide too fuzzy a separation between the clusters. Therefore, the Farneback method is used hereinafter for the calculation of the optical flow.
The result of calculating the optical flow can be represented as a data set consisting of displacements in two coordinates, dx and dy. The obtained two-dimensional set of displacements was clustered. The k-means (Arthur, Vassilvitskii, 2006) and DBSCAN (Ester, Kriegel, Sander, Xiaowei, 1996) clustering algorithms were tested. According to the results of the studies given below, the best data clustering was achieved using the DBSCAN method. Figure 4 shows the result of clustering, with the DBSCAN method, the points that represent the displacements in two coordinates for different pixels. To make the large number of points in this figure legible, the data was decimated. Here, the points recognized as noise are displayed in red, and the points belonging to the two found clusters are displayed in blue and green. The cluster corresponding to the pupil was then selected from the obtained clusters. Several criteria were used to determine which cluster corresponds to the pupil and which to the background. These include the number of cluster elements N (that is, the area of the segment in image coordinates) and the average speed of movement in the cluster v. The cluster marked in green in Figure 4 contains more than 80% of all image pixels, and the average velocities within this cluster in both coordinates are close to zero. According to the specified criteria, it was determined that this cluster, $C_0$, corresponds to the background. The only remaining cluster, $C_P$, that is, the cluster marked in blue, was assigned to the pupil. The pupil position was determined in two ways. The first way is to calculate the centroid coordinates of the selected cluster, as in the tracking method based on brightness segmentation. The second way is to add the average displacement of the current cluster to the pupil coordinates from the last frame. It should be noted that the second way is characterized by the accumulation of errors.
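The assignment of clusters to background and pupil can be sketched as follows, assuming that cluster labels have already been produced by a clustering step such as DBSCAN, with -1 marking noise. The function name and both thresholds (`background_share`, `still_speed`) are hypothetical; the 0.8 share mirrors the observation about the example cluster and is not stated in the paper as a fixed rule.

```python
import math
from collections import defaultdict


def classify_clusters(flow, labels, background_share=0.8, still_speed=0.5):
    """flow: list of (dx, dy) per pixel; labels: cluster label per pixel (-1 = noise).
    Returns (background_label, pupil_label); either may be None."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # label -> [N, sum of dx, sum of dy]
    for (dx, dy), label in zip(flow, labels):
        if label < 0:
            continue  # skip noise points
        s = stats[label]
        s[0] += 1
        s[1] += dx
        s[2] += dy

    total = len(flow)
    background = None
    for label, (n, sx, sy) in stats.items():
        mean_speed = math.hypot(sx / n, sy / n)
        # A large, almost motionless cluster is taken as the background.
        if n / total > background_share and mean_speed < still_speed:
            background = label

    remaining = [label for label in stats if label != background]
    pupil = remaining[0] if len(remaining) == 1 else None
    return background, pupil


# Toy data: 85 still pixels, 12 moving pixels, 3 noise points.
flow = [(0.0, 0.0)] * 85 + [(4.0, -1.0)] * 12 + [(9.0, 9.0)] * 3
labels = [0] * 85 + [1] * 12 + [-1] * 3
background, pupil = classify_clusters(flow, labels)
print(background, pupil)
```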
In this work, if there is a jerky movement in the frame, the first method was used: the calculation of the average position of the pixels belonging to the cluster $C_P$:

$$x_P = \frac{1}{N}\sum_{n \in C_P} x_n, \qquad y_P = \frac{1}{N}\sum_{n \in C_P} y_n, \qquad (5)$$

where $x_n$ and $y_n$ are the coordinates of the pixel with the ordinal number $n$, and $N$ is the number of elements of the cluster. A jerky movement was detected when the average speed $v$ of this cluster exceeded a certain empirically selected threshold.
If there is no sharp movement in the frame number $n_f$ (that is, $v$ does not exceed the threshold) or there is no cluster $C_P$ assigned to the pupil, the second method was used: correction of the previously determined position of the pupil by the displacements $dx$ and $dy$ obtained during the calculation of the optical flow:

$$x_P(n_f) = x_P(n_f - 1) + dx(n_f), \qquad y_P(n_f) = y_P(n_f - 1) + dy(n_f).$$

The values of $dx$ and $dy$ were determined similarly to formula (5) by averaging the elements $dx_n$ and $dy_n$ of the dense optical flow over the region $C_P$, and the average velocity $v$ was determined from these averaged displacements. In cases where the cluster $C_P$ was not determined, averaging was performed over the entire image area.
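The switching between the two ways of determining the pupil position can be sketched as below. This is an illustration only: the function name and argument layout are hypothetical, and the average velocity is assumed here to be the magnitude of the averaged displacement, which the text does not confirm.

```python
import math


def update_pupil_position(prev_xy, cluster_pixels, cluster_flow, image_flow,
                          speed_threshold):
    """prev_xy: pupil position in the previous frame.
    cluster_pixels: (x, y) of pixels in the pupil cluster, empty if none found.
    cluster_flow: (dx, dy) of those pixels.
    image_flow: (dx, dy) of all pixels, used when no pupil cluster exists.
    speed_threshold: empirically chosen threshold (value is an assumption)."""
    flow = cluster_flow if cluster_pixels else image_flow
    dx = sum(f[0] for f in flow) / len(flow)
    dy = sum(f[1] for f in flow) / len(flow)
    v = math.hypot(dx, dy)  # assumed definition of the average velocity

    if cluster_pixels and v > speed_threshold:
        # Jerky movement: use the centroid of the pupil cluster.
        n = len(cluster_pixels)
        return (sum(p[0] for p in cluster_pixels) / n,
                sum(p[1] for p in cluster_pixels) / n)

    # Otherwise correct the previous position by the averaged displacement.
    return (prev_xy[0] + dx, prev_xy[1] + dy)


# Jerky movement: the centroid is used.
pos1 = update_pupil_position((0.0, 0.0), [(10, 20), (12, 22)],
                             [(5.0, 0.0), (5.0, 0.0)], [(0.0, 0.0)], 2.0)
# Slow movement: the previous position is shifted by the mean displacement.
pos2 = update_pupil_position(pos1, [(10, 20), (12, 22)],
                             [(0.5, 0.0), (0.5, 0.0)], [(0.0, 0.0)], 2.0)
# No pupil cluster: the displacement is averaged over the whole image.
pos3 = update_pupil_position(pos2, [], [], [(0.2, -0.2), (0.0, 0.0)], 2.0)
print(pos1, pos2, pos3)
```

Because the second branch adds displacements frame after frame, its error accumulates until a jerky movement triggers the centroid branch again.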

Weighted average
The fourth method is based on weighted average values (van Assen, Egmont-Petersen, Reiber, 2002).
The values $p$ and $q$ themselves are not estimates of the pupil position $x_P$ and $y_P$. In order to determine the position of the pupil, an additional transformation must be performed. In the simplest case, the dependences $x_P(p)$ and $y_P(q)$ have a linear form:

$$x_P = k_X p + b_X, \qquad y_P = k_Y q + b_Y.$$

The adjustment factors $k_X$ and $b_X$ (and, likewise, $k_Y$ and $b_Y$) are determined experimentally during calibration. These values can be found by solving a system of two equations, although for a more accurate calculation it is advisable to use methods that allow for a larger number of points, for example, RANSAC (Fischler, Bolles, 1981). The offset of the region of interest relative to the border of the image $I$ need not be taken into account separately if calculations by formulas (11) and (12) are performed in the coordinates of the original image, and not in the coordinates of the region of interest. Figures 5 (a) and (b) show two examples of different inverted regions of interest, in which the obtained estimates of the pupil center coordinates are marked with cross markers. In this figure, for clarity, contrast enhancement was applied according to formula (1), although this operation is not required to determine the center of the pupil. The figure shows that the marker position determined by the weighted average method is not the exact center of the pupil, but is located within the pupil.
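A minimal sketch of the weighted average step and the linear calibration follows. Two assumptions are made and should be read as such: $p$ and $q$ are taken to be the intensity-weighted mean column and row of the inverted region of interest (the defining formulas are not reproduced in the text), and the calibration line is fitted by ordinary least squares rather than RANSAC. All function names and numbers are hypothetical.

```python
def weighted_average(roi):
    """Intensity-weighted mean column (p) and row (q) of an inverted region of
    interest given as a list of rows. Returns None for an all-zero region."""
    total = sum(sum(row) for row in roi)
    if total == 0:
        return None
    p = sum(x * v for row in roi for x, v in enumerate(row)) / total
    q = sum(y * v for y, row in enumerate(roi) for v in row) / total
    return p, q


def fit_line(ps, xs):
    """Ordinary least squares fit of xs = k * ps + b over calibration points.
    (A robust estimator such as RANSAC could replace this step.)"""
    n = len(ps)
    mean_p = sum(ps) / n
    mean_x = sum(xs) / n
    k = (sum((p - mean_p) * (x - mean_x) for p, x in zip(ps, xs))
         / sum((p - mean_p) ** 2 for p in ps))
    return k, mean_x - k * mean_p


# Inverted region of interest: the pupil is bright after inversion.
roi = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 30, 0],
    [0, 0, 10, 30, 0],
]
p, q = weighted_average(roi)

# Hypothetical calibration pairs (p, known pupil x-coordinate).
k_x, b_x = fit_line([1.0, 2.0, 3.0], [12.0, 14.0, 16.0])
x_pupil = k_x * p + b_x
print(p, q, k_x, b_x, x_pupil)
```

No segmentation decision is involved, so the estimate exists for every frame with a non-empty region of interest; this matches the stability of the method at the cost of a biased center estimate.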

EXPERIMENTAL RESULTS
The described algorithms were tested on eight video recordings of the human eye in an automated static perimetry apparatus, each 600 frames long. Ground truth tracks of the pupil were marked on each of the eight videos in semi-automatic mode. Taking into account the tracking quality criteria described by Dutta et al. (2019), two pupil tracking quality measures were calculated: the mean center location error during successful tracking (prior to tracking failure) and the mean tracking length $n_f^{\max}$.
The mean center location error was calculated as the average, over the frames preceding tracking failure, of the distance $r(n_f)$ between the estimated and ground truth positions of the pupil center at the frame number $n_f$. Situations in which the position of the pupil center was not established or the current error exceeded the typical pupil radius of 50 pixels were taken as tracking failure.
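The two quality measures can be computed for a single video as in the sketch below. The function name and the toy track are hypothetical; the distance is taken to be Euclidean, and the 50-pixel failure radius follows the text. Averaging the tracking length over the eight videos is left out.

```python
import math


def evaluate_track(estimates, ground_truth, failure_radius=50.0):
    """estimates: list of (x, y) or None per frame; ground_truth: list of (x, y).
    Returns (mean center location error before failure, tracking length)."""
    errors = []
    for est, gt in zip(estimates, ground_truth):
        if est is None:
            break  # pupil center not established: tracking failure
        r = math.hypot(est[0] - gt[0], est[1] - gt[1])
        if r > failure_radius:
            break  # error exceeds the typical pupil radius: tracking failure
        errors.append(r)
    length = len(errors)
    mean_error = sum(errors) / length if length else None
    return mean_error, length


ground_truth = [(100, 100)] * 5
estimates = [(103, 104), (100, 110), (100, 100), (200, 100), (100, 100)]
mean_error, tracking_length = evaluate_track(estimates, ground_truth)
print(mean_error, tracking_length)
```

In this toy track the fourth frame is 100 pixels off, so tracking is counted as lost there even though the fifth estimate is correct again.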
The experimental results on the quality of the four considered pupil tracking algorithms are shown in Table 1.

CONCLUSIONS
Based on the experiment results, the following conclusions are drawn.
The segmentation-based method yields fairly accurate pupil center coordinates in cases where the pupil mask was extracted correctly. However, an error often occurs at one of the several steps of a segmentation-based algorithm. As a result, the region corresponding to the pupil is determined incorrectly. Consequently, most of the results obtained by segmentation methods are incorrect. Correlation methods also often give a large error. There are two reasons for this. First, real images of the human eye differ significantly from the references chosen a priori, and increasing the number of references leads to an unacceptable increase in computational cost. Second, the appearance of the eye changes significantly during the examination, and these changes lead to tracking failure. Methods based on optical flow provide high accuracy in measuring the pupil coordinates, but only under two conditions. First, the optical flow can be determined only at the moments of eye movement; this method is not applicable to static video. Second, facial movements introduce a significant error, and strict face fixation is difficult in real conditions. Therefore, this method can only be considered as an addition to another one. The method based on the weighted average calculation provides the most stable results, but its measurement accuracy is not sufficient to estimate the exact position of the pupil. At the same time, for the tasks of automated static perimetry, the information that the gaze is fixed in a certain zone is enough.
This can be explained as follows. The beginning of the gaze movement indicates that a new stimulus is visible and the eye begins to change its direction. After the eye movement stops, it can be assumed that the gaze is fixed at the point corresponding to the stimulus, since there are no other objects in the field of view and the eye began to move in the direction of the current stimulus when it appeared. If the gaze does not move in the appropriate direction when the stimulus position is changed, then it can be concluded that the visual field has a defect at this location. The start of eye movement indicates a normal field of view at the corresponding point. Thus, according to the results of the experimental studies, the method based on the weighted average is best suited for pupil visual tracking in automated static perimetry systems.