ENHANCEMENT OF FAST FACE DETECTION ALGORITHM BASED ON A CASCADE OF DECISION TREES

A face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach detects faces in poses other than frontal through the use of multiple classifiers, each trained for a specific range of head rotation angles. The results show a high processing speed for CEDT on images of standard size. The algorithm increases the area under the ROC curve by 13% compared to the standard Viola-Jones face detection algorithm. The final implementation of the algorithm consists of 5 different cascades for frontal and non-frontal faces. The simulation results also show that the CEDT algorithm has a low computational complexity in comparison with the standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and extend battery life.

The Viola-Jones algorithm is the classical face detection approach (Viola, 2001), (Tan, 2007), (Chen, 2013). Viola and Jones proposed using features based on Haar wavelets. They introduced two kinds of two-rectangle features, two kinds of three-rectangle features and one four-rectangle feature. The value of a two-rectangle feature is the difference between the sum of the pixel intensities in the dark rectangle and the sum of the pixel intensities in the light rectangle. For a three-rectangle feature, the sum of the pixel intensities in the two light outer rectangles is subtracted from the sum in the dark central rectangle. Even for a small 3x3 pixel image, the number of features is substantial (12 two-rectangle features, 6 three-rectangle features and 4 four-rectangle features, for a total of 22 features). For a 4x4 image, the number of features increases to 136. For the standard 24x24 pixel image, which is used to train the face detector in most implementations of the Viola-Jones algorithm, the feature set consists of 162,336 values. This detector is capable of processing images extremely rapidly (Viola, 2001).
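The feature counts quoted above can be checked with a short enumeration. The following Python sketch is illustrative only: the function name `count_haar_features` and the encoding of the five feature types as base shapes are assumptions made for this example, not part of the original detector. It counts every position and every integer scaling of each base shape inside a window.

```python
# Base shapes (width, height) of the five classic Haar-like feature types:
# two two-rectangle, two three-rectangle and one four-rectangle feature.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]


def count_haar_features(win_w, win_h, shapes=BASE_SHAPES):
    """Count all placements of all integer scalings of each base shape."""
    total = 0
    for base_w, base_h in shapes:
        # Scaled widths/heights are multiples of the base shape.
        for w in range(base_w, win_w + 1, base_w):
            for h in range(base_h, win_h + 1, base_h):
                # Number of top-left corners where a w x h feature fits.
                total += (win_w - w + 1) * (win_h - h + 1)
    return total


print(count_haar_features(4, 4))    # 136
print(count_haar_features(24, 24))  # 162336
```

Under this enumeration a 4x4 window yields 136 features and a 24x24 window yields 162,336, matching the figures above. For a 3x3 window the same enumeration yields 40; the total of 22 quoted above is obtained when only one orientation of the two-rectangle and three-rectangle types is counted (12 + 6 + 4).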
Viola and Jones proposed a cascade structure for their detector, consisting of layers in the form of strong classifiers (Viola, 2001). This structure quickly rejects "non-face" regions at the first stage, while the second stage evaluates only a few more pairs of rectangular features. The threshold of each stage is chosen to guarantee a relatively high minimum detection rate under relatively loose requirements on the false alarm rate. Thus, the cascade rejects increasingly difficult "non-face" regions at each stage while passing all or nearly all of the "face" regions.
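The early-rejection idea of the cascade can be illustrated with a minimal Python sketch. The function `cascade_classify`, the score functions and the thresholds below are hypothetical placeholders chosen for the example; a real detector would use trained strong classifiers as stages.

```python
def cascade_classify(window, stages):
    """Return (is_face, stages_evaluated) for one image window.

    `stages` is a list of (score_fn, threshold) pairs, cheapest stage first.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, index  # rejected early, later stages never run
    return True, len(stages)


# Toy stages operating on a dict of made-up, precomputed scores.
stages = [
    (lambda w: w["coarse"], 0.2),
    (lambda w: w["fine"], 0.6),
]

print(cascade_classify({"coarse": 0.1, "fine": 0.9}, stages))  # (False, 1)
print(cascade_classify({"coarse": 0.8, "fine": 0.7}, stages))  # (True, 2)
```

Most windows in an image are background, so most of them leave the loop after the first, cheapest stage. This is where the speed of a cascade comes from.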
In this paper, a novel face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is proposed. Our approach is a modification of the standard Viola-Jones algorithm in which the image is scanned with a cascade of binary classifiers. If an image region passes all the stages of the cascade, it is classified as an object of interest. Each binary classifier comprises an ensemble of decision trees, which compare pixel intensities in the binary tests of their internal nodes. The learning process consists of constructing regression trees with a greedy algorithm, as in most modern algorithms for regression tree construction. The greedy algorithm builds trees from top to bottom by recursive division of the training data and may be briefly described as follows:
- selection of the best split (the one providing an extremum of a criterion);
- separation of the data into subsets;
- recursive application of this procedure to each of the resulting subsets.
Greedy algorithms have low complexity and good scalability, but they also have several disadvantages: a) the regression tree is built step by step without returning to previous decisions; b) each step of the algorithm yields only a locally optimal solution. Such a solution gives the maximum effect at the current step, without regard to its impact on the overall solution. Greedy algorithms therefore perform only a locally optimal separation of the data.
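The greedy top-down procedure can be sketched in a few lines of Python. This is a simplified stand-in, not the CEDT training code: it splits scalar inputs by a threshold instead of comparing pixel intensities, it ignores sample weights, and the names `build_tree` and `predict` are assumptions made for this example.

```python
def mean(values):
    return sum(values) / len(values)


def squared_error(samples):
    m = mean([v for _, v in samples])
    return sum((v - m) ** 2 for _, v in samples)


def build_tree(samples, max_depth, depth=0):
    """Greedy regression tree on (x, v) pairs with scalar x."""
    xs = sorted({x for x, _ in samples})
    if depth == max_depth or len(xs) < 2:
        return {"value": mean([v for _, v in samples])}
    best = None
    # 1) Select the best split: the threshold with the smallest error.
    for threshold in xs[:-1]:
        left = [s for s in samples if s[0] <= threshold]
        right = [s for s in samples if s[0] > threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, left, right)
    # 2) Separate the data into subsets; 3) recurse on each subset.
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "left": build_tree(left, max_depth, depth + 1),
        "right": build_tree(right, max_depth, depth + 1),
    }


def predict(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


data = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
tree = build_tree(data, max_depth=1)
print(tree["threshold"], predict(tree, 1.5), predict(tree, 3.5))
```

Each split is chosen once and never revisited, which is exactly the property listed as a disadvantage above: a choice that is best at the current node is not guaranteed to be best for the tree as a whole.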
To solve the regression-based problem, we use optimized binary decision trees. This approach uses a comparison of pixel intensities as the binary test in the internal nodes of a tree. This strategy was proposed by Amit and Geman (Amit, 1997) and was later successfully used by other researchers and engineers.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W4, 2017. 2nd International ISPRS Workshop on PSBB, 15-17 May 2017, Moscow, Russia

A pixel intensity comparison binary test on image I is defined as:

$$\mathrm{bintest}(I; l_1, l_2) = \begin{cases} 0, & I(l_1) \le I(l_2), \\ 1, & \text{otherwise}, \end{cases}$$

where $I(l_i)$ is the pixel intensity at location $l_i$. $l_1$ and $l_2$ are normalized coordinates from the set $[-1, +1] \times [-1, +1]$. This allows the binary tests to be resized if necessary. Each terminal node of the tree contains a scalar which models the output value.
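A minimal Python sketch of such a pixel intensity comparison is given below. Several details are assumptions made for illustration: the region is described by a hypothetical `(center_row, center_col, size)` triple, normalized coordinates are mapped to pixels by scaling with half the region size, out-of-range locations are clamped to the image border, and the comparison is taken as "less than or equal".

```python
def pixel_at(image, region, loc):
    """Map a normalized location in [-1, +1] x [-1, +1] to a pixel value."""
    center_row, center_col, size = region
    row = int(round(center_row + loc[0] * size / 2.0))
    col = int(round(center_col + loc[1] * size / 2.0))
    # Clamp to the image so that regions near the border stay valid.
    row = min(max(row, 0), len(image) - 1)
    col = min(max(col, 0), len(image[0]) - 1)
    return image[row][col]


def bintest(image, region, l1, l2):
    """Return 0 if I(l1) <= I(l2) and 1 otherwise (assumed convention)."""
    return 0 if pixel_at(image, region, l1) <= pixel_at(image, region, l2) else 1


# Toy 5x5 image whose intensity grows with row and column.
image = [[10 * r + c for c in range(5)] for r in range(5)]
region = (2, 2, 4)

print(bintest(image, region, (-1, -1), (1, 1)))  # 0
print(bintest(image, region, (1, 1), (-1, -1)))  # 1
```

Because the locations are stored in normalized form, the same test can be applied to a region of any size by changing only the `size` entry of the region.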
Viola and Jones made object detection feasible in real applications, because a system based on their algorithm can process an image faster than other approaches with similar results. Mobile devices, however, have limited processing power, so mobile developers are interested in even faster detection and are ready to sacrifice some precision for higher processing speed so that the system can work with limited resources. The CEDT algorithm processes images and video at high speed while maintaining comparable accuracy. It can be re-trained on a new set of data. It is also able to detect faces rotated by different angles about the vertical axis. The algorithm is made invariant to in-plane rotation and to small shifts by training on multiple copies of each original image rotated by angles sampled uniformly from the interval [0, 2π).

THE FACE DETECTION ALGORITHM
The face detection algorithm based on CEDT is trained on the following dataset: $\{(I_s, v_s, w_s) : s = 1, 2, \ldots, S\}$, where $v_s$ is the ground truth for image $I_s$ and $w_s$ is its factor of importance (weight). For example, in the case of binary classification, the ground truths are class labels: positive and negative samples are annotated with +1 and -1, respectively. Weights allow these samples to be ranked according to their importance. The binary test in each node of the tree is chosen so as to minimize the weighted mean squared error obtained after splitting the input data by the test. The minimization is made according to the following equation:

$$\mathrm{WMSE} = \sum_{(I, v, w) \in C_0} w \cdot (v - \bar{v}_0)^2 + \sum_{(I, v, w) \in C_1} w \cdot (v - \bar{v}_1)^2,$$

where $C_0$ and $C_1$ are the groups of training samples for which the results of the binary test are equal to 0 and 1, respectively. Scalars $\bar{v}_0$ and $\bar{v}_1$ are the weighted mean values of the ground truths in $C_0$ and $C_1$, respectively.
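The split criterion can be made concrete with a short Python sketch. The function name `split_error` and the toy data are assumptions made for this example; the "images" are replaced by scalars and the binary test by an arbitrary function returning 0 or 1, so that only the weighted error computation is shown.

```python
def split_error(samples, test):
    """Weighted squared error after splitting `samples` by a binary test.

    `samples` is a list of (image, ground_truth, weight) triples and
    `test(image)` returns 0 or 1.
    """
    groups = {0: [], 1: []}
    for image, v, w in samples:
        groups[test(image)].append((v, w))
    error = 0.0
    for group in groups.values():
        total_weight = sum(w for _, w in group)
        if total_weight == 0:
            continue  # an empty group contributes no error
        # Weighted mean of the ground truths in this group.
        v_mean = sum(v * w for v, w in group) / total_weight
        error += sum(w * (v - v_mean) ** 2 for v, w in group)
    return error


# Toy data: scalar "images", labels +1/-1, equal weights.
samples = [(0.1, -1, 0.25), (0.2, -1, 0.25), (0.8, +1, 0.25), (0.9, +1, 0.25)]

print(split_error(samples, lambda x: int(x > 0.5)))  # 0.0
print(split_error(samples, lambda x: 0))             # 1.0
```

A test that separates the two classes perfectly gives zero error, while a test that sends every sample to the same group leaves the error at its maximum for this toy set.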
Since the number of possible pixel intensity comparisons is very large, only a small subset of them is generated when optimizing each internal node, by repeatedly sampling two coordinates from a uniform distribution on the square $[-1, +1] \times [-1, +1]$. The training data are recursively split in this way until the terminating condition is satisfied. The depth of the trees is restricted to reduce the training time, to increase the processing speed and to meet memory requirements. The output value of every terminal node is equal to the weighted mean of the ground truths of the training samples that reach it during training.
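The random generation of candidate tests for one internal node might look as follows in Python. This is a sketch under stated assumptions: `sample_tests` and `best_test` are hypothetical names, a seeded `random.Random` instance is used only to make the example reproducible, and the error of each candidate is supplied by the caller as a function.

```python
import random


def sample_tests(num_tests, rng):
    """Draw `num_tests` candidate tests, each a pair of normalized locations."""
    def location():
        return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    return [(location(), location()) for _ in range(num_tests)]


def best_test(candidates, error_of):
    """Keep the candidate with the smallest error (first one wins on ties)."""
    return min(candidates, key=error_of)


rng = random.Random(0)
candidates = sample_tests(256, rng)
print(len(candidates))
```

Only the sampled candidates are scored at each node, so the cost of optimizing a node depends on the number of candidates rather than on the full set of possible pixel pairs.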
If the depth of the tree is limited to D and B binary tests are considered in each internal node, the training time is $O(D \cdot B \cdot S)$ for a training set with S samples.

A single decision tree usually provides only moderate accuracy. An ensemble of trees, on the other hand, can achieve impressive results. The GentleBoost algorithm (a modification of the widely used AdaBoost) is used to create a discriminative ensemble by fitting each decision tree to an appropriate least squares problem (Riopka, 2003).
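As a rough illustration of how GentleBoost builds an ensemble from repeated weighted least-squares fits, consider the Python sketch below. It is not the CEDT training code. The weak learner is a one-split regression stump on scalar inputs standing in for a pixel-comparison tree, the initial weights 1/P and 1/N and the renormalization step are assumptions, the weight update is the standard GentleBoost form, and both classes are assumed to be present in the data.

```python
import math


def fit_stump(xs, cs, ws):
    """Weighted least-squares regression stump (stand-in weak learner)."""
    def weighted_mean(group):
        total = sum(w for _, w in group)
        return sum(c * w for c, w in group) / total if total > 0 else 0.0

    best = None
    for thr in sorted(set(xs)):
        left = [(c, w) for x, c, w in zip(xs, cs, ws) if x <= thr]
        right = [(c, w) for x, c, w in zip(xs, cs, ws) if x > thr]
        v_left, v_right = weighted_mean(left), weighted_mean(right)
        err = sum(w * (c - v_left) ** 2 for c, w in left)
        err += sum(w * (c - v_right) ** 2 for c, w in right)
        if best is None or err < best[0]:
            best = (err, thr, v_left, v_right)
    _, thr, v_left, v_right = best
    return lambda x: v_left if x <= thr else v_right


def gentle_boost(xs, cs, num_trees):
    """Return a list of weak learners; labels `cs` are +1 or -1."""
    num_pos = sum(1 for c in cs if c > 0)
    num_neg = len(cs) - num_pos
    ws = [1.0 / num_pos if c > 0 else 1.0 / num_neg for c in cs]
    ensemble = []
    for _ in range(num_trees):
        tree = fit_stump(xs, cs, ws)
        ensemble.append(tree)
        # Standard GentleBoost update, followed by renormalization.
        ws = [w * math.exp(-c * tree(x)) for x, c, w in zip(xs, cs, ws)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def ensemble_output(ensemble, x):
    return sum(tree(x) for tree in ensemble)


xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
cs = [-1, -1, -1, +1, +1, +1]
ensemble = gentle_boost(xs, cs, num_trees=3)
print([ensemble_output(ensemble, x) > 0 for x in xs])
```

The update increases the weight of samples on which the newest tree disagrees with the label, so later trees concentrate on the samples that are still handled poorly.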
The following steps are required to generate an ensemble of K trees using the training dataset $\{(I_s, c_s) : s = 1, 2, \ldots, S\}$:

1. Choose the start weights for each image $I_s$ and its class label $c_s$:

$$w_s = \begin{cases} 1/P, & c_s = +1, \\ 1/N, & c_s = -1, \end{cases}$$

where P and N are the total numbers of positive and negative samples, respectively.

During runtime, the outputs of all trees in the ensemble are summed and the resulting value is thresholded to obtain the class label. The detection rate is adjusted by varying the ensemble output threshold for every stage of the detector. Each stage uses the soft output ("confidence") of the previous stage as additional information to improve its discriminability. This is achieved by progressively accumulating the outputs of all classification stages in the cascade.

Because the detector is robust to small changes in position and scale, several detections may appear around each region of interest. These overlapping detections are combined during post-processing. Two detections $D_1$ and $D_2$ are combined if the overlap between them is more than 30%:

$$\frac{|D_1 \cap D_2|}{|D_1 \cup D_2|} > 0.3.$$

Two datasets are required for the detector training: a dataset with positive samples that contain faces and a dataset with negative samples that do not contain faces. The AFLW database (visualized in Fig. 2), which consists of 14 032 annotated faces, is used for frontal detector training. In order to improve the algorithm performance, the original images from the database are transformed in different ways.

Figure 1. The final scheme of the face detection algorithm using the CEDT approach
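The rule for combining overlapping detections can be sketched in Python as follows. The details are assumptions made for illustration: detections are axis-aligned boxes `(x1, y1, x2, y2)`, the overlap is measured as intersection over union, groups are formed by transitively linking boxes whose overlap exceeds the threshold, and each group is replaced by the average of its boxes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(boxes, threshold=0.3):
    """Group boxes whose overlap exceeds `threshold` and average each group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return [
        tuple(sum(box[k] for box in group) / len(group) for k in range(4))
        for group in groups.values()
    ]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (100, 100, 110, 110)]
print(merge_detections(boxes))
```

In this toy input the first two boxes overlap heavily and are merged into one averaged box, while the distant third box is kept as a separate detection.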

SIMULATION RESULTS
The AFW database is chosen for testing and analyzing the detector characteristics. This database contains 205 images with 468 annotated faces rotated by different angles (Zhu, X., 2012).
ROC-curves for the different modules of the CEDT detector are presented in Fig. 2a. The areas under the ROC-curves are equal to 0.932 (CEDT frontal), 0.856 (CEDT left 30-60), 0.852 (CEDT right 30-60), 0.830 (CEDT left 60-90) and 0.852 (CEDT right 60-90). In Fig. 2b the areas under the ROC-curves are equal to 0.830 (Viola-Jones), 0.932 (CEDT frontal) and 0.951 (CEDT full). Thus, the proposed CEDT algorithm increases the area under the ROC-curve by 13% in comparison to the Viola-Jones algorithm.

The experiment was implemented in the Python and C++ programming languages and run on a PC with an Intel Core i7-4770 3.40 GHz processor. The average time of CEDT face detection on an image with a resolution of 1024×768 pixels and a minimum window size of 40×40 pixels is 0.19 seconds. At each iteration the window size is increased by 20% relative to the previous size. We have compared the proposed approach with the Viola-Jones detector from the OpenCV library. The average time of that detector is 0.26 seconds under the same settings. The total processing time does not increase significantly when the CEDT detectors operate in parallel. This allows the detection system to use 3-5 detectors for the detection of faces with different orientations relative to the camera.
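The scanning settings described above determine how many window sizes the detector has to try. The Python sketch below enumerates them under stated assumptions: the function name `window_sizes` is hypothetical, the window is square, sizes are rounded to whole pixels, and scanning stops once the window no longer fits into the shorter image side. The step between window positions is not given in the text and is therefore left out.

```python
def window_sizes(image_width, image_height, min_size=40, growth=1.2):
    """List square window sizes, each 20% larger than the previous one."""
    sizes = []
    size = float(min_size)
    while size <= min(image_width, image_height):
        sizes.append(round(size))
        size *= growth
    return sizes


sizes = window_sizes(1024, 768)
print(len(sizes), sizes[:4], sizes[-1])
```

With a 40-pixel minimum window and 20% growth, a 1024×768 image is covered by 17 window sizes under these assumptions, so the number of scales grows only logarithmically with the image size.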
A visual comparison of face detection quality between the Viola-Jones algorithm and the CEDT approach is shown in Fig. 3. It shows the practical improvement in face detection quality that can be achieved without increasing the computational complexity.

CONCLUSIONS
The proposed algorithm based on CEDT increases the area under the ROC-curve by 13% in comparison to the standard Viola-Jones detection method. The final implementation of the algorithm consists of 5 different cascades for frontal and non-frontal faces.
The simulation results also show that the CEDT algorithm has a low computational complexity in comparison with the standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and extend battery life.
For a training set with S samples, each training sample is tested with B pixel intensity comparisons at every internal node that it passes on its path of length D from the root node to a terminal node. A constructed tree requires $O(2^D)$ bytes of storage, and its evaluation time is proportional to $O(D)$.
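The storage and runtime figures follow directly from how a complete binary tree of depth D can be laid out in memory. The Python sketch below is one possible layout, assumed for illustration: internal nodes are stored in a flat list in heap order, the leaves in a second list, and `intensity` is a hypothetical callable that returns the pixel intensity at a given location.

```python
def run_tree(tests, leaves, depth, intensity):
    """Evaluate a complete binary tree of the given depth.

    `tests` holds 2**depth - 1 (l1, l2) pairs in heap order and
    `leaves` holds 2**depth output values.
    """
    index = 0
    for _ in range(depth):  # exactly `depth` comparisons per evaluation
        l1, l2 = tests[index]
        bit = 0 if intensity(l1) <= intensity(l2) else 1
        index = 2 * index + 1 + bit  # move to the left or right child
    return leaves[index - (2 ** depth - 1)]


# Toy tree of depth 2 over three named "pixel locations".
pixels = {"a": 10, "b": 20, "c": 5}
tests = [("a", "b"), ("a", "c"), ("b", "c")]
leaves = [-1.0, 0.5, 0.25, 1.0]

print(run_tree(tests, leaves, 2, pixels.get))  # 0.5
```

The two lists together hold 2^D - 1 tests and 2^D outputs, which gives the exponential storage bound, while each evaluation performs only D comparisons.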
a) Fit a decision tree $T_k$ by weighted least squares of $c_s$ to image $I_s$ with weights $w_s$.
b) Update the weights (standard GentleBoost update): $w_s \leftarrow w_s \exp(-c_s T_k(I_s))$.

15 positive training samples with variations in the pose and scale of the face are obtained from every original image after transformation. This makes the detector more robust to noise. 300 000 negative samples are also used for training. The training parameters are set in advance. The depth of each tree is fixed at 6, and 20 classification stages are used. Each stage has a predetermined number of classification trees and a predetermined detection rate. The optimization of each internal node of a tree considers 256 binary tests. The optimization process significantly improves the performance of each stage.

The AFLW database is also used for training the rotated face detectors. This dataset contains 4264 images with annotated bounding boxes of faces rotated by 30-60° and 6248 images with annotated bounding boxes of faces rotated by 60-90°. The negative samples are similar to those chosen for training the frontal detector. The following transformations are applied to this dataset: in-plane rotation by angles of ±5° and ±10°, shifting the image by ±2.5% and ±5%, and scaling by ±5%. The final detector (CEDT Multi) consists of five trained modules: CEDT frontal, CEDT left 30-60, CEDT left 60-90, CEDT right 30-60 and CEDT right 60-90, as shown in Fig. 1.

Figure 3.
Figure 2. ROC-curves comparison: a) different modules of the CEDT face detector; b) CEDT face detector vs the Viola-Jones algorithm