AN ALGORITHM FOR PEDESTRIAN DETECTION IN MULTISPECTRAL IMAGE SEQUENCES

Abstract. The growing interest in self-driving cars creates a demand for scene understanding and obstacle detection algorithms. One of the most challenging problems in this field is pedestrian detection. The main difficulties arise from the diverse appearance of pedestrians. Poor visibility conditions such as fog and low light also significantly decrease the quality of pedestrian detection. This paper presents a new optical flow based algorithm, BipedDetect, that provides robust pedestrian detection on a single-board computer. The algorithm is based on the idea of simplified Kalman filtering suitable for implementation on modern single-board computers. To detect a pedestrian, a synthetic optical flow of the scene without pedestrians is generated using a slanted-plane model. The estimate of the real optical flow is generated using a multispectral image sequence. The difference between the synthetic optical flow and the real optical flow provides the optical flow induced by pedestrians. The final detection of pedestrians is done by segmentation of the difference of the optical flows. To evaluate the BipedDetect algorithm, a multispectral dataset was collected using a mobile robot.


INTRODUCTION
A vast list of pedestrian detection algorithms has been developed to date. Such algorithms range from classical computer vision methods based on feature detection to modern methods based on deep convolutional neural networks. Modern algorithms such as the DetectNet deep neural network provide performance comparable with that of a human operator. However, such methods require a powerful GPU for real-time detection. Thus a different approach is required to provide robust performance on a system with limited computational resources, such as a mobile robot.
This paper presents a new optical flow based algorithm, BipedDetect, that provides robust pedestrian detection on a single-board computer. Optical flow based methods prove to be a fast and robust solution for the problem of moving pedestrian detection. As optical flow detection is based on motion, it is robust against unknown pedestrian appearance. However, most optical flow estimation algorithms perform poorly under low light conditions. The BipedDetect algorithm is designed to work with multispectral image sequences to improve the performance in degraded visual environments.

Related work
The following section presents related work on pedestrian detection. To analyse the requirements for a pedestrian detection dataset, an overview of existing datasets for pedestrian detection algorithms is provided.
The problem of pedestrian detection has received a lot of scholarly attention. The first approaches to detecting obstacles and pedestrians date back to the 2000s. In the paper by (Zheltov and Sibiryakov, 2000) a method of 3D object detection based on orthophoto difference analysis is proposed. The main idea of the detection method is as follows. If an analytical surface model is known, then orthogonal projections of the given surface are created on some convenient plane using the left and right images of a stereopair. The calculation of the difference of such projections leads to the appearance of characteristic geometric structures in the vicinity of 3D objects that do not belong to the given surface. Although relatively simple, the differential methods prove to be a robust solution for object detection (Kniaz, 2014).
In (Dollár et al., 2010) a multiscale pedestrian detector operating in near real time (6 fps on 640x480 images) with state-of-the-art detection performance was presented. To reduce the computational time of the detector, it is proposed to approximate the feature responses of a broad family of features, including gradient histograms, using their feature responses at a single scale. The results showed about 6 fps for detecting pedestrians at least 100 pixels high and 3 fps for pedestrians over 50 pixels high.
An interesting method of pedestrian detection was published in (Shirazi and Morris, 2016). The proposed detection and tracking system processes only the necessary area of the frame, which includes typical pedestrian paths. The tracking system uses two methods: optical flow and bipartite graph matching of detections. Knowledge of typical pedestrian paths helps to remove false tracks.
A stereo matching and optical flow estimation benchmark was presented in (Geiger et al., 2012). It comprises 194 training and 195 test image pairs at a resolution of 1240 x 376 pixels with semi-dense ground truth. The 3D visual odometry dataset consists of 22 stereo sequences. All the objects were manually labelled in 3D point clouds.
A large multispectral pedestrian dataset was introduced in (Hwang et al., 2015). This dataset includes thermal image sequences of regular traffic scenes as well as color image sequences. It has both color and thermal image pairs and contains nighttime traffic sequences, which are rarely provided.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W4, 2017. 2nd International ISPRS Workshop on PSBB, 15-17 May 2017, Moscow, Russia. This contribution has been peer-reviewed. doi:10.5194/isprs-archives-XLII-2-W4-73-2017
An interesting approach to optical flow dataset generation was proposed by (Menze and Geiger, 2015). It includes two steps. Firstly, the static background of the scene is recovered by removing all dynamic objects and compensating for the vehicle's egomotion. After that, dynamic objects are inserted into the scene by fitting their detailed CAD models to the point clouds for each frame.
Recently a number of large datasets for pedestrian detection, such as the Caltech pedestrian dataset (Dollár et al., 2009, Dollár et al., 2012b) and various datasets in the KITTI Vision Benchmark Suite (Geiger et al., 2012), were developed. However, no multispectral datasets that include both pedestrian labelling and ground truth optical flow have been developed to date. To evaluate the developed algorithm, a new dataset, BipedFlow, was generated.

DATASET
To provide a reliable basis for analysis of the developed algorithm, a multispectral dataset was developed. A brief description of the dataset is presented in the following section.

Dataset design
The dataset was collected using a Hercules mobile robot equipped with a FLIR ONE thermal imaging camera. The robot was moved along various trajectories: a straight line, a sinusoid and a curve. To generate the ground truth optical flow, 3D modelling was used. Models of the scenes in which the robot moved were generated using laser scanning. The internal orientation of the camera was recovered using the method proposed by (Gschwandtner et al., 2011). The dataset includes 20 sequences with an average duration of 30 seconds. The ground truth optical flow was generated using ray tracing (Wulff et al., 2012) in the Blender 3D creation suite.

Mobile robot
To prepare the test sample, a wheeled robot equipped with a multispectral vision system was used. The robot is based on the Hercules mobile platform and a Raspberry Pi single-board computer (figure 1). The FLIR ONE multispectral camera was used as a video sensor. The camera is connected to a smartphone that transfers the video sequence to a remote computer. The resolution of the camera in the visible range is 640x480 pixels. The resolution in the infrared range is 160x120 pixels. Complete technical specifications of the FLIR ONE camera are presented in table 2. The robot is controlled from a remote computer via a WiFi network using a dedicated protocol.

Table 2. FLIR ONE specifications

BIPEDDETECT ALGORITHM
The BipedDetect algorithm is designed to work with multispectral image sequences to provide reliable performance under low light conditions. The optical flow is estimated by phase correlation in the frequency domain. The correlation is performed separately in the visible and infrared channels. Thus two optical flow images, based on the infrared and visible channels, are produced. The image fusion is based on a correlation coefficient that is provided for each pixel by the phase correlation.
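The phase correlation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it estimates a single integer shift for one image block, and the function names (`phase_correlation`, `fuse_block_shifts`) and the use of the correlation peak height as a stand-in for the per-pixel correlation coefficient are assumptions made for illustration only.

```python
import numpy as np


def phase_correlation(a, b, eps=1e-12):
    """Estimate the integer shift (dy, dx) such that b ~= np.roll(a, (dy, dx)).

    Returns the shift and the height of the correlation peak, which is close
    to 1 for a clean cyclic shift and much lower for unrelated blocks.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + eps          # keep only the phase
    corr = np.real(np.fft.ifft2(cross))   # delta-like peak at the shift
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Map peak positions in the upper half of the range to negative shifts.
    dy = int(py) if py <= h // 2 else int(py) - h
    dx = int(px) if px <= w // 2 else int(px) - w
    return (dy, dx), float(corr[py, px])


def fuse_block_shifts(visible_pair, infrared_pair):
    """Hypothetical fusion rule: keep the shift from the channel whose
    correlation peak is higher (assumed proxy for the correlation coefficient)."""
    shift_v, peak_v = phase_correlation(*visible_pair)
    shift_i, peak_i = phase_correlation(*infrared_pair)
    return shift_v if peak_v >= peak_i else shift_i
```

Applying `phase_correlation` to overlapping blocks of two consecutive frames would give a coarse flow field per channel; a sub-pixel refinement of the peak is omitted here for brevity.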

Coordinate systems
Four coordinate systems are defined. The object coordinate system OoXoYoZo is related to some object of interest in the observed scene and is defined as follows: the OoXoZo plane is normal to the gravity vector, and the Yo axis is normal to the Xo and Zo axes. The origin is related to some point of the observed scene and is selected appropriately for a given problem (figure 2).
The origin of the image coordinate system OiXiYiZi is located in the upper left pixel of the image; the Xi axis is directed to the right and the Yi axis is directed downwards.
The origin of the robot's coordinate system OrXrYrZr is located in the center of the upper deck of the robot. The Yr axis is directed along the forward motion of the robot, and the Zr axis is normal to the upper deck of the robot.
The origin of the camera coordinate system OcXcYcZc is located in the perspective center; the Xc axis is collinear with the Xi axis, the Yc axis is collinear with the Yi axis, and the Zc axis is normal to the Xc and Yc axes. The rotation of the camera coordinate system with respect to the object coordinate system is defined using the rotation matrix Roc, which is composed of Rα, the rotation matrix around the Y axis, Rω, the rotation matrix around the X axis, and Rκ, the rotation matrix around the Z axis.
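As an illustration of how Roc can be assembled from the three elementary rotations, the sketch below builds Rα, Rω and Rκ with NumPy. The multiplication order Rα · Rω · Rκ and the right-handed sign convention are assumptions, since the composition formula is not reproduced in the text; the function names are hypothetical.

```python
import numpy as np


def rot_x(omega):
    """Rotation by angle omega (radians) around the X axis."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])


def rot_y(alpha):
    """Rotation by angle alpha (radians) around the Y axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rot_z(kappa):
    """Rotation by angle kappa (radians) around the Z axis."""
    c, s = np.cos(kappa), np.sin(kappa)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def rotation_oc(alpha, omega, kappa):
    """Compose the camera rotation matrix.

    ASSUMPTION: the order R_alpha @ R_omega @ R_kappa is chosen for
    illustration; the paper's actual order is not given in the text.
    """
    return rot_y(alpha) @ rot_x(omega) @ rot_z(kappa)
```

Whatever order is used, the result is orthonormal with determinant 1, so its inverse is simply its transpose.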

Algorithm pipeline
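The algorithm consists of four main steps. Firstly, the optical flow is estimated using infrared and visible spectrum image sequences. Secondly, the robot's motion is estimated using the data from the robot's encoders and the state-space model. Thirdly, a synthetic optical flow of the scene without pedestrians is generated from the estimated motion and a simplified model of the scene. Finally, the pedestrians are detected by thresholding and segmenting the difference between the synthetic optical flow and the estimated optical flow. The block diagram of the algorithm is presented in figure 3.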
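The abstract states that the synthetic optical flow of the scene without pedestrians is generated using a slanted-plane model. One common way to obtain the flow induced by a single plane is through the plane-induced homography, sketched below. This is a hedged illustration rather than the authors' code: the function name, the pinhole intrinsic matrix `K`, and the conventions (plane `n^T X = d` in the first camera frame, motion `X2 = R X1 + t`) are all assumptions.

```python
import numpy as np


def plane_induced_flow(K, R, t, n, d, width, height):
    """Synthetic optical flow for pixels that view the plane n^T X = d.

    Assumed conventions: X1 are points in the first camera frame, the
    inter-frame motion is X2 = R @ X1 + t, and K is a pinhole intrinsic
    matrix. The result has shape (height, width, 2) and holds (du, dv).
    Values are only meaningful for pixels whose rays actually hit the plane.
    """
    # Points on the plane satisfy X2 = (R + t n^T / d) X1, which gives a homography.
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    xs, ys = np.meshgrid(np.arange(width, dtype=float),
                         np.arange(height, dtype=float))
    p1 = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous pixels
    p2 = p1 @ H.T                                        # apply H to every pixel
    p2 = p2[..., :2] / p2[..., 2:3]                      # back to pixel coordinates
    return p2 - p1[..., :2]
```

For a ground plane seen by a forward-looking camera with the Y axis pointing down, `n = (0, 1, 0)` and `d` equals the camera height; the rotation and translation would come from the estimated robot motion.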

Multispectral optical flow fusion
The estimation of the multispectral optical flow is performed separately for visible spectrum images and infrared images. Various optical flow estimation algorithms were evaluated (Revaud et al., 2015, Liu, 2009, Farnebäck, 2003, Dosovitskiy et al., 2015). The FlowNet deep neural network based algorithm showed the best performance when the algorithms were evaluated against the ground truth optical flow. However, only the algorithm proposed in (Farnebäck, 2003) had a computational complexity low enough for implementation on a single-board computer.
Various methods of optical flow fusion were studied. Firstly, the simplest fusion model, based on the selection of the maximum flow among the channels, was considered. Let uTV = (uTV, vTV) be the visible spectrum optical flow and uIR = (uIR, vIR) the infrared optical flow. The maximum fusion method takes the maximum of the two flows. A fusion method based on the mean value of the optical flows was also studied. Finally, the adaptive relay threshold method introduced in (Man et al., 2007) was studied. All three methods were implemented in dedicated software and their performance was studied using the BipedFlow dataset.
The evaluation showed that the fusion based on the maximum has the minimal root mean square error.
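The maximum and mean fusion rules can be sketched with NumPy as follows. The exact formulas are not reproduced in the text, so this sketch assumes that "maximum" means selecting, per pixel, the flow vector with the larger magnitude, and that the error is the root mean square endpoint error; the function names are hypothetical. The relay threshold method of (Man et al., 2007) is not shown.

```python
import numpy as np


def fuse_max(flow_tv, flow_ir):
    """Per pixel, keep the flow vector with the larger magnitude (assumed rule).

    Both inputs have shape (height, width, 2) holding (u, v) per pixel.
    """
    mag_tv = np.linalg.norm(flow_tv, axis=-1)
    mag_ir = np.linalg.norm(flow_ir, axis=-1)
    take_tv = (mag_tv >= mag_ir)[..., None]
    return np.where(take_tv, flow_tv, flow_ir)


def fuse_mean(flow_tv, flow_ir):
    """Per pixel, average the visible and infrared flow vectors."""
    return 0.5 * (flow_tv + flow_ir)


def rmse(flow, ground_truth):
    """Root mean square endpoint error between a flow field and ground truth."""
    squared = np.sum((flow - ground_truth) ** 2, axis=-1)
    return float(np.sqrt(np.mean(squared)))
```

With a ground truth flow available, the fusion rules can be compared by calling `rmse(fuse_max(tv, ir), gt)` and `rmse(fuse_mean(tv, ir), gt)` on the same frames.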

State-space model of the mobile robot
A state-space model of the Hercules robot was developed during previous research (Kniaz, 2015, Kniaz, 2016). The state-space dynamic model of the robot relates the input vector U(t) to the output vector Y(t). The output vector Y(t) of the robot consists of the longitudinal speed vx and the rotational speed θ. The input vector U(t) consists of the average and difference motor speeds for the left and right wheel motors.
A modified PASCAL metric (Everingham et al., 2009) proposed in (Dollár et al., 2012a) was used to evaluate the algorithm. The log-average miss rate for the BipedDetect algorithm was 52% during the daylight and 43% during the night. The computational complexity of the algorithm is suitable for implementation on a single-board computer with a 1.2 GHz CPU for input images with a resolution of 160x120. Evaluation of the algorithm showed that it is robust against low light and high noise levels.
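For reference, the log-average miss rate of (Dollár et al., 2012a) summarizes a miss rate versus false-positives-per-image (FPPI) curve by averaging, in log space, the miss rate at nine FPPI values evenly spaced in log space between 10^-2 and 10^0. The sketch below is a simplified illustration with a hypothetical function name; in particular, using a miss rate of 1.0 when the curve has no point at or below a reference FPPI is an assumption of this sketch.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Summarize a miss rate vs. FPPI curve by its log-average miss rate.

    fppi must be sorted in ascending order, with miss_rate giving the miss
    rate at each FPPI value. For each reference FPPI the miss rate of the
    last curve point not exceeding it is used.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    references = np.logspace(-2.0, 0.0, num_points)
    samples = []
    for ref in references:
        idx = np.nonzero(fppi <= ref)[0]
        # ASSUMPTION: no curve point at or below this FPPI means everything is missed.
        samples.append(miss_rate[idx[-1]] if idx.size else 1.0)
    samples = np.maximum(np.asarray(samples), 1e-10)  # avoid log(0)
    return float(np.exp(np.mean(np.log(samples))))
```

A lower value is better; a detector that misses half of the pedestrians at every reference FPPI has a log-average miss rate of 0.5.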

CONCLUSION
The BipedDetect algorithm for pedestrian detection on mobile robots was developed. The algorithm is based on the estimation of a multispectral optical flow and a synthetic optical flow. The multispectral optical flow is generated by fusion of the optical flows estimated for the visible spectrum channel and the infrared channel. Various optical flow fusion methods were evaluated. The evaluation showed that the best performance is obtained by the fusion based on the maximum of the optical flow.
A synthetic optical flow is generated using ray tracing and a simplified model of the scene. To estimate the interframe motion of the camera, a state-space model is used. The difference between the synthetic optical flow and the multispectral optical flow provides the optical flow induced by the moving pedestrians and does not include the optical flow induced by the camera's egomotion. The final detection of the pedestrians is performed by thresholding the difference between the multispectral and synthetic optical flows and segmenting the resulting optical flow.
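The final detection step, thresholding the flow difference and segmenting the result, can be sketched as follows. The threshold value, the minimum region area, the 4-connected flood fill and the function name are illustrative assumptions; the text does not specify which segmentation method was used.

```python
from collections import deque

import numpy as np


def detect_moving_regions(real_flow, synthetic_flow, threshold=1.0, min_area=4):
    """Return bounding boxes (x_min, y_min, x_max, y_max) of residual motion.

    The residual is the per-pixel magnitude of the flow difference. Pixels
    above the threshold are grouped into 4-connected regions, and regions
    smaller than min_area pixels are discarded as noise.
    """
    residual = np.linalg.norm(real_flow - synthetic_flow, axis=-1)
    mask = residual > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0, x0] or visited[y0, x0]:
                continue
            # Breadth-first flood fill over one connected region.
            visited[y0, x0] = True
            queue = deque([(y0, x0)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

On 160x120 images the pure Python loop is adequate for experimentation; a production version would likely rely on an optimized connected-component routine.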
To evaluate the performance of the BipedDetect algorithm, the BipedFlow multispectral optical flow dataset was developed. It was generated using a mobile robot equipped with a multispectral camera. The BipedFlow dataset includes various scenarios captured during the daylight and in low light conditions. The evaluation of the BipedDetect algorithm showed that it is robust against low light levels. The log-average miss rate for the BipedDetect algorithm was 52% during the daylight and 43% during the night. The computational complexity of the algorithm is suitable for implementation on a single-board computer with a 1.2 GHz CPU for input images with a resolution of 160x120.

Figure 1. The mobile robot with the FLIR ONE multispectral camera

Figure 2. Coordinate systems

Figure 3. Block diagram of the BipedDetect algorithm

Table 1. Scenarios, trajectories and illumination conditions used in the BipedDetect dataset.