The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2020, 623–630, 2020
https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-623-2020

12 Aug 2020

VEHICLE TRACKING AND SPEED ESTIMATION FROM UNMANNED AERIAL VIDEOS

M. Shahbazi², S. Simeonova¹, D. Lichti¹, and J. Wang¹
  • ¹ Dept. of Geomatics Engineering, University of Calgary, Calgary, T2N 1N4, Canada
  • ² Centre de géomatique du Québec, Saguenay, G7H 1Z6, Canada

Keywords: Tracking-by-Detection, Deep learning, 3D Reconstruction, Ray Casting, Feature Tracking, Speed Estimation

Abstract. This paper describes a solution for estimating vehicle speed from unmanned aerial videos. First, vehicles are detected with convolutional neural networks and tracked with Kalman filtering using deep features. Then, a photogrammetric approach is developed for estimating the three-dimensional (3D) position of the tracked vehicles on the road, from which their speed is determined. No assumptions are made about either the 3D structure of the road (e.g., constraining it to be a planar surface) or the camera pose (e.g., restricting it to be stationary). The solution therefore applies to videos acquired by a moving unmanned aerial vehicle over complex road structures (e.g., multi-level highways). It is also robust to changes of viewpoint and scale, which makes it applicable to situations where cars undergo orientation and resolution changes as observed from the sky (e.g., in roundabouts). In the experiments, detection achieved an F1-score of 94.54%. Tracking achieved a multiple-object tracking accuracy of 89.8% at a speed of 11 frames per second on videos of 2720×1530 pixels. Vehicle positioning (and thus, speed estimation) could be performed with an average accuracy of 0.6 m.
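The abstract states that speed follows from the 3D positions of a tracked vehicle but does not give the computation. As a minimal sketch only, and not the authors' implementation, speed can be approximated as the 3D displacement between two frames divided by the elapsed time. The function name `speeds_from_track`, the `step` parameter, and the sample coordinates are hypothetical; positions are assumed to be in metres and sampled at a constant frame rate.

```python
import math


def speeds_from_track(positions, fps, step=1):
    """Approximate speeds (m/s) along one vehicle track.

    positions: list of (x, y, z) tuples in metres, one per video frame
               (assumed to be already reconstructed in 3D).
    fps:       video frame rate in frames per second.
    step:      number of frames between the two positions being compared;
               a larger step gives a longer baseline per estimate.
    """
    if fps <= 0 or step < 1:
        raise ValueError("fps must be positive and step must be >= 1")
    dt = step / fps  # elapsed time between the compared frames, in seconds
    # Pair each position with the one `step` frames later and divide the
    # Euclidean 3D distance by the elapsed time.
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[step:])]


# Hypothetical track: a vehicle moving 1 m per frame along x at 10 fps.
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
speeds = speeds_from_track(track, fps=10.0)
```

Because the displacement is measured in 3D, the same computation applies on sloped or multi-level roads, with no planar-road assumption. Since each position carries its own error, a larger `step` spreads that error over a longer time interval, at the cost of averaging the speed over that interval.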