The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLIV-2/W1-2021
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIV-2/W1-2021, 15–20, 2021
https://doi.org/10.5194/isprs-archives-XLIV-2-W1-2021-15-2021

15 Apr 2021

REAL-TIME DEEP NEURAL NETWORKS FOR MULTIPLE OBJECT TRACKING AND SEGMENTATION ON MONOCULAR VIDEO

I. Basharov and D. Yudin
  • Intelligent Transport Lab., Moscow Institute of Physics and Technology, Dolgoprudny, Russia

Keywords: Multiple object tracking, Instance segmentation, Deep neural network, Real time, Monocular video

Abstract. The paper addresses the task of multiple object tracking and segmentation on monocular video obtained by the camera of an unmanned ground vehicle. The authors investigate various deep neural network architectures for solving this task, paying special attention to deep models that provide real-time inference. The authors propose an approach that combines the modern SOLOv2 instance segmentation model, a neural network model that generates an embedding for each found object, and a modified Hungarian tracking algorithm. The Hungarian algorithm was modified to take into account geometric constraints on the positions of the found objects across the image sequence. The investigated solution is a development and improvement of the state-of-the-art PointTrack method. The effectiveness of the proposed approach is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset, collected using the cameras of a driverless car. A software implementation of the approach was carried out, and the formation of a two-dimensional point cloud within each found image segment was accelerated using NVidia CUDA technology. The proposed instance segmentation module provides a mean processing time of 68 ms per image, and the embedding and tracking module 24 ms, on an NVidia Tesla V100 GPU. This indicates that the proposed solution is promising for on-board computer vision systems of both unmanned vehicles and various robotic platforms.
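To illustrate the association step described in the abstract, the following is a minimal sketch of Hungarian matching between existing tracks and new detections, where embedding distance drives the assignment cost and a geometric gate forbids implausible jumps between frames. The exact cost formulation, gate threshold, and function names here are hypothetical, not the authors' implementation; `scipy.optimize.linear_sum_assignment` serves as the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFEASIBLE = 1e6  # cost assigned to geometrically forbidden pairs

def match_tracks(track_embs, det_embs, track_centers, det_centers,
                 max_dist=50.0):
    """Associate detections with tracks via the Hungarian algorithm.

    Cosine distance between appearance embeddings forms the cost matrix;
    pairs whose image-plane centers lie farther apart than `max_dist`
    pixels are gated out, mimicking a geometric constraint on object
    positions across consecutive frames (threshold is illustrative).
    """
    # Cosine distance between L2-normalized embeddings.
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T

    # Geometric gate: an object cannot jump arbitrarily far in one frame.
    dists = np.linalg.norm(
        track_centers[:, None, :] - det_centers[None, :, :], axis=2)
    cost[dists > max_dist] = INFEASIBLE

    rows, cols = linear_sum_assignment(cost)
    # Keep only feasible matches; unmatched detections would seed new tracks.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < INFEASIBLE]
```

In practice the embeddings would come from the per-segment point-cloud network and the centers from the SOLOv2 instance masks; gated-out detections spawn new track identities.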