Object trackers based on Convolution Neural Network (CNN) have achieved state-of-the-art performance on recent tracking benchmarks, while they suffer from slow computational speed. The high computational load arises from the extraction of the feature maps of the candidate and training patches in every video frame. The candidate and training patches are typically placed randomly around the previous target location and the estimated target location respectively. In this paper, we propose novel schemes to speed-up the processing of the CNN-based trackers. We input the whole region-of-interest once to the CNN to eliminate the redundant computations of the random candidate patches. In addition to classifying each candidate patch as an object or background, we adapt the CNN to classify the target location inside the object patches as a coarse localization step, and we employ bilinear interpolation for the CNN feature maps as a fine localization step. Moreover, bilinear interpolation is exploited to generate CNN feature maps of the training patches without actually forwarding the training patches through the network which achieves a significant reduction of the required computations. Our tracker does not rely on offline video training. It achieves competitive performance results on the OTB benchmark with 8x speed improvements compared to the equivalent tracker.
Visual object tracking is an active topic in the computer vision domain with applications extending over numerous fields. The main sub-tasks required to build an object tracker (e.g. object detection, feature extraction and object tracking) are computation-intensive. In addition, real-time operation of the tracker is indispensable for almost all of its applications. Therefore, complete hardware or hardware/software co-design approaches are pursued for better tracker implementations. This paper presents a literature survey of the hardware implementations of object trackers over the last two decades. Although several tracking surveys exist in literature, a survey addressing the hardware implementations of the different trackers is missing. We believe this survey would fill the gap and complete the picture with the existing surveys of how to design an efficient tracker and point out the future directions researchers can follow in this field. We highlight the lack of hardware implementations for state-of-the-art tracking algorithms as well as for enhanced classical algorithms. We also stress the need for measuring the tracking performance of the hardware-based trackers. Additionally, enough details of the hardware-based trackers need to be provided to allow reasonable comparison between the different implementations.