Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer

Research Area

AI & Machine Learning

Author

Siddharth Sagar Nijhawan, Leo Hoshikawa, Atsushi Irie, Masakazu Yoshimura, Junji Otsuka, Takeshi Ohashi

Company

Sony Group Corporation

Venue

Transformers for Vision Workshop at CVPR

Date

2023

Abstract

We propose a lightweight and highly efficient Joint Detection and Tracking pipeline for Multi-Object Tracking, built on a fully transformer architecture. It is a modified version of TransTrack that overcomes the computational bottleneck of the original design while achieving a state-of-the-art MOTA score of 73.20%. The model uses a transformer-based backbone instead of a CNN, which scales well with input resolution. We also propose a drop-in replacement for the Feed Forward Network of the transformer encoder layer: a Butterfly Transform Operation performs channel fusion, and a depth-wise convolution learns the spatial context within the feature maps that is otherwise missing from the transformer's attention maps. With these modifications, we reduce the overall model size of TransTrack by 58.73% and its complexity by 78.72%. We therefore expect our design to provide novel perspectives for architecture optimization in future research on multi-object tracking.
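
The two ingredients of the proposed Feed Forward Network replacement can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the radix-2 butterfly layout, the 3x3 kernel size, and the zero padding are all assumptions made for illustration, and the sketch omits activations, normalization, and how the two operations are combined in the actual model. It shows only why a butterfly structure is cheaper than dense channel fusion (about 2·C·log2(C) weights instead of C²) and how a depth-wise convolution mixes spatial neighbours within each channel independently.

```python
def dense_weight_count(channels):
    """Weights in a dense (pointwise) channel-fusion layer: C * C."""
    return channels * channels


def butterfly_weight_count(channels):
    """Weights in an assumed radix-2 butterfly: log2(C) stages,
    C/2 channel pairs per stage, 4 weights per pair."""
    if channels < 2 or channels & (channels - 1):
        raise ValueError("channels must be a power of two")
    stages = channels.bit_length() - 1
    return stages * (channels // 2) * 4


def butterfly_fuse(x, weights):
    """Mix a length-C channel vector through log2(C) butterfly stages.

    weights[s][p] holds the 4 weights (a, b, c, d) of pair p in stage s.
    After the last stage every output can depend on every input.
    """
    n = len(x)
    out = list(x)
    for s, stage in enumerate(weights):
        stride = 1 << s
        nxt = [0.0] * n
        pair = 0
        for i in range(n):
            if i & stride:
                continue  # i is the upper half of a pair already handled
            j = i | stride
            a, b, c, d = stage[pair]
            nxt[i] = a * out[i] + b * out[j]
            nxt[j] = c * out[i] + d * out[j]
            pair += 1
        out = nxt
    return out


def depthwise_conv3x3(fmap, kernels):
    """Apply one 3x3 kernel per channel with zero padding.

    fmap[c][y][x] is the feature map, kernels[c] the 3x3 kernel of
    channel c. Channels are never mixed here; only spatial neighbours are.
    """
    result = []
    for plane, k in zip(fmap, kernels):
        h, w = len(plane), len(plane[0])
        res = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += k[dy + 1][dx + 1] * plane[yy][xx]
                res[y][x] = acc
        result.append(res)
    return result


# Hypothetical channel width of 256, chosen only for the comparison.
dense = dense_weight_count(256)          # 65536 weights
butterfly = butterfly_weight_count(256)  # 4096 weights
```

The weight counts in this sketch concern a single channel-fusion layer under the stated assumptions; they are not meant to reproduce the model-size and complexity reductions reported in the abstract.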

Related Publications

Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments

CVPR, 2023

Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

Image recognition models that work in challenging environments (e.g., extremely dark, blurry, or high dynamic […]
