Spatial Cross-Attention RGB-D Fusion Module for Object Detection

Research Area

AI & Machine Learning

Author

Shangyin Gao, Lev Markhasin, Bi Wang

Company

Sony Europe B.V.

Venue

MMSP

Date

2021

Abstract

We investigate different RGB and depth fusion techniques for object detection, with the aim of improving detection accuracy over RGB-only systems. We consider recent proposal-free convolutional object detectors, which we modify for RGB-D data. We introduce a third, mixed branch in our network alongside the RGB and depth branches, and define a novel attention mechanism that extracts weighted features from the depth branch and applies them to the RGB feature map, thus fusing the branches adaptively. Our method, which we call the spatial Cross-Attention Fusion network, or CAF-Net, yields a state-of-the-art mean average precision of 60.3% on the SUN RGB-D dataset, outperforming all previous techniques by a significant margin.
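
To make the fusion idea in the abstract more concrete, the following is a minimal sketch of depth-guided spatial attention applied to an RGB feature map. It is not the CAF-Net implementation: the abstract does not specify the layer design, so the per-channel scoring weights, the sigmoid gating, the residual form `rgb * (1 + attention)`, and the function names `spatial_attention_map` and `cross_attention_fuse` are all assumptions chosen for illustration. Feature maps are plain nested lists shaped channels x height x width.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def spatial_attention_map(depth_feat, channel_weights):
    """Collapse a C x H x W depth feature map into an H x W map of weights.

    Assumption: the score at each pixel is a weighted sum over depth
    channels (similar in spirit to a 1x1 convolution), squashed by a sigmoid.
    """
    channels = len(depth_feat)
    height, width = len(depth_feat[0]), len(depth_feat[0][0])
    attn = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            score = sum(channel_weights[c] * depth_feat[c][y][x]
                        for c in range(channels))
            attn[y][x] = sigmoid(score)
    return attn


def cross_attention_fuse(rgb_feat, depth_feat, channel_weights):
    """Reweight every RGB channel by the depth-derived spatial attention.

    Assumption: a residual form, rgb * (1 + attention), so the RGB signal
    is never suppressed below its original value.
    """
    attn = spatial_attention_map(depth_feat, channel_weights)
    return [
        [[v * (1.0 + attn[y][x]) for x, v in enumerate(row)]
         for y, row in enumerate(channel)]
        for channel in rgb_feat
    ]
```

In this sketch the same spatial map is shared across all RGB channels, so depth decides where to emphasize features rather than which channels to emphasize. In a trained network the scoring weights would be learned parameters rather than values passed in by hand.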
