Computer Science > Computer Vision and Pattern Recognition

arXiv:2203.10314 (cs)

[Submitted on 19 Mar 2022]

Title: Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

Authors: Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang

Abstract: Transformers have demonstrated promising performance in many 2D vision tasks. However, computing self-attention on large-scale point cloud data is cumbersome, because a point cloud is a long sequence that is unevenly distributed in 3D space. To address this, existing methods usually either compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation. The former results in stochastic point dropout, while the latter typically has narrow attention fields. In this paper, we propose a novel voxel-based architecture, the Voxel Set Transformer (VoxSeT), which detects 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can handle voxelized point clusters whose sizes vary over a wide range, and process them in parallel with linear complexity. VoxSeT combines the high performance of transformers with the efficiency of voxel-based models, and can serve as an alternative to convolutional and point-based backbones. VoxSeT reports competitive results on the KITTI and Waymo detection benchmarks. The source code can be found at this https URL.
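The two-cross-attention idea behind the VSA module can be illustrated with a minimal NumPy sketch. This is an assumption-laden reading of the abstract only, not the authors' implementation: the function name `voxel_set_attention`, the single-head dot-product attention, the absence of learned projections, and the per-voxel Python loop (the paper describes parallel processing) are all illustrative choices. In the first step, k latent codes attend to the points of one voxel and produce k hidden features; in the second step, each point attends to those k hidden features. The cost per voxel with m points is O(m·k·d), so the total cost grows linearly with the number of points.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def voxel_set_attention(feats, voxel_ids, latents):
    """Hypothetical sketch of set attention through latent codes, per voxel.

    feats:     (N, d) float array of point features
    voxel_ids: (N,) integer array, voxel index of each point
    latents:   (k, d) float array of latent codes (assumed shared by all voxels)
    Returns an (N, d) array of updated point features.
    """
    d = feats.shape[1]
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(feats)
    for v in np.unique(voxel_ids):
        mask = voxel_ids == v
        x = feats[mask]  # (m, d); m may differ from voxel to voxel

        # Cross-attention 1: latent codes query the points of this voxel.
        weights_in = softmax(latents @ x.T * scale, axis=-1)  # (k, m)
        hidden = weights_in @ x                               # (k, d)

        # Cross-attention 2: points query the k hidden features.
        weights_out = softmax(x @ hidden.T * scale, axis=-1)  # (m, k)
        out[mask] = weights_out @ hidden                      # (m, d)
    return out
```

Calling `voxel_set_attention(feats, voxel_ids, latents)` with, say, 1,000 points spread unevenly over a few hundred voxels returns one updated feature vector per input point, with no points dropped and no padding to a fixed cluster size. A practical version would replace the loop with scatter-style reductions and add learned query, key, and value projections; those details are not specified in the abstract.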

Comments:	11 pages, 4 figures, CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.10314 [cs.CV]
	(or arXiv:2203.10314v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.10314

Submission history

From: Chenhang He
[v1] Sat, 19 Mar 2022 12:31:46 UTC (1,090 KB)