
Towards Discrete Object Representations in Vision Transformers with Tensor Products

Published: 14 March 2024

Abstract

In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer to form image representations that can later be used for symbolic manipulation in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), an enhancement of the Vision Transformer that incorporates TPRs, an object representation methodology in which objects are encoded by binding filler vectors to role vectors. TP-ViT is the first application of TPRs to visual input, and we report qualitative and quantitative results showing that TPRs yield more targeted and diverse object representations than a standard Vision Transformer.
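The core binding operation behind TPRs can be sketched as follows. This is a minimal NumPy illustration of Smolensky-style binding and unbinding, not the paper's TP-ViT architecture; the dimensions, number of objects, and the orthonormal-role assumption are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

d_f, d_r, n = 8, 8, 3  # filler dim, role dim, number of objects

# Orthonormal role vectors: columns of a random orthogonal matrix.
roles = np.linalg.qr(rng.normal(size=(d_r, d_r)))[0][:, :n]

# One filler vector per object.
fillers = rng.normal(size=(n, d_f))

# Bind: T = sum_i  f_i (outer product) r_i  ->  a d_f x d_r matrix
# that superimposes all filler/role pairs in one representation.
T = sum(np.outer(fillers[i], roles[:, i]) for i in range(n))

# Unbind: with orthonormal roles, T @ r_j = sum_i f_i (r_i . r_j) = f_j,
# so querying with a role vector recovers that object's filler.
recovered = T @ roles[:, 1]
print(np.allclose(recovered, fillers[1]))  # True
```

Exact recovery here depends on the roles being orthonormal; with merely random (approximately orthogonal) roles in high dimensions, unbinding recovers the filler only up to crosstalk noise from the other bound pairs.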



Published In

CSAI '23: Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence
December 2023
563 pages
ISBN:9798400708688
DOI:10.1145/3638584
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. computer vision
  2. neurosymbolic AI
  3. object representations
  4. tensor product representations
  5. vision transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Ministry of Higher Education Malaysia

