Computer Science > Machine Learning

arXiv:2304.13089 (cs)

[Submitted on 25 Apr 2023]

Title: Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

Authors: Shashank Shekhar, Florian Bordes, Pascal Vincent, Ari Morcos

Abstract: Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning reorganizes the information to be more similar to pre-trained joint-embedding models.
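The abstract does not say how feature similarity between models is measured. As an illustration only, the sketch below assumes linear centered kernel alignment (CKA), a commonly used representation-similarity score, applied to two feature matrices extracted for the same inputs. The function names and the toy matrices are hypothetical and are not taken from the paper.

```python
import math


def center_columns(x):
    """Subtract the per-feature mean from each column of an n x d matrix."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def cross_gram_sq_norm(x, y):
    """Squared Frobenius norm of x^T y for an n x d1 and an n x d2 matrix."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            dot = sum(a * b for a, b in zip(col_x, col_y))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two feature matrices computed on the same n inputs."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Toy features for four inputs (assumed data, for illustration only).
feats_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# feats_b is feats_a rotated by 90 degrees and scaled by 2.
feats_b = [[0.0, 2.0], [-2.0, 0.0], [0.0, -2.0], [2.0, 0.0]]
# feats_c is a single feature uncorrelated with both columns of feats_a.
feats_c = [[1.0], [-1.0], [1.0], [-1.0]]

print(linear_cka(feats_a, feats_b))
print(linear_cka(feats_a, feats_c))
```

Linear CKA is invariant to rotation and isotropic scaling of the features, so the first score equals 1 up to floating-point error, while the second is 0 because the two feature sets share no linear structure. In practice the rows would be activations from a given layer of two pre-trained models evaluated on the same images.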

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2304.13089 [cs.LG]
	(or arXiv:2304.13089v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2304.13089

Submission history

From: Shashank Shekhar
[v1] Tue, 25 Apr 2023 18:48:23 UTC (10,419 KB)
