A Cookbook of Self-Supervised Learning

J Geiping, Q Garrido, P Fernandez, A Bar… - arXiv preprint arXiv:2304.12210, 2023 - arimorcos.com
Self-supervised learning, dubbed "the dark matter of intelligence" 1, is a promising path to advance machine learning. As opposed to supervised learning, which is limited by the availability of labeled data, self-supervised approaches can learn from vast unlabeled data [Chen et al., 2020b, Misra and Maaten, 2020]. Self-supervised learning (SSL) underpins deep learning's success in natural language processing, leading to advances from automated machine translation to large language models trained on web-scale corpora of unlabeled text [Brown et al., 2020, Popel et al., 2020]. In computer vision, SSL pushed new bounds on data size with models such as SEER trained on 1 billion images [Goyal et al., 2021]. SSL methods for computer vision have been able to match, or in some cases surpass, models trained on labeled data, even on highly competitive benchmarks like ImageNet [Tomasev et al., 2022, He et al., 2020a, Deng et al., 2009]. SSL has also been successfully applied across other modalities such as video, audio, and time series [Wickström et al., 2022, Liu et al., 2022a, Schiappa et al., 2022a].

Self-supervised learning defines a pretext task based on unlabeled inputs to produce descriptive and intelligible representations [Hastie et al., 2009, Goodfellow et al., 2016]. In natural language, a common SSL objective is to mask a word in the text and predict the surrounding words. This objective of predicting the context surrounding a word encourages the model to capture relationships among words in the text without the need for any labels. The same SSL model representations can be used across a range of downstream tasks such as translating text across languages, summarizing, or even generating text, along with many others. In computer vision, analogous objectives exist, with models such as MAE or BYOL learning to predict masked patches of an image or representation [Grill et al., 2020, He et al., 2022]. Other SSL objectives encourage two views of the same image, formed by, say, adding color or cropping, to be mapped to similar representations.
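The two-view objective described above can be made concrete with a short sketch. The snippet below is a minimal joint-embedding loss in the spirit of the contrastive methods cited here [Chen et al., 2020b]: two augmented views of the same image are pulled toward similar representations while views of different images are pushed apart. The function name, temperature, and augmentation choices are illustrative assumptions, not the exact recipe of any cited method.

```python
# Minimal sketch of a joint-embedding SSL objective (a SimCLR-style contrastive
# loss). Names, temperature, and augmentations are illustrative assumptions.
import torch
import torch.nn.functional as F


def two_view_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                              temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same images."""
    batch = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim), unit norm
    sim = z @ z.T / temperature                           # pairwise cosine similarities
    # Mask out self-similarity so each row's positive is the other view of its image.
    self_mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # Row i's positive sits at i + B (first half) or i - B (second half).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)]).to(z.device)
    return F.cross_entropy(sim, targets)


# Usage: given an encoder f and two random augmentations t1, t2 (e.g. cropping,
# color jitter) applied to an image batch x:
#   loss = two_view_contrastive_loss(f(t1(x)), f(t2(x)))
```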
With the power to train on vast unlabeled data come many benefits. While traditional supervised learning methods are trained on a specific task, often known a priori based on the available labeled data, SSL learns generic representations useful across many tasks. SSL can be especially useful in domains such as medicine, where labels are costly or the specific task cannot be known a priori [Krishnan et al., 2022, Ciga et al., 2022]. There is also evidence that SSL models can learn representations that are more robust to adversarial examples, label corruption, and input perturbations, and that are more fair, compared to their supervised counterparts [Hendrycks et al., 2019, Goyal et al., 2022]. Consequently, SSL is a field garnering growing interest. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry.