Computer Science > Computer Vision and Pattern Recognition

arXiv:2312.00598 (cs)

[Submitted on 1 Dec 2023 (v1), last revised 28 Mar 2024 (this version, v2)]

Title: Learning from One Continuous Video Stream

Authors: João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman


Abstract: We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive video frames, and there is very little prior work on it. Our framework allows us to do a first deep dive into the topic and includes a collection of streams and tasks composed from two existing video datasets, plus a methodology for performance evaluation that considers both adaptation and generalization. We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation, as well as between arbitrary tasks, without ever requiring changes to models and always using the same pixel loss. Equipped with this framework, we obtained large single-stream learning gains from pre-training with a novel family of future prediction tasks, and found that momentum hurts and that the pace of weight updates matters. The combination of these insights leads to matching the performance of IID learning with batch size 1, when using the same architecture and without costly replay buffers.
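To make the setting concrete, the following is a minimal sketch of single-stream online learning, not the authors' implementation. It assumes a toy "video" of a few sine-wave pixels, a linear next-frame predictor, and a mean squared pixel loss. The names `make_stream`, `run_online`, and `update_every` are hypothetical. Frames arrive in order, one at a time (batch size 1, no shuffling, no replay buffer), the optimizer is plain SGD without momentum, and `update_every` controls how often accumulated gradients are applied to the weights.

```python
import math


def make_stream(num_frames, num_pixels=4):
    """Hypothetical toy 'video': each frame is a few pixels of a drifting sine wave."""
    return [[math.sin(0.05 * t + 0.5 * p) for p in range(num_pixels)]
            for t in range(num_frames)]


def predict(weights, frame):
    """Linear pixel-to-pixel map: predicted next frame = weights @ frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def run_online(stream, lr=0.05, update_every=1):
    """Learn next-frame prediction from one ordered stream, one frame at a time.

    Plain SGD without momentum. Gradients are accumulated and applied every
    `update_every` frames. Returns the per-frame loss measured before the
    model has learned from that frame.
    """
    n = len(stream[0])
    weights = [[0.0] * n for _ in range(n)]
    grad_acc = [[0.0] * n for _ in range(n)]
    losses = []
    for t in range(len(stream) - 1):
        frame, target = stream[t], stream[t + 1]
        pred = predict(weights, frame)
        err = [p - y for p, y in zip(pred, target)]
        # Mean squared pixel error, recorded before any update on this frame.
        losses.append(sum(e * e for e in err) / n)
        # Gradient of the loss with respect to each weight.
        for i in range(n):
            for j in range(n):
                grad_acc[i][j] += 2.0 * err[i] * frame[j] / n
        # Apply the averaged accumulated gradient at the chosen pace.
        if (t + 1) % update_every == 0:
            for i in range(n):
                for j in range(n):
                    weights[i][j] -= lr * grad_acc[i][j] / update_every
                    grad_acc[i][j] = 0.0
    return losses


if __name__ == "__main__":
    stream = make_stream(2000)
    for pace in (1, 4):
        losses = run_online(stream, update_every=pace)
        early = sum(losses[:50]) / 50
        late = sum(losses[-50:]) / 50
        print(f"update_every={pace}: early loss {early:.4f}, late loss {late:.4f}")
```

Recording the loss before each update measures how well the model adapts within the stream. Measuring generalization, which the abstract also calls for, would additionally require evaluating on separate held-out streams, which this sketch omits.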

Comments:	CVPR camera ready version
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.00598 [cs.CV]
	(or arXiv:2312.00598v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.00598

Submission history

From: Joao Carreira
[v1] Fri, 1 Dec 2023 14:03:30 UTC (9,035 KB)
[v2] Thu, 28 Mar 2024 21:29:55 UTC (16,787 KB)
