Seeing is Worse than Believing: Reading People’s Minds Better than Computer-Vision Methods Recognize Actions

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8693)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We had human subjects perform a one-out-of-six class action recognition task from video stimuli while undergoing functional magnetic resonance imaging (fMRI). Support-vector machines (SVMs) were trained on the recovered brain scans to classify actions observed during imaging, yielding average classification accuracy of 69.73% when tested on scans from the same subject and of 34.80% when tested on scans from different subjects. An apples-to-apples comparison was performed with all publicly available software that implements state-of-the-art action recognition on the same video corpus with the same cross-validation regimen and same partitioning into training and test sets, yielding classification accuracies between 31.25% and 52.34%. This indicates that one can read people’s minds better than state-of-the-art computer-vision methods can perform action recognition.
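
To put the reported accuracies in context, the sketch below compares each figure from the abstract with the chance level of a six-way classification task (1/6, about 16.67%). It also shows one common way to form cross-subject train/test partitions by holding out one subject at a time. The hold-out scheme, the subject labels, and the function names are illustrative assumptions; the abstract does not specify the exact cross-validation regimen.

```python
# Accuracy figures are copied from the abstract; everything else is illustrative.
NUM_CLASSES = 6
CHANCE = 100.0 / NUM_CLASSES  # percent accuracy of uniform random guessing

reported = {
    "fMRI SVM, same subject": 69.73,
    "fMRI SVM, different subjects": 34.80,
    "computer vision, lowest": 31.25,
    "computer vision, highest": 52.34,
}


def times_chance(accuracy_percent):
    """Return an accuracy as a multiple of the six-way chance level."""
    return accuracy_percent / CHANCE


def leave_one_subject_out(subjects):
    """Yield (training_subjects, held_out_subject) pairs.

    Assumed scheme for cross-subject testing, not confirmed by the abstract.
    """
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out


for name, acc in reported.items():
    print(f"{name}: {acc:.2f}% = {times_chance(acc):.2f}x chance")

# Hypothetical subject labels, used only to show the split structure.
for train, test in leave_one_subject_out(["s1", "s2", "s3"]):
    print("train on", train, "-> test on", test)
```

Every reported figure is above chance. The same-subject fMRI result exceeds the best computer-vision result, while the cross-subject fMRI result falls inside the computer-vision range near its lower end.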



Author information

Authors and Affiliations

MIT, Cambridge, MA, USA
Andrei Barbu
Purdue University, West Lafayette, IN, USA
Daniel P. Barrett, Sébastien Hélie, Jeffrey Mark Siskind, Thomas Michael Talavage & Ronnie B. Wilbur
SUNY Buffalo, Buffalo, NY, USA
Wei Chen
Stanford University, Stanford, CA, USA
Narayanaswamy Siddharth
University of California at Los Angeles, Los Angeles, CA, USA
Caiming Xiong
University of Michigan, Ann Arbor, MI, USA
Jason J. Corso
Princeton University, Princeton, NJ, USA
Christiane D. Fellbaum
Rutgers University, Newark, NJ, USA
Catherine Hanson & Stephen José Hanson
University of Texas at Arlington, Arlington, TX, USA
Evguenia Malaia
National University of Ireland Maynooth, Co. Kildare, Ireland
Barak A. Pearlmutter

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Cite this paper

Barbu, A. et al. (2014). Seeing is Worse than Believing: Reading People’s Minds Better than Computer-Vision Methods Recognize Actions. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_40

DOI: https://doi.org/10.1007/978-3-319-10602-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science (R0)
