Abstract
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener’s native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
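The core tenet of Part 2 can be illustrated with a toy sketch. It is not the authors' model: the class `AccentListener`, the `identity_bias` parameter, the five-word lexicon, and the ASCII phoneme labels are all hypothetical, and the sketch assumes that heard sounds and native phonemes share one symbol inventory and align one-to-one (no insertions or deletions). The listener picks the most probable word under its current sound-to-phoneme table, then uses that word hypothesis to reinforce the table.

```python
import math


class AccentListener:
    """Toy listener keeping pseudo-counts that link heard sounds to native phonemes."""

    def __init__(self, lexicon, identity_bias=3.0):
        # lexicon: word -> tuple of native phonemes (hypothetical ASCII labels)
        self.lexicon = lexicon
        inventory = sorted({p for phones in lexicon.values() for p in phones})
        # Assumption: before adaptation, a sound is most likely "itself".
        self.counts = {
            s: {p: (identity_bias if s == p else 1.0) for p in inventory}
            for s in inventory
        }

    def prob(self, sound, phoneme):
        """Current estimate of P(native phoneme | heard sound)."""
        row = self.counts[sound]
        return row[phoneme] / sum(row.values())

    def score(self, sounds, word):
        """Product of per-position probabilities; 0 if lengths differ."""
        phones = self.lexicon[word]
        if len(phones) != len(sounds):
            return 0.0
        return math.prod(self.prob(s, p) for s, p in zip(sounds, phones))

    def hear(self, sounds, learn=True):
        """Return the best word hypothesis and, if learning, update the table."""
        best = max(self.lexicon, key=lambda w: self.score(sounds, w))
        if learn:
            # The word hypothesis supplies the phoneme label for each sound.
            for s, p in zip(sounds, self.lexicon[best]):
                self.counts[s][p] += 1.0
        return best


LEXICON = {
    "this": ("dh", "ih", "s"),
    "that": ("dh", "ae", "t"),
    "them": ("dh", "eh", "m"),
    "then": ("dh", "eh", "n"),
    "zen": ("z", "eh", "n"),
}

# Hypothetical accent: the speaker produces "z" where the listener expects "dh".
listener = AccentListener(LEXICON)
before = listener.hear(("z", "eh", "n"), learn=False)
for heard in [("z", "ih", "s"), ("z", "ae", "t"), ("z", "eh", "m")]:
    listener.hear(heard)
after = listener.hear(("z", "eh", "n"), learn=False)
print(before, after)
```

In this toy run the ambiguous input is first heard as "zen"; after three unambiguous accented words have strengthened the link from "z" to "dh", the same input is heard as "then". The sketch is neutral about whether the symbols stand for motor or auditory representations, as is the model described above.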
Notes
See, for example, the article http://www.nationmaster.com/encyclopedia/Phonology.
“She” and “her” will stand in for “he or she” and “his or her” respectively, unless the context makes clear which gender is intended.
By contrast, in vision the correspondence problem is the challenge of matching features extracted from the two retinas (or from the one retina at different times) that correspond to the same feature in the external world.
The Brown verbal frequency is the frequency of occurrence in verbal language derived from the London-Lund Corpus of English Conversation by Brown (1984).
References
Adda-Decker M (2001) Towards multilingual interoperability in automatic speech recognition. Speech Commun 35(1):5–20
Arbib MA (2005) Interweaving protosign and protospeech: further developments beyond the mirror. Interact Stud Soc Behav Commun Biol Artif Syst 6:145–171
Arbib MA (2006) Aphasia, apraxia and the evolution of the language-ready brain. Aphasiology 20:1–30
Arbib MA (2008) Mirror neurons & language. In: Stemmer B, Whitaker H (eds) Handbook of the neuroscience of language. Elsevier Science, Amsterdam, pp 237–246
Arbib MA (2010) Mirror system activity for action and language is embedded in the integration of dorsal & ventral pathways. Brain and Language 112:12–24
Arbib MA (2012) How the brain got language: the mirror system hypothesis. Oxford University Press, New York
Arbib MA, Rizzolatti G (1997) Neural expectations: a possible evolutionary path from manual skills to language. Commun Cogn 29:393–424
International Phonetic Association (1999) The handbook of the international phonetic association. Cambridge University Press, Cambridge
Bahl LR, Jelinek F (1975) Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans Inf Theory 21(4):404–411
Barrett AM, Foundas AL, Heilman KM (2005) Speech and gesture are mediated by independent systems. Behav Brain Sci 28:125–126
Basirat A, Sato M, Schwartz J-L, Kahane P, Lachaux J-P (2008) Parieto-frontal gamma band activity during the perceptual emergence of speech forms. NeuroImage 42(1):404–413
Best C, McRoberts G, Goodell E (2001) Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. J Acoust Soc Am 109(2):775–794
Bonaiuto JB, Arbib MA (2010) Extending the mirror neuron system model, II: what did I just do? A new role for mirror neurons. Biol Cybern 102:341–359
Bonaiuto JB, Rosta E, Arbib MA (2007) Extending the mirror neuron system model, I: audible actions and invisible grasps. Biol Cybern 96:9–38
Bradlow AR, Bent T (2008) Perceptual adaptation to non-native speech. Cognition 106(2):707
Brown GD (1984) A frequency count of 190,000 words in the London-Lund Corpus of English conversation. Behav Res Methods 16(6):502–532
Buccino G, Lui F, Canessa N, Patteri I, Lagravinese G, Benuzzi F, Porro CA, Rizzolatti G (2004) Neural circuits involved in the recognition of actions performed by nonconspecifics: an fMRI study. J Cogn Neurosci 16(1):114–126
Eisner F, McQueen JM (2005) The specificity of perceptual learning in speech processing. Atten Percept Psychophys 67(2):224–238
Fagg AH, Arbib MA (1998) Modeling parietal-premotor interactions in primate control of grasping. Neural Netw 11(7–8):1277–1303
Ferrari PF, Gallese V, Rizzolatti G, Fogassi L (2003) Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci 17(8):1703–1714
Ferrari PF, Rozzi S, Fogassi L (2005) Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. J Cogn Neurosci 17(2):212–226
Ferrari PF, Visalberghi E, Paukner A, Fogassi L, Ruggiero A, Suomi SJ (2006) Neonatal imitation in rhesus macaques. PLoS Biol 4(9):e302
Francis A, Baldwin K, Nusbaum H (2000) Effects of training on attention to acoustic cues. Percept Psychophys 62(8):1668–1680. doi:10.3758/BF03212164
Francis AL, Nusbaum HC (2002) Selective attention and the acquisition of new phonetic categories. J Exp Psychol Hum Percept Perform 28(2):349–366
Galantucci B, Fowler CA, Turvey MT (2006) The motor theory of speech perception reviewed. Psychon Bull Rev 13(3):361–377
Gales M, Young S (2007) The application of hidden Markov models in speech recognition. Found Trends Signal Process 1:195–304
Gallese V, Fogassi L, Fadiga L, Rizzolatti G (2002) Action representation and the inferior parietal lobule. In: Prinz W, Hommel B (eds) Attention & performance XIX. Common mechanisms in perception and action. Oxford University Press, Oxford
Goldinger SD (1998) Echoes of echoes? An episodic theory of lexical access. Psychol Rev 105(2):251
Goldstein L, Byrd D, Saltzman E (2006) The role of vocal tract gestural action units in understanding the evolution of phonology. In: Arbib MA (ed) From action to language via the mirror system. Cambridge University Press, Cambridge, pp 215–249
Goldstone RL (1998) Perceptual learning. Annu Rev Psychol 49(1):585–612
Goodale MA, Milner AD (1992) Separate visual pathways for perception and action. Trends Neurosci 15:20–25
Grossberg S (2003) Resonant neural dynamics of speech perception. J Phon 31(3):423–445
Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang 96(3):280–301
Hawkins S (2003) Roles and representations of systematic fine phonetic detail in speech understanding. J Phon 31(3):373–405
Hickok G (2009) The functional neuroanatomy of language. Phys Life Rev 6:121–143
Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92(1–2):67–99
Hickok G, Poeppel D (2009) Motor influence of speech perception: the view from Grenoble. Talking brains news and views on the neural organization of language (Blog moderated by Greg Hickok and David Poeppel) http://talkingbrains.blogspot.com/2009/2004/motor-influence-of-speech-perception.html
Hintzman DL (1986) Schema abstraction in a multiple-trace memory model. Psychol Rev 93:411–428
Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge
Kirchhoff K (1998) Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. In: Proceedings of ICSLP, Citeseer, pp 891–894
Klatt DH (1979) Speech perception: a model of acoustic-phonetic analysis and lexical access. J Phon 7(312):1–26
Kohler E, Keysers C, Umilta MA, Fogassi L, Gallese V, Rizzolatti G (2002) Hearing sounds, understanding actions: action representation in mirror neurons. Science 297(5582):846–848
Kröger BJ, Kannampuzha J, Neuschaefer-Rube C (2009) Towards a neurocomputational model of speech production and perception. Speech Commun 51(9):793–809
Kuhl PK, Miller JD (1975) Speech perception by the chinchilla: voiced-voiceless distinction in alveolar plosive consonants. Science 190:69–72
Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised. Cognition 21:1–36
Liberman AM, Whalen DH (2000) On the relation of speech to language. Trends Cogn Sci 4(5):187–196
Lindblom B (1990) Explaining phonetic variation: a sketch of the H&H theory. Speech Prod Speech Model 55:403–439
Lotto AJ, Hickok GS, Holt LL (2009) Reflections on mirror neurons and speech perception. Trends Cogn Sci 13(3):110–114
Lotto AJ, Kluender KR, Holt LL (1997) Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). J Acoust Soc Am 102(2 Pt 1):1134–1140
Luria AR (1973) The working brain. Penguin Books, Harmondsworth
MacNeilage PF (1998) The frame/content theory of evolution of speech production. Behav Brain Sci 21:499–546
MacNeilage PF, Davis BL (2005) The frame/content theory of evolution of speech: comparison with a gestural origins theory. Interact Stud Soc Behav Commun Biol Artif Syst 6:173–199
Massaro DW, Chen TH (2008) The motor theory of speech perception revisited. Psychon Bull Rev 15(2):453–457; discussion 458–462
Meltzoff AN, Moore MK (1977) Imitation of facial and manual gestures by human neonates. Science 198:75–78
Moineau S, Dronkers NF, Bates E (2005) Exploring the processing continuum of single-word comprehension in aphasia. J Speech Lang Hear Res 48(4):884–896
Moulin-Frier C, Laurent R, Bessière P, Schwartz J-L, Diard J (2012) Adverse conditions improve distinguishability of auditory, motor and perceptuo-motor theories of speech perception: an exploratory Bayesian modeling study. Lang Cogn Process 27:1240–1263 (7–8 Special Issue: Speech Recognition in Adverse Conditions) doi:10.1080/01690965.2011.645313
Norris D, McQueen JM, Cutler A (2003) Perceptual learning in speech. Cogn Psychol 47(2):204–238
Oztop E, Arbib MA (2002) Schema design and implementation of the grasp-related mirror neuron system. Biol Cybern 87(2):116–140
Oztop E, Bradley NS, Arbib MA (2004) Infant grasp learning: a computational model. Exp Brain Res 158(4):480–503
Pierrehumbert J (2002) Word-specific phonetics. Lab Phonol 7:101–139
Pinto J, Szoke I (2008) Fast approximate spoken term detection from sequence of phonemes. The 31st annual international ACM SIGIR conference 20–24 July 2008, Singapore
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Rauschecker JP (1998) Parallel processing in the auditory cortex of primates. Audiol Neurootol 3:86–103
Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci 97(22):11800–11806. doi:10.1073/pnas.97.22.11800
Rizzolatti G, Arbib M (1998) Language within our grasp. Trends Neurosci 21:188–194
Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192
Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cogn Brain Res 3:131–141
Sato M, Baciu M, Lœvenbruck H, Schwartz JL, Cathiard MA, Segebarth C, Abry C (2004) Multistable representation of speech forms: a functional MRI study of verbal transformations. NeuroImage 23(3):1143–1151
Schwartz J-L, Boë L-J, Abry C (2007) Linking dispersion-focalization theory and the maximum utilization of the available distinctive features principle in a perception-for-action-control theory. Oxford University Press, Oxford
Schwartz J-L, Basirat A, Ménard L, Sato M (2012) The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception. J Neurolinguistics 25(5):336–354
Skipper JI, Goldin-Meadow S, Nusbaum HC, Small SL (2007) Speech-associated gestures, Broca’s area, and the human mirror system. Brain Lang 101(3):260–277
Studdert-Kennedy M, Goldstein L (2003) Launching language: the gestural origin of discrete infinity. Stud Evol Lang 3:235–254
Umiltà MA, Escola L, Intskirveli I, Grammont F, Rochat M, Caruana F, Jezzini A, Gallese V, Rizzolatti G (2008) When pliers become fingers in the monkey motor system. Proc Natl Acad Sci USA 105(6):2209–2213
Ungerleider LG, Mishkin M (1982) Two cortical visual systems. In: Ingle DJ, Goodale MA, Mansfield RJW (eds) Analysis of visual behavior. The MIT Press, Cambridge
van Wassenhove V, Grant KW, Poeppel D (2005) Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci USA 102(4):1181–1186
Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2):260–269
Weinberger HS (2010) The speech accent archive. George Mason University http://accent.gmu.edu/index.php
Whalen DH, Noiray A, Iskarous K, Bolanos L (2009) Relative contribution of jaw and tongue to the vowel height dimension in American English. J Acoust Soc Am 125(4):2698–2698
Wilson M (1988) MRC psycholinguistic database: machine-usable dictionary, version 2.00. Behav Res Methods Instrum Comput 20:6–10
Cite this article
Moulin-Frier, C., Arbib, M.A. Recognizing speech in a novel accent: the motor theory of speech perception reframed. Biol Cybern 107, 421–447 (2013). https://doi.org/10.1007/s00422-013-0557-3
DOI: https://doi.org/10.1007/s00422-013-0557-3