
Recognizing speech in a novel accent: the motor theory of speech perception reframed. (English) Zbl 1294.92011

Summary: The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener’s native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
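
The core tenet of Part 2 can be illustrated with a toy sketch. This is not the authors' model: the lexicon, the accent (a speaker who produces "z" where the listener expects "th"), the pseudo-count priors, and every name below (`LEXICON`, `Listener`, `posterior`, `update`) are assumptions made only for illustration. The listener keeps counts linking each heard sound to its native phonemes, picks the most probable word as its hypothesis, and uses that hypothesis to update the counts.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> phoneme sequence in the listener's repertoire.
LEXICON = {
    "sin": ("s", "i", "n"),
    "tin": ("t", "i", "n"),
    "thin": ("th", "i", "n"),
    "this": ("th", "i", "s"),
    "sit": ("s", "i", "t"),
}
PHONEMES = sorted({p for seq in LEXICON.values() for p in seq})


class Listener:
    """Toy listener mapping heard sounds to native phonemes (illustrative only)."""

    def __init__(self, prior_same=5.0, prior_other=1.0):
        # counts[sound][phoneme] are pseudo-counts. Unfamiliar sounds start
        # uniform; familiar sounds start biased toward the identical phoneme.
        self.counts = defaultdict(lambda: {p: prior_other for p in PHONEMES})
        for p in PHONEMES:
            self.counts[p][p] = prior_same

    def prob(self, phoneme, sound):
        """Estimated P(intended phoneme | heard sound)."""
        row = self.counts[sound]
        return row[phoneme] / sum(row.values())

    def posterior(self, sounds):
        """Normalized score of each same-length lexicon word for the sounds."""
        scores = {}
        for word, phonemes in LEXICON.items():
            if len(phonemes) != len(sounds):
                continue
            score = 1.0
            for ph, s in zip(phonemes, sounds):
                score *= self.prob(ph, s)
            scores[word] = score
        total = sum(scores.values())
        return {w: sc / total for w, sc in scores.items()}

    def recognize(self, sounds):
        """Return the most probable word hypothesis."""
        post = self.posterior(sounds)
        return max(post, key=post.get)

    def update(self, sounds, word):
        """Use the word hypothesis to strengthen sound -> phoneme links."""
        for ph, s in zip(LEXICON[word], sounds):
            self.counts[s][ph] += 1.0


listener = Listener()
accented_thin = ("z", "i", "n")   # assumed accent: "th" is produced as "z"
accented_this = ("z", "i", "s")

before = listener.posterior(accented_thin)["thin"]
hypothesis = listener.recognize(accented_this)   # lexical context disambiguates
listener.update(accented_this, hypothesis)
after = listener.posterior(accented_thin)["thin"]
print(hypothesis, before, after)
```

In this sketch the word "this" is recoverable from its familiar sounds alone, and the resulting update raises the listener's estimate that "z" stands for "th", which in turn makes the later, initially ambiguous "thin" easier to recognize. The sketch takes no position on whether the sounds and phonemes are motor or auditory representations.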

MSC:

92C20 Neural biology
91F20 Linguistics
68T10 Pattern recognition, speech recognition

References:

[1] Adda-Decker M (2001) Towards multilingual interoperability in automatic speech recognition. Speech Commun 35(1):5-20 · Zbl 0983.68817 · doi:10.1016/S0167-6393(00)00092-3
[2] Arbib MA (2005) Interweaving protosign and protospeech: further developments beyond the mirror. Interact Stud Soc Behav Commun Biol Artif Syst 6:145-171 · doi:10.1075/is.6.2.02arb
[3] Arbib MA (2006) Aphasia, apraxia and the evolution of the language-ready brain. Aphasiology 20:1-30
[4] Arbib, MA; Stemmer, B. (ed.); Whitaker, H. (ed.), Mirror neurons & language, 237-246 (2008), Amsterdam · doi:10.1016/B978-0-08-045352-1.00023-9
[5] Arbib MA (2010) Mirror system activity for action and language is embedded in the integration of dorsal & ventral pathways. Brain and Language 112:12-24 · doi:10.1016/j.bandl.2009.10.001
[6] Arbib MA (2012) How the brain got language: the mirror system hypothesis. Oxford University Press, New York
[7] Arbib MA, Rizzolatti G (1997) Neural expectations: a possible evolutionary path from manual skills to language. Commun Cogn 29:393-424
[8] Association IP (1999) The handbook of the international phonetic association. Cambridge University Press, Cambridge
[9] Bahl LR, Jelinek F (1975) Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans Inf Theory 21(4):404-411 · Zbl 0309.94037 · doi:10.1109/TIT.1975.1055419
[10] Barrett AM, Foundas AL, Heilman KM (2005) Speech and gesture are mediated by independent systems. Behav Brain Sci 28:125-126 · doi:10.1017/S0140525X05220034
[11] Basirat A, Sato M, Schwartz J-L, Kahane P, Lachaux J-P (2008) Parieto-frontal gamma band activity during the perceptual emergence of speech forms. NeuroImage 42(1):404-413 · doi:10.1016/j.neuroimage.2008.03.063
[12] Best C, McRoberts G, Goodell E (2001) Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. J Acoust Soc Am 109(2):775-794 · doi:10.1121/1.1332378
[13] Bonaiuto JB, Arbib MA (2010) Extending the mirror neuron system model, II: what did I just do? A new role for mirror neurons. Biol Cybern 102:341-359 · Zbl 1266.92016 · doi:10.1007/s00422-010-0371-0
[14] Bonaiuto JB, Rosta E, Arbib MA (2007) Extending the mirror neuron system model, I: audible actions and invisible grasps. Biol Cybern 96:9-38 · Zbl 1118.92008 · doi:10.1007/s00422-006-0110-8
[15] Bradlow AR, Bent T (2008) Perceptual adaptation to non-native speech. Cognition 106(2):707 · doi:10.1016/j.cognition.2007.04.005
[16] Brown GD (1984) A frequency count of 190,000 words in the London-Lund Corpus of English conversation. Behav Res Methods 16(6):502-532 · doi:10.3758/BF03200836
[17] Buccino G, Lui F, Canessa N, Patteri I, Lagravinese G, Benuzzi F, Porro CA, Rizzolatti G (2004) Neural circuits involved in the recognition of actions performed by nonconspecifics: an fMRI study. J Cogn Neurosci 16(1):114-126 · doi:10.1162/089892904322755601
[18] Eisner F, McQueen JM (2005) The specificity of perceptual learning in speech processing. Atten Percept Psychophys 67(2):224-238 · doi:10.3758/BF03206487
[19] Fagg AH, Arbib MA (1998) Modeling parietal-premotor interactions in primate control of grasping. Neural Netw 11(7-8):1277-1303 · doi:10.1016/S0893-6080(98)00047-1
[20] Ferrari PF, Gallese V, Rizzolatti G, Fogassi L (2003) Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci 17(8):1703-1714 · doi:10.1046/j.1460-9568.2003.02601.x
[21] Ferrari PF, Rozzi S, Fogassi L (2005) Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. J Cogn Neurosci 17(2):212-226 · doi:10.1162/0898929053124910
[22] Ferrari PF, Visalberghi E, Paukner A, Fogassi L, Ruggiero A, Suomi SJ (2006) Neonatal imitation in rhesus macaques. PLoS Biol 4(9):e302 · Zbl 1104.92313
[23] Francis A, Baldwin K, Nusbaum H (2000) Effects of training on attention to acoustic cues. Percept Psychophys 62(8):1668-1680 · doi:10.3758/BF03212164
[24] Francis AL, Nusbaum HC (2002) Selective attention and the acquisition of new phonetic categories. J Exp Psychol Hum Percept Perform 28(2):349-366 · doi:10.1037/0096-1523.28.2.349
[25] Galantucci B, Fowler CA, Turvey MT (2006) The motor theory of speech perception reviewed. Psychon Bull Rev 13(3):361-377 · doi:10.3758/BF03193857
[26] Gales M, Young S (2007) The application of hidden Markov models in speech recognition. Found Trends Signal Process 1:195-304 · Zbl 1145.68045
[27] Gallese, V.; Fogassi, L.; Fadiga, L.; Rizzolatti, G.; Prinz, W. (ed.); Hommel, B. (ed.), Action representation and the inferior parietal lobule (2002), Oxford
[28] Goldinger SD (1998) Echoes of echoes? An episodic theory of lexical access. Psychol Rev 105(2):251 · doi:10.1037/0033-295X.105.2.251
[29] Goldstein, L.; Byrd, D.; Saltzman, E.; Arbib, MA (ed.), The role of vocal tract gestural action units in understanding the evolution of phonology, 215-249 (2006), Cambridge · doi:10.1017/CBO9780511541599.008
[30] Goldstone RL (1998) Perceptual learning. Annu Rev Psychol 49(1):585-612 · doi:10.1146/annurev.psych.49.1.585
[31] Goodale MA, Milner AD (1992) Separate visual pathways for perception and action. Trends Neurosci 15:20-25 · doi:10.1016/0166-2236(92)90344-8
[32] Grossberg S (2003) Resonant neural dynamics of speech perception. J Phon 31(3):423-445 · doi:10.1016/S0095-4470(03)00051-2
[33] Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang 96(3):280-301 · doi:10.1016/j.bandl.2005.06.001
[34] Hawkins S (2003) Roles and representations of systematic fine phonetic detail in speech understanding. J Phon 31(3):373-405 · doi:10.1016/j.wocn.2003.09.006
[35] Hickok G (2009) The functional neuroanatomy of language. Phys Life Rev 6:121-143 · doi:10.1016/j.plrev.2009.06.001
[36] Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92(1-2):67-99 · doi:10.1016/j.cognition.2003.10.011
[37] Hickok G, Poeppel D (2009) Motor influence of speech perception: the view from Grenoble. Talking brains news and views on the neural organization of language (Blog moderated by Greg Hickok and David Poeppel) http://talkingbrains.blogspot.com/2009/2004/motor-influence-of-speech-perception.html
[38] Hintzman DL (1986) Schema abstraction in a multiple-trace memory model. Psychol Rev 93:411-428 · doi:10.1037/0033-295X.93.4.411
[39] Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge · Zbl 1045.62001 · doi:10.1017/CBO9780511790423
[40] Kirchhoff K (1998) Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. In: Proceedings of ICSLP, Citeseer, pp 891-894
[41] Klatt DH (1979) Speech perception: a model of acoustic-phonetic analysis and lexical access. J Phon 7(312):1-26
[42] Kohler E, Keysers C, Umilta MA, Fogassi L, Gallese V, Rizzolatti G (2002) Hearing sounds, understanding actions: action representation in mirror neurons. Science 297(5582):846-848 · doi:10.1126/science.1070311
[43] Kröger BJ, Kannampuzha J, Neuschaefer-Rube C (2009) Towards a neurocomputational model of speech production and perception. Speech Commun 51(9):793-809 · doi:10.1016/j.specom.2008.08.002
[44] Kuhl PK, Miller JD (1975) Speech perception by the chinchilla: voiced-voiceless distinction in alveolar plosive consonants. Science 190:69-72 · doi:10.1126/science.1166301
[45] Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised. Cognition 21:1-36 · doi:10.1016/0010-0277(85)90021-6
[46] Liberman AM, Whalen DH (2000) On the relation of speech to language. Trends Cogn Sci 4(5):187-196 · doi:10.1016/S1364-6613(00)01471-6
[47] Lindblom B (1990) Explaining phonetic variation: a sketch of the H&H theory. Speech Prod Speech Model 55:403-439 · doi:10.1007/978-94-009-2037-8_16
[48] Lotto AJ, Hickok GS, Holt LL (2009) Reflections on mirror neurons and speech perception. Trends Cogn Sci 13(3):110-114 · doi:10.1016/j.tics.2008.11.008
[49] Lotto AJ, Kluender KR, Holt LL (1997) Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). J Acoust Soc Am 102(2 Pt 1):1134-1140 · doi:10.1121/1.419865
[50] Luria AR (1973) The working brain. Penguin Books, Harmondsworth
[51] MacNeilage PF (1998) The frame/content theory of evolution of speech production. Behav Brain Sci 21:499-546
[52] MacNeilage PF, Davis BL (2005) The frame/content theory of evolution of speech: comparison with a gestural origins theory. Interact Stud Soc Behav Commun Biol Artif Syst 6:173-199 · doi:10.1075/is.6.2.03mac
[53] Massaro DW, Chen TH (2008) The motor theory of speech perception revisited. Psychon Bull Rev 15(2):453-457; discussion 458-462
[54] Meltzoff AN, Moore MK (1977) Imitation of facial and manual gestures by human neonates. Science 198:75-78 · doi:10.1126/science.198.4312.75
[55] Moineau S, Dronkers NF, Bates E (2005) Exploring the processing continuum of single-word comprehension in aphasia. J Speech Lang Hear Res 48(4):884-896 · doi:10.1044/1092-4388(2005/061)
[56] Moulin-Frier C, Laurent R, Bessière P, Schwartz J-L, Diard J (2012) Adverse conditions improve distinguishability of auditory, motor and perceptuo-motor theories of speech perception: an exploratory Bayesian modeling study. Lang Cogn Process 27:1240-1263 (7-8 Special Issue: Speech Recognition in Adverse Conditions) · doi:10.1080/01690965.2011.645313
[57] Norris D, McQueen JM, Cutler A (2003) Perceptual learning in speech. Cogn Psychol 47(2):204-238 · doi:10.1016/S0010-0285(03)00006-9
[58] Oztop E, Arbib MA (2002) Schema design and implementation of the grasp-related mirror neuron system. Biol Cybern 87(2):116-140 · Zbl 1104.92313 · doi:10.1007/s00422-002-0318-1
[59] Oztop E, Bradley NS, Arbib MA (2004) Infant grasp learning: a computational model. Exp Brain Res 158(4):480-503 · doi:10.1007/s00221-004-1914-1
[60] Pierrehumbert J (2002) Word-specific phonetics. Lab Phonol 7:101-139
[61] Pinto J, Szoke I (2008) Fast approximate spoken term detection from sequence of phonemes. The 31st annual international ACM SIGIR conference 20-24 July 2008, Singapore
[62] Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257-286 · doi:10.1109/5.18626
[63] Rauschecker JP (1998) Parallel processing in the auditory cortex of primates. Audiol Neurootol 3:86-103 · doi:10.1159/000013784
[64] Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci 97(22):11800-11806 · doi:10.1073/pnas.97.22.11800
[65] Rizzolatti G, Arbib M (1998) Language within our grasp. Trends Neurosci 21:188-194 · doi:10.1016/S0166-2236(98)01260-0
[66] Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169-192 · doi:10.1146/annurev.neuro.27.070203.144230
[67] Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cogn Brain Res 3:131-141 · doi:10.1016/0926-6410(95)00038-0
[68] Sato M, Baciu M, Lœvenbruck H, Schwartz JL, Cathiard MA, Segebarth C, Abry C (2004) Multistable representation of speech forms: a functional MRI study of verbal transformations. NeuroImage 23(3):1143-1151 · doi:10.1016/j.neuroimage.2004.07.055
[69] Schwartz J-L, Boë L-J, Abry C (2007) Linking dispersion-focalization theory and the maximum utilization of the available distinctive features principle in a perception-for-action-control theory. Oxford University Press, Oxford
[70] Schwartz J-L, Basirat A, Ménard L, Sato M (2012) The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception. J Neurolinguistics 25(5):336-354
[71] Skipper JI, Goldin-Meadow S, Nusbaum HC, Small SL (2007) Speech-associated gestures, Broca’s area, and the human mirror system. Brain Lang 101(3):260-277 · doi:10.1016/j.bandl.2007.02.008
[72] Studdert-Kennedy M, Goldstein L (2003) Launching language: the gestural origin of discrete infinity. Stud Evol Lang 3:235-254 · doi:10.1093/acprof:oso/9780199244843.003.0013
[73] Umiltà MA, Escola L, Intskirveli I, Grammont F, Rochat M, Caruana F, Jezzini A, Gallese V, Rizzolatti G (2008) When pliers become fingers in the monkey motor system. Proc Natl Acad Sci USA 105(6):2209-2213 · doi:10.1073/pnas.0705985105
[74] Ungerleider, LG; Mishkin, M.; Ingle, DJ (ed.); Goodale, MA (ed.); Mansfield, RJW (ed.), Two cortical visual systems (1982), Cambridge
[75] van Wassenhove V, Grant KW, Poeppel D (2005) Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci USA 102(4):1181-1186 · doi:10.1073/pnas.0408949102
[76] Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2):260-269 · Zbl 0148.40501 · doi:10.1109/TIT.1967.1054010
[77] Weinberger HS (2010) The speech accent archive. George Mason University http://accent.gmu.edu/index.php
[78] Whalen DH, Noiray A, Iskarous K, Bolanos L (2009) Relative contribution of jaw and tongue to the vowel height dimension in American English. J Acoust Soc Am 125(4):2698-2698
[79] Wilson M (1988) MRC psycholinguistic database: machine-usable dictionary, version 2.00. Behav Res Methods Instrum Comput 20:6-10 · doi:10.3758/BF03202594
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.