Learning perceptually grounded word meanings from unaligned parallel data. (English) Zbl 1319.68196

Summary: In order for robots to effectively understand natural language commands, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition that jointly learns a policy for following natural language commands such as “Pick up the tire pallet,” and a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words “the tire pallet” and a specific object in the environment. Our approach assumes that the policy the robot uses to choose actions in response to a natural language command has a parametric form that factors according to the structure of the language. We use a gradient method to optimize the model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.
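
As a rough illustration of the gradient-based policy learning mentioned in the summary, the sketch below fits a log-linear policy by gradient ascent on the log-likelihood of demonstrated actions. The log-linear form, the indicator features, the toy data, and all names (`softmax`, `action_probs`, `fit_policy`) are assumptions made for illustration only; the summary does not specify the paper's actual model, features, or objective.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def action_probs(theta, features):
    """Assumed log-linear policy: features[a] is the feature vector of action a."""
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in features]
    return softmax(scores)

def fit_policy(examples, dim, lr=0.5, epochs=200):
    """Gradient ascent on the summed log-probability of the demonstrated actions.

    examples: list of (features, chosen_index) pairs, one per command.
    """
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for features, chosen in examples:
            probs = action_probs(theta, features)
            for k in range(dim):
                # Gradient of log p(chosen): observed feature minus expected feature.
                expected = sum(p * feats[k] for p, feats in zip(probs, features))
                grad[k] += features[chosen][k] - expected
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: two commands, each with two candidate actions.
# Feature 0 fires when the object named in the command matches the object acted on;
# feature 1 fires when it does not.
examples = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),  # demonstrated action is candidate 0
    ([[0.0, 1.0], [1.0, 0.0]], 1),  # demonstrated action is candidate 1
]
theta = fit_policy(examples, dim=2)
probs = action_probs(theta, examples[0][0])
```

Here `probs[0]` is the learned probability of the demonstrated action for the first toy command. A real system would replace the hand-set indicator features with features derived from the parse of the command and from perceptual data.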

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T40 Artificial intelligence for robotics
68T50 Natural language processing

Software:

MALLET
