Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. (English) Zbl 1100.92013

Summary: The brain’s most difficult computation in decision-making learning is searching for essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing internal representations such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, how the information in and around these multiple closed loops is shared and transferred has yet to be explored computationally.
We propose a “heterarchical reinforcement learning” model, where reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning. The caudate nucleus and the cognitive cortical areas are correlated with reward prediction error, while the putamen and motor-related areas are correlated with stimulus-action-dependent reward prediction. Furthermore, a heterogeneous activity pattern within the striatum is predicted depending on learning difficulty, i.e., the anterior medial caudate nucleus will be correlated more with reward prediction error when learning becomes difficult, while the posterior putamen will be correlated more with stimulus-action-dependent reward prediction in easy learning. Our fMRI results revealed that different cortico-striatal loops are operating as suggested by the proposed model.
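The two quantities named in the summary, reward prediction error and stimulus-action-dependent reward prediction, can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' model equations: it assumes a coarse "cognitive" learner that predicts reward from the stimulus alone, and a "motor" learner that receives that coarse prediction and adds an action-specific correction. The task layout (`STIMULI`, `ACTIONS`, `BEST_ACTION`), the learning rate, the epsilon-greedy choice rule, and all function names are hypothetical choices made only for illustration.

```python
import random

# Hypothetical task: three stimuli, two actions, one rewarded action per stimulus.
STIMULI = [0, 1, 2]
ACTIONS = [0, 1]
BEST_ACTION = {0: 0, 1: 1, 2: 0}


def greedy_action(V, Q, s):
    """Action with the highest combined prediction V[s] + Q[(s, a)]."""
    return max(ACTIONS, key=lambda a: V[s] + Q[(s, a)])


def simulate(n_trials=2000, p_correct=0.9, alpha=0.1, epsilon=0.1, seed=0):
    """Run a toy stimulus-action-reward learning task.

    V[s]      : coarse, stimulus-only reward prediction ("cognitive" learner).
    Q[(s, a)] : action-specific correction added on top of V[s] ("motor" learner).
    Returns V, Q and a per-trial log of (reward prediction error,
    stimulus-action-dependent reward prediction).
    """
    rng = random.Random(seed)
    V = {s: 0.0 for s in STIMULI}
    Q = {(s, a): 0.0 for s in STIMULI for a in ACTIONS}
    log = []
    for _ in range(n_trials):
        s = rng.choice(STIMULI)
        # Epsilon-greedy choice over the combined prediction.
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = greedy_action(V, Q, s)
        # Probabilistic reward: p_correct for the best action, 1 - p_correct otherwise.
        p_reward = p_correct if a == BEST_ACTION[s] else 1.0 - p_correct
        r = 1.0 if rng.random() < p_reward else 0.0
        # Coarse prediction error from the stimulus-only learner.
        rpe = r - V[s]
        # The coarse prediction is passed on; the motor learner predicts V[s] + Q[(s, a)].
        sadrp = V[s] + Q[(s, a)]
        motor_error = r - sadrp
        log.append((rpe, sadrp))
        V[s] += alpha * rpe
        Q[(s, a)] += alpha * motor_error
    return V, Q, log


if __name__ == "__main__":
    V, Q, log = simulate()
    for s in STIMULI:
        print(s, greedy_action(V, Q, s), round(V[s], 3))
```

In this sketch the logged error signal and the logged stimulus-action prediction play the role of the two trial-by-trial regressors the summary describes; how such regressors map onto the caudate nucleus and putamen is a claim of the paper, not something the code demonstrates.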

MSC:

92C20 Neural biology
91E10 Cognitive psychology
91E40 Memory and learning in psychology
Full Text: DOI
