
Showing 1–12 of 12 results for author: D'Oro, P

  1. arXiv:2406.00244

    cs.CL

    Controlling Large Language Model Agents with Entropic Activation Steering

    Authors: Nate Rahn, Pierluca D'Oro, Marc G. Bellemare

    Abstract: The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment. But how do LLM agents explore, and how can we control their exploratory behaviors? To answer these questions, we take a representation-level perspec…

    Submitted 10 October, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  2. arXiv:2405.04342

    cs.LG

    The Curse of Diversity in Ensemble-Based Exploration

    Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

    Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  3. arXiv:2403.07688

    cs.LG cs.AI

    Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

    Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

    Abstract: When training deep neural networks, the phenomenon of dying neurons (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon, focusing on sparsity a…

    Submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2402.05290

    cs.LG cs.AI

    Do Transformer World Models Give Better Policy Gradients?

    Authors: Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

    Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long…

    Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Michel Ma and Pierluca D'Oro contributed equally

  5. arXiv:2310.00166

    cs.AI cs.LG

    Motif: Intrinsic Motivation from Artificial Intelligence Feedback

    Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

    Abstract: Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM ove…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: The first two authors contributed equally; order decided by coin flip

  6. arXiv:2309.14597

    cs.LG

    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

    Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

    Abstract: Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy param…

    Submitted 10 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 Accepted Paper. The first two authors contributed equally

  7. arXiv:2205.07802

    cs.LG cs.AI stat.ML

    The Primacy Bias in Deep Reinforcement Learning

    Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

    Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effec…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets

  8. arXiv:2012.08225

    cs.LG cs.AI stat.ML

    Policy Optimization as Online Learning with Mediator Feedback

    Authors: Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

    Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available information, compared to the standard bandit feedback, allows reusing samples generated by one policy to estimate the performance of other policies. Based on t…

    Submitted 15 December, 2020; originally announced December 2020.

  9. arXiv:2004.14309

    cs.AI cs.LG stat.ML

    How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

    Authors: Pierluca D'Oro, Wojciech Jaśkowski

    Abstract: Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the gradient of the critic with respect to input actions. However, instead of gradients, the critic is typically trained only to accurately predict expected returns, wh…

    Submitted 22 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  10. arXiv:2004.03156

    cs.CV cs.NE

    Real-time Classification from Short Event-Camera Streams using Input-filtering Neural ODEs

    Authors: Giorgio Giannone, Asha Anoosheh, Alessio Quaglino, Pierluca D'Oro, Marco Gallieri, Jonathan Masci

    Abstract: Event-based cameras are novel, efficient sensors inspired by the human vision system, generating an asynchronous, pixel-wise stream of data. Learning from such data is generally performed through heavy preprocessing and event integration into images. This requires buffering of possibly long sequences and can limit the response time of the inference system. In this work, we instead propose to direc…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: Submitted to ICML 2020

  11. arXiv:1909.04115

    cs.LG cs.AI stat.ML

    Gradient-Aware Model-based Policy Search

    Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

    Abstract: Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of th…

    Submitted 20 November, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

  12. arXiv:1803.09092

    cs.CV

    Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos

    Authors: C. Spampinato, S. Palazzo, P. D'Oro, D. Giordano, M. Shah

    Abstract: Human behavior understanding in videos is a complex, still unsolved problem and requires accurately modeling motion at both the local (pixel-wise dense prediction) and global (aggregation of motion cues) levels. Current approaches based on supervised learning require large amounts of annotated data, whose scarce availability is one of the main limiting factors to the development of general solutio…

    Submitted 17 September, 2019; v1 submitted 24 March, 2018; originally announced March 2018.
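The abstract of item 9 describes how deterministic-policy actor-critic methods improve the actor: the action-value gradient with respect to the policy parameters is obtained by chaining the actor's Jacobian with the critic's gradient with respect to the action. Below is a minimal toy sketch of that chain rule only, not of the method the paper proposes. The linear actor, the hand-written quadratic critic, and every function name are hypothetical choices made for illustration. The sketch checks the chained gradient against finite differences and then runs plain gradient ascent on the actor.

```python
# Hypothetical toy setup (not from the paper): scalar state s, scalar action a.
# Actor:  a = mu_theta(s) = theta[0] * s + theta[1]   (linear deterministic policy)
# Critic: Q(s, a) = -(a - 2*s)**2                      (hand-written; best action is a = 2s)

def actor(theta, s):
    return theta[0] * s + theta[1]

def actor_jacobian(theta, s):
    # d mu_theta(s) / d theta = [s, 1]
    return [s, 1.0]

def critic(s, a):
    return -(a - 2.0 * s) ** 2

def critic_action_grad(s, a):
    # dQ(s, a) / da
    return -2.0 * (a - 2.0 * s)

def chained_gradient(theta, states):
    """Mean over states of (d mu / d theta) * dQ/da, evaluated at a = mu_theta(s)."""
    grad = [0.0] * len(theta)
    for s in states:
        dq_da = critic_action_grad(s, actor(theta, s))
        for i, j in enumerate(actor_jacobian(theta, s)):
            grad[i] += j * dq_da / len(states)
    return grad

def objective(theta, states):
    # Mean critic value of the actor's actions: the quantity the actor ascends.
    return sum(critic(s, actor(theta, s)) for s in states) / len(states)

def finite_diff_gradient(theta, states, eps=1e-6):
    # Central differences, used only to check the chain-rule gradient.
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up, states) - objective(down, states)) / (2 * eps))
    return grad

states = [-1.0, 0.5, 2.0]
theta = [0.3, -0.1]
analytic = chained_gradient(theta, states)
numeric = finite_diff_gradient(theta, states)

# Gradient ascent on the actor using only the chained gradient.
for _ in range(200):
    g = chained_gradient(theta, states)
    theta = [t + 0.1 * gi for t, gi in zip(theta, g)]
```

Here the critic is exact, so ascent drives the actor toward a = 2s (theta close to [2, 0]). The abstract's concern is that a learned critic is usually trained to predict returns rather than gradients, so its action gradient need not be as reliable as in this toy case.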