Showing 1–6 of 6 results for author: Fickinger, A

  1. arXiv:2110.03684  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Cross-Domain Imitation Learning via Optimal Transport

    Authors: Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos

    Abstract: Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Lear…

    Submitted 25 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: ICLR 2022

  2. arXiv:2109.15316  [pdf, other]

    cs.AI

    Scalable Online Planning via Reinforcement Learning Fine-Tuning

    Authors: Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown

    Abstract: Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work we replace tabular search wi…

    Submitted 30 September, 2021; originally announced September 2021.

  3. arXiv:2107.07394  [pdf, other]

    cs.LG cs.AI

    Explore and Control with Adversarial Surprise

    Authors: Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine

    Abstract: Unsupervised reinforcement learning (RL) studies how to leverage environment statistics to learn useful behaviors without the cost of reward engineering. However, a central challenge in unsupervised RL is to extract behaviors that meaningfully affect the world and cover the range of possible outcomes, without getting distracted by inherently unpredictable, uncontrollable, and stochastic elements i…

    Submitted 28 December, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  4. arXiv:2012.14536  [pdf, other]

    cs.GT cs.AI

    Multi-Principal Assistance Games: Definition and Collegial Mechanisms

    Authors: Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell

    Abstract: We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism. In an MPAG, a single agent assists N human principals who may have widely different preferences. MPAGs generalize assistance games, also known as cooperative inverse reinforcement learning game…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: text overlap with arXiv:2007.09540

  5. arXiv:2007.09540  [pdf, other]

    cs.AI

    Multi-Principal Assistance Games

    Authors: Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell

    Abstract: Assistance games (also known as cooperative inverse reinforcement learning games) have been proposed as a model for beneficial AI, wherein a robotic agent must act on behalf of a human principal but is initially uncertain about the human's payoff function. This paper studies multi-principal assistance games, which cover the more general case in which the robot acts on behalf of N humans who may hav…

    Submitted 18 July, 2020; originally announced July 2020.

  6. arXiv:1902.03517  [pdf, ps, other]

    cs.LG stat.ML

    Biadversarial Variational Autoencoder

    Authors: Arnaud Fickinger

    Abstract: In the original version of the Variational Autoencoder, Kingma et al. assume Gaussian distributions for the approximate posterior during inference and for the output during the generative process. These assumptions are convenient for computational reasons, e.g. we can easily optimize the parameters of a neural network using the reparametrization trick and the KL divergence between two Gaussians can b…

    Submitted 12 February, 2019; v1 submitted 9 February, 2019; originally announced February 2019.