-
Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem
Authors:
Raphael Koster,
Miruna Pîslar,
Andrea Tacchetti,
Jan Balaguer,
Leqi Liu,
Romuald Elie,
Oliver P. Hauser,
Karl Tuyls,
Matt Botvinick,
Christopher Summerfield
Abstract:
A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism that endogenously promotes sustainable contributions from human participants to a common pool resource. We first trained neural networks to behave like human players, creating a simulated economy that allowed us to study how different mechanisms influenced the dynamics of receipt and reciprocation. We then used RL to train a social planner to maximise aggregate return to players. The social planner discovered a redistributive policy that led to a large surplus and an inclusive economy, in which players made roughly equal gains. The RL agent increased human surplus over baseline mechanisms based on unrestricted welfare or conditional cooperation, by conditioning its generosity on available resources and temporarily sanctioning defectors by allocating fewer resources to them. Examining the AI policy allowed us to develop an explainable mechanism that performed similarly and was more popular among players. Deep reinforcement learning can be used to discover mechanisms that promote sustainable human behaviour.
Submitted 23 April, 2024;
originally announced April 2024.
-
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Authors:
Viorica Pătrăucean,
Lucas Smaira,
Ankush Gupta,
Adrià Recasens Continente,
Larisa Markeeva,
Dylan Banarse,
Skanda Koppula,
Joseph Heyward,
Mateusz Malinowski,
Yi Yang,
Carl Doersch,
Tatiana Matejovicova,
Yury Sulsky,
Antoine Miech,
Alex Frechette,
Hanna Klimczak,
Raphael Koster,
Junlin Zhang,
Stephanie Winkler,
Yusuf Aytar,
Simon Osindero,
Dima Damen,
Andrew Zisserman,
João Carreira
Abstract:
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding.
Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test
Submitted 30 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Melting Pot 2.0
Authors:
John P. Agapiou,
Alexander Sasha Vezhnevets,
Edgar A. Duéñez-Guzmán,
Jayd Matyas,
Yiran Mao,
Peter Sunehag,
Raphael Köster,
Udari Madhushani,
Kavya Kopparapu,
Ramona Comanescu,
DJ Strouse,
Michael B. Johanson,
Sukhdeep Singh,
Julia Haas,
Igor Mordatch,
Dean Mobbs,
Joel Z. Leibo
Abstract:
Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly-studied extreme cases of perfectly-competitive (zero-sum) motivations and perfectly-cooperative (shared-reward) motivations, but does not stop with them. As in real-life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
Submitted 30 October, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
-
The Good Shepherd: An Oracle Agent for Mechanism Design
Authors:
Jan Balaguer,
Raphael Koster,
Christopher Summerfield,
Andrea Tacchetti
Abstract:
From social networks to traffic routing, artificial learning agents are playing a central role in modern institutions. We must therefore understand how to leverage these systems to foster outcomes and behaviors that align with our own values and aspirations. While multiagent learning has received considerable attention in recent years, artificial agents have been primarily evaluated when interacting with fixed, non-learning co-players. While this evaluation scheme has merit, it fails to capture the dynamics faced by institutions that must deal with adaptive and continually learning constituents. Here we address this limitation, and construct agents ("mechanisms") that perform well when evaluated over the learning trajectory of their adaptive co-players ("participants"). The algorithm we propose consists of two nested learning loops: an inner loop where participants learn to best respond to fixed mechanisms; and an outer loop where the mechanism agent updates its policy based on experience. We report the performance of our mechanism agents when paired with both artificial learning agents and humans as co-players. Our results show that our mechanisms are able to shepherd the participants' strategies towards favorable outcomes, indicating a path for modern institutions to effectively and automatically influence the strategies and behaviors of their constituents.
Submitted 21 February, 2022;
originally announced February 2022.
-
HCMD-zero: Learning Value Aligned Mechanisms from Data
Authors:
Jan Balaguer,
Raphael Koster,
Ari Weinstein,
Lucy Campbell-Gillingham,
Christopher Summerfield,
Matthew Botvinick,
Andrea Tacchetti
Abstract:
Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions on how participants behave (e.g. rationality), on the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms), or on what the goal of the mechanism should be (e.g. total welfare). Here we introduce HCMD-zero, a general purpose method to construct mechanisms making none of these three assumptions. HCMD-zero learns to mediate interactions among participants and adjusts the mechanism parameters to make itself more likely to be preferred by participants. It does so by remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. We test our method on a stylized resource allocation game that highlights the tension between productivity, equality and the temptation to free ride. HCMD-zero produces a mechanism that is preferred by human participants over a strong baseline, it does so automatically, without requiring prior knowledge, and using human behavioral trajectories sparingly and effectively. Our analysis shows HCMD-zero consistently makes the mechanism policy more and more likely to be preferred by human participants over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
Submitted 20 May, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Human-centered mechanism design with Democratic AI
Authors:
Raphael Koster,
Jan Balaguer,
Andrea Tacchetti,
Ari Weinstein,
Tina Zhu,
Oliver Hauser,
Duncan Williams,
Lucy Campbell-Gillingham,
Phoebe Thacker,
Matthew Botvinick,
Christopher Summerfield
Abstract:
Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here, we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders, and successfully won the majority vote. By optimizing for human preferences, Democratic AI may be a promising method for value-aligned policy innovation.
Submitted 27 January, 2022;
originally announced January 2022.
-
Role of Human-AI Interaction in Selective Prediction
Authors:
Elizabeth Bondi,
Raphael Koster,
Hannah Sheahan,
Martin Chadwick,
Yoram Bachrach,
Taylan Cemgil,
Ulrich Paquet,
Krishnamurthy Dvijotham
Abstract:
Recent work has shown the potential benefit of selective prediction systems that can learn to defer to a human when the predictions of the AI are unreliable, particularly to improve the reliability of AI systems in high-stakes applications like healthcare or conservation. However, most prior work assumes that human behavior remains unchanged when they solve a prediction task as part of a human-AI team as opposed to by themselves. We show that this is not the case by performing experiments to quantify human-AI interaction in the context of selective prediction. In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. Our results study two components of the messaging strategy: 1) Whether humans are informed about the prediction of the AI system and 2) Whether they are informed about the decision of the selective prediction system to defer. By manipulating these messaging components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework.
Submitted 16 May, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot
Authors:
Joel Z. Leibo,
Edgar Duéñez-Guzmán,
Alexander Sasha Vezhnevets,
John P. Agapiou,
Peter Sunehag,
Raphael Koster,
Jayd Matyas,
Charles Beattie,
Igor Mordatch,
Thore Graepel
Abstract:
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
Submitted 14 July, 2021;
originally announced July 2021.
-
A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings
Authors:
Eugene Vinitsky,
Raphael Köster,
John P. Agapiou,
Edgar Duéñez-Guzmán,
Alexander Sasha Vezhnevets,
Joel Z. Leibo
Abstract:
Society is characterized by the presence of a variety of social norms: collective patterns of sanctioning that can prevent miscoordination and free-riding. Inspired by this, we aim to construct learning dynamics where potentially beneficial social norms can emerge. Since social norms are underpinned by sanctioning, we introduce a training regime where agents can access all sanctioning events but learning is otherwise decentralized. This setting is technologically interesting because sanctioning events may be the only available public signal in decentralized multi-agent systems where reward or policy-sharing is infeasible or undesirable. To achieve collective action in this setting we construct an agent architecture containing a classifier module that categorizes observed behaviors as approved or disapproved, and a motivation to punish in accord with the group. We show that social norms emerge in multi-agent systems containing this agent and investigate the conditions under which this helps them achieve socially beneficial outcomes.
Submitted 27 September, 2022; v1 submitted 16 June, 2021;
originally announced June 2021.
-
A multi-agent reinforcement learning model of reputation and cooperation in human groups
Authors:
Kevin R. McKee,
Edward Hughes,
Tina O. Zhu,
Martin J. Chadwick,
Raphael Koster,
Antonio Garcia Castaneda,
Charlie Beattie,
Thore Graepel,
Matt Botvinick,
Joel Z. Leibo
Abstract:
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive mechanisms contribute to the where and when of collective action. We build and test a computational model of human behavior in Clean Up, a social dilemma task popular in multi-agent reinforcement learning research. We show that human groups effectively cooperate in Clean Up when they can identify group members and track reputations over time, but fail to organize under conditions of anonymity. A multi-agent reinforcement learning model of reputation demonstrates the same difference in cooperation under conditions of identifiability and anonymity. In addition, the model accurately predicts spatial and temporal patterns of group behavior: in this public goods dilemma, the intrinsic motivation for reputation catalyzes the development of a non-territorial, turn-taking strategy to coordinate collective action.
Submitted 22 February, 2023; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences
Authors:
Raphael Köster,
Kevin R. McKee,
Richard Everett,
Laura Weidinger,
William S. Isaac,
Edward Hughes,
Edgar A. Duéñez-Guzmán,
Thore Graepel,
Matthew Botvinick,
Joel Z. Leibo
Abstract:
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with model-based (MB) reinforcement learning. The former captures habitual and procedural learning while the latter captures choices taken via explicit planning and deduction. Some conventions (e.g. international treaties) are likely supported by cognition that resonates with the game theoretic and MB accounts. However, convention formation may also occur via MF mechanisms like habit learning; though this possibility has been understudied. Here, we demonstrate that complex, large-scale conventions can emerge from MF learning mechanisms. This suggests that some conventions may be supported by habit-like cognition rather than explicit reasoning. We apply MF multi-agent reinforcement learning to a temporo-spatially extended game with incomplete information. In this game, large parts of the state space are reachable only by collective action. However, heterogeneity of tastes makes such coordinated action difficult: multiple equilibria are desirable for all players, but subgroups prefer a particular equilibrium over all others. This creates a coordination problem that can be solved by establishing a convention. We investigate start-up and free rider subproblems as well as the effects of group size, intensity of intrinsic preference, and salience on the emergence dynamics of coordination conventions. Results of our simulations show agents establish and switch between conventions, even working against their own preferred outcome when doing so is necessary for effective coordination.
Submitted 14 December, 2020; v1 submitted 18 October, 2020;
originally announced October 2020.
-
MEMO: A Deep Network for Flexible Combination of Episodic Memories
Authors:
Andrea Banino,
Adrià Puigdomènech Badia,
Raphael Köster,
Martin J. Chadwick,
Vinicius Zambaldi,
Demis Hassabis,
Caswell Barry,
Matthew Botvinick,
Dharshan Kumaran,
Charles Blundell
Abstract:
Recent research developing neural network architectures with external memory has often used the benchmark bAbI question and answering dataset, which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning -- the appreciation of distant relationships among elements distributed across multiple facts or memories. Surprisingly, we found that current architectures struggle to reason over long distance associations. Similar results were obtained on a more complex task involving finding the shortest path between nodes in a path. We therefore developed MEMO, an architecture endowed with the capacity to reason over longer distances. This was accomplished with the addition of two novel components. First, it introduces a separation between memories (facts) stored in external memory and the items that comprise these facts in external memory. Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of "memory hops" before the answer is produced. MEMO is capable of solving our novel reasoning tasks, as well as matching state-of-the-art results in bAbI.
Submitted 29 January, 2020;
originally announced January 2020.
-
Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors
Authors:
Raphael Köster,
Dylan Hadfield-Menell,
Gillian K. Hadfield,
Joel Z. Leibo
Abstract:
How can societies learn to enforce and comply with social norms? Here we investigate the learning dynamics and emergence of compliance and enforcement of social norms in a foraging game, implemented in a multi-agent reinforcement learning setting. In this spatiotemporally extended game, individuals are incentivized to implement complex berry-foraging policies and punish transgressions against social taboos covering specific berry types. We show that agents benefit when eating poisonous berries is taboo, meaning the behavior is punished by other agents, as this helps overcome a credit-assignment problem in discovering delayed health effects. Critically, however, we also show that introducing an additional taboo, which results in punishment for eating a harmless berry, improves the rate and stability with which agents learn to punish taboo violations and comply with taboos. Counterintuitively, our results show that an arbitrary taboo (a "silly rule") can enhance social learning dynamics and achieve better outcomes in the middle stages of learning. We discuss the results in the context of studying normativity as a group-level emergent phenomenon.
Submitted 25 January, 2020;
originally announced January 2020.
-
Correlated Insulating and Superconducting States in Twisted Bilayer Graphene Below the Magic Angle
Authors:
Emilio Codecido,
Qiyue Wang,
Ryan Koester,
Shi Che,
Haidong Tian,
Rui Lv,
Son Tran,
Kenji Watanabe,
Takashi Taniguchi,
Fan Zhang,
Marc Bockrath,
Chun Ning Lau
Abstract:
The emergence of flat bands and correlated behaviors in 'magic angle' twisted bilayer graphene (tBLG) has sparked tremendous interest, though many aspects of the system are under intense debate. Here we report observation of both superconductivity and the Mott-like insulating state in a tBLG device with a twist angle of approximately 0.93°, which is smaller than the magic angle by 15%. At an electron concentration of +/-5 electrons per moire unit cell, we observe a narrow resistance peak with an activation energy gap of approximately 0.1 meV, indicating the existence of an additional correlated insulating state. This is consistent with theory predicting the presence of a high-energy band with an energetically flat dispersion. At a doping of +/-12 electrons per moire unit cell we observe a resistance peak due to the presence of Dirac points in the spectrum. Our results reveal that the magic range of tBLG is in fact larger than previously expected, and provide a wealth of new information to help decipher the strongly correlated phenomena observed in tBLG.
Submitted 13 February, 2019;
originally announced February 2019.
-
Inequity aversion improves cooperation in intertemporal social dilemmas
Authors:
Edward Hughes,
Joel Z. Leibo,
Matthew G. Phillips,
Karl Tuyls,
Edgar A. Duéñez-Guzmán,
Antonio García Castañeda,
Iain Dunning,
Tina Zhu,
Kevin R. McKee,
Raphael Koster,
Heather Roff,
Thore Graepel
Abstract:
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
Submitted 27 September, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Decision Rules for Robotic Mobile Fulfillment Systems
Authors:
Marius Merschformann,
Tim Lamballais,
René de Koster,
Leena Suhl
Abstract:
The Robotic Mobile Fulfillment System (RMFS) is a new type of robotized, parts-to-picker material handling system, designed especially for e-commerce warehouses. Robots bring movable shelves, called pods, to workstations where inventory is put on or removed from the pods. This paper simulates both the pick and replenishment process and studies the order assignment, pod selection and pod storage assignment problems by evaluating multiple decision rules per problem. The discrete event simulation uses realistic robot movements and keeps track of every unit of inventory on every pod. We analyze seven performance measures, e.g. throughput capacity and order due time, and find that the unit throughput is strongly correlated with the other performance measures. We vary the number of robots, the number of pick stations, the number of SKUs (stock keeping units), the order size and whether returns need processing or not. The decision rules for pick order assignment have a strong impact on the unit throughput rate. This is not the case for replenishment order assignment, pod selection and pod storage. Furthermore, for warehouses with a large number of SKUs, more robots are needed for a high unit throughput rate, even if the number of pods and the dimensions of the storage area remain the same. Lastly, processing return orders only affects the unit throughput rate for warehouses with a large number of SKUs and large pick orders.
Submitted 20 January, 2018;
originally announced January 2018.
-
Observing the carbon-climate system
Authors:
David Schimel,
Piers Sellers,
Berrien Moore III,
Abhishek Chatterjee,
David Baker,
Joe Berry,
Kevin Bowman,
Phillipe Ciais,
David Crisp,
Sean Crowell,
Scott Denning,
Riley Duren,
Pierre Friedlingstein,
Michelle Gierach,
Kevin Gurney,
Kathy Hibbard,
Richard A Houghton,
Deborah Huntzinger,
George Hurtt,
Ken Jucks,
Randy Kawa,
Randy Koster,
Charles Koven,
Yiqi Luo,
Jeff Masek,
Galen McKinley, et al. (19 additional authors not shown)
Abstract:
Increases in atmospheric CO2 and CH4 result from a combination of forcing from anthropogenic emissions and Earth System feedbacks that reduce or amplify the effects of those emissions on atmospheric concentrations. Despite decades of research, carbon-climate feedbacks remain poorly quantified. The impact of these uncertainties on future climate is of increasing concern, especially in the wake of recent climate negotiations. Emissions, long concentrated in the developed world, are now shifting to developing countries, where the emissions inventories have larger uncertainties. The fraction of anthropogenic CO2 remaining in the atmosphere has remained remarkably constant over the last 50 years. Will this change in the future as the climate evolves? Concentrations of CH4, the second most important greenhouse gas, which had apparently stabilized, have recently resumed their increase, but the exact cause of this is unknown. While greenhouse gases affect the global atmosphere, their sources and sinks are remarkably heterogeneous in time and space, and traditional in situ observing systems do not provide the coverage and resolution needed to attribute changes in these greenhouse gases to specific sources or sinks. In the past few years, space-based technologies have shown promise for monitoring carbon stocks and fluxes. Advanced versions of these capabilities could transform our understanding and provide the data needed to quantify carbon-climate feedbacks. A new observing system that resolves global fluxes at high resolution will capture variations on the time and space scales needed to attribute these fluxes to underlying mechanisms.
Submitted 7 April, 2016;
originally announced April 2016.
-
A no-go for no-go theorems prohibiting cosmic acceleration in extra dimensional models
Authors:
Rik Koster,
Marieke Postma
Abstract:
A four-dimensional effective theory that arises as the low-energy limit of some extra-dimensional model is constrained by the higher dimensional Einstein equations. Steinhardt & Wesley use this to show that accelerated expansion in our four large dimensions can only be transient in a large class of Kaluza-Klein models that satisfy the (higher dimensional) null energy condition [1]. We point out that these no-go theorems are based on a rather ad-hoc assumption on the metric, without which no strong statements can be made.
Submitted 28 November, 2011; v1 submitted 7 October, 2011;
originally announced October 2011.
-
Coherent Diffraction Imaging of Single 95nm Nanowires
Authors:
V. Favre-Nicolin,
J. Eymery,
R. K. Koster,
P. Gentile
Abstract:
Photonic or electronic confinement effects in nanostructures become significant when one of their dimensions is in the 5-300 nm range. Improving their development requires the ability to study their structure (shape, strain field, interdiffusion maps) using novel techniques. We have used coherent diffraction imaging to record the 3-dimensional scattered intensity of single silicon nanowires with a lateral size smaller than 100 nm. We show that this intensity can be used to recover the hexagonal shape of the nanowire with a 28 nm resolution. The article also discusses the limits of the method in terms of radiation damage.
Submitted 18 December, 2008; v1 submitted 12 September, 2008;
originally announced September 2008.