-
Implementing a Nordic-Baltic Federated Health Data Network: a case report
Authors:
Taridzo Chomutare,
Aleksandar Babic,
Laura-Maria Peltonen,
Silja Elunurm,
Peter Lundberg,
Arne Jönsson,
Emma Eneling,
Ciprian-Virgil Gerstenberger,
Troels Siggaard,
Raivo Kolde,
Oskar Jerdhaf,
Martin Hansson,
Alexandra Makhlysheva,
Miroslav Muzny,
Erik Ylipää,
Søren Brunak,
Hercules Dalianis
Abstract:
Background: Centralized collection and processing of healthcare data across national borders pose significant challenges, including privacy concerns, data heterogeneity and legal barriers. To address some of these challenges, we formed an interdisciplinary consortium to develop a feder-ated health data network, comprised of six institutions across five countries, to facilitate Nordic-Baltic cooper…
▽ More
Background: Centralized collection and processing of healthcare data across national borders pose significant challenges, including privacy concerns, data heterogeneity and legal barriers. To address some of these challenges, we formed an interdisciplinary consortium to develop a feder-ated health data network, comprised of six institutions across five countries, to facilitate Nordic-Baltic cooperation on secondary use of health data. The objective of this report is to offer early insights into our experiences developing this network. Methods: We used a mixed-method ap-proach, combining both experimental design and implementation science to evaluate the factors affecting the implementation of our network. Results: Technically, our experiments indicate that the network functions without significant performance degradation compared to centralized simu-lation. Conclusion: While use of interdisciplinary approaches holds a potential to solve challeng-es associated with establishing such collaborative networks, our findings turn the spotlight on the uncertain regulatory landscape playing catch up and the significant operational costs.
△ Less
Submitted 26 September, 2024;
originally announced September 2024.
-
Tractable Offline Learning of Regular Decision Processes
Authors:
Ahana Deb,
Roberto Cipollone,
Anders Jonsson,
Alessandro Ronca,
Mohammad Sadegh Talebi
Abstract:
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards from the past interactions can be captured by some hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques.…
▽ More
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards from the past interactions can be captured by some hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques. In this paper, we show that it is possible to overcome two strong limitations of previous offline RL algorithms for RDPs, notably RegORL. This can be accomplished via the introduction of two original techniques: the development of a new pseudometric based on formal languages, which removes a problematic dependency on $L_\infty^\mathsf{p}$-distinguishability parameters, and the adoption of Count-Min-Sketch (CMS), instead of naive counting. The former reduces the number of samples required in environments that are characterized by a low complexity in language-theoretic terms. The latter alleviates the memory requirements for long planning horizons. We derive the PAC sample complexity bounds associated to each of these techniques, and we validate the approach experimentally.
△ Less
Submitted 4 September, 2024;
originally announced September 2024.
-
Hierarchical Average-Reward Linearly-solvable Markov Decision Processes
Authors:
Guillermo Infante,
Anders Jonsson,
Vicenç Gómez
Abstract:
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create sma…
▽ More
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller subtasks that are easier to solve, and the equivalence between such partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Performance Evaluation of MLO for XR Streaming: Can Wi-Fi 7 Meet the Expectations?
Authors:
Marc Carrascosa-Zamacois,
Lorenzo Galati-Giordano,
Francesc Wilhelmi,
Gianluca Fontanesi,
Anders Jonsson,
Giovanni Geraci,
Boris Bellalta
Abstract:
Extended Reality (XR) has stringent throughput and delay requirements that are hard to meet with current wireless technologies. Missing these requirements can lead to worsened picture quality, perceived lag between user input and corresponding output, and even dizziness for the end user. In this paper, we study the capability of upcoming Wi-Fi 7, and its novel support for Multi-Link Operation (MLO…
▽ More
Extended Reality (XR) has stringent throughput and delay requirements that are hard to meet with current wireless technologies. Missing these requirements can lead to worsened picture quality, perceived lag between user input and corresponding output, and even dizziness for the end user. In this paper, we study the capability of upcoming Wi-Fi 7, and its novel support for Multi-Link Operation (MLO), to cope with these tight requirements. Our study is based on simulation results extracted from an MLO-compliant simulator that realistically reproduces VR traffic. Results show that MLO can sustain VR applications. By jointly using multiple links with independent channel access procedures, MLO can reduce the overall delay, which is especially useful in the uplink, as it has more stringent requirements than the downlink, and is instrumental in delivering the expected performance. We show that using MLO can allow more users per network than an equivalent number of links using SLO. We also show that while maintaining the same overall bandwidth, a higher number of MLO links with narrow channels leads to lower delays than a lower number of links with wider channels.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
Authors:
Sergio Calo,
Anders Jonsson,
Gergely Neu,
Ludovic Schwartz,
Javier Segovia-Aguas
Abstract:
We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so…
▽ More
We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so far, since computing the associated DP operators requires fully solving a static optimal transport problem, and these operators need to be applied numerous times during the overall optimization process. In this work, we develop an alternative perspective by considering couplings between a flattened version of the joint distributions that we call discounted occupancy couplings, and show that calculating optimal transport distances in the full space of joint distributions can be equivalently formulated as solving a linear program (LP) in this reduced space. This LP formulation allows us to port several algorithmic ideas from other areas of optimal transport theory. In particular, our formulation makes it possible to introduce an appropriate notion of entropy regularization into the optimization problem, which in turn enables us to directly calculate optimal transport distances via a Sinkhorn-like method we call Sinkhorn Value Iteration (SVI). We show both theoretically and empirically that this method converges quickly to an optimal coupling, essentially at the same computational cost of running vanilla Sinkhorn in each pair of states. Along the way, we point out that our optimal transport distance exactly matches the common notion of bisimulation metrics between Markov chains, and thus our results also apply to computing such metrics, and in fact our algorithm turns out to be significantly more efficient than the best known methods developed so far for this purpose.
△ Less
Submitted 11 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Planning with a Learned Policy Basis to Optimally Solve Complex Tasks
Authors:
Guillermo Infante,
David Kuric,
Anders Jonsson,
Vicenç Gómez,
Herke van Hoof
Abstract:
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subprob…
▽ More
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
△ Less
Submitted 3 June, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Asymmetric Norms to Approximate the Minimum Action Distance
Authors:
Lorenzo Steccanella,
Anders Jonsson
Abstract:
This paper presents a state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Unlike previous methods, our approach incorporates an asymmetric norm parametrization, enabling accurate approximati…
▽ More
This paper presents a state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Unlike previous methods, our approach incorporates an asymmetric norm parametrization, enabling accurate approximations of minimum action distances in environments with inherent asymmetry. We show how this representation can be leveraged to learn goal-conditioned policies, providing a notion of similarity between states and goals and a useful heuristic distance to guide planning. To validate our approach, we conduct empirical experiments on both symmetric and asymmetric environments. Our results show that our asymmetric norm parametrization performs comparably to symmetric norms in symmetric environments and surpasses symmetric norms in asymmetric environments.
△ Less
Submitted 19 December, 2023; v1 submitted 15 December, 2023;
originally announced December 2023.
-
On optimal control of reflected diffusions
Authors:
Adam Jonsson
Abstract:
We study a simple singular control problem for a Brownian motion with constant drift and variance reflected at the origin. Exerting control pushes the process towards the origin and generates a concave increasing state-dependent yield which is discounted at a fixed rate. The most interesting feature of the problem is that its solution can be more complicated than anticipated. Indeed, for some para…
▽ More
We study a simple singular control problem for a Brownian motion with constant drift and variance reflected at the origin. Exerting control pushes the process towards the origin and generates a concave increasing state-dependent yield which is discounted at a fixed rate. The most interesting feature of the problem is that its solution can be more complicated than anticipated. Indeed, for some parameter values, the optimal policy involves two reflecting barriers and one repelling boundary, the action region being the union of two disjoint intervals. We also show that the apparent anomaly can be understood as involving a switch between two strategies with different risk profiles: The risk-neutral decision maker initially gambles on the more risky strategy, but lowers risk if this strategy underperforms.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
Generating Semantic Graph Corpora with Graph Expansion Grammar
Authors:
Eric Andersson,
Johanna Björklund,
Frank Drewes,
Anna Jonsson
Abstract:
We introduce Lovelace, a tool for creating corpora of semantic graphs. The system uses graph expansion grammar as a representational language, thus allowing users to craft a grammar that describes a corpus with desired properties. When given such grammar as input, the system generates a set of output graphs that are well-formed according to the grammar, i.e., a graph bank. The generation process…
▽ More
We introduce Lovelace, a tool for creating corpora of semantic graphs. The system uses graph expansion grammar as a representational language, thus allowing users to craft a grammar that describes a corpus with desired properties. When given such grammar as input, the system generates a set of output graphs that are well-formed according to the grammar, i.e., a graph bank. The generation process can be controlled via a number of configurable parameters that allow the user to, for example, specify a range of desired output graph sizes. Central use cases are the creation of synthetic data to augment existing corpora, and as a pedagogical tool for teaching formal language theory.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
ACROCPoLis: A Descriptive Framework for Making Sense of Fairness
Authors:
Andrea Aler Tubella,
Dimitri Coelho Mollo,
Adam Dahlgren Lindström,
Hannah Devinney,
Virginia Dignum,
Petter Ericson,
Anna Jonsson,
Timotheus Kampik,
Tom Lenaerts,
Julian Alfredo Mendez,
Juan Carlos Nieves
Abstract:
Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions…
▽ More
Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions and allow for the inclusion of societal and relational aspects to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by means of proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit, as well as their interrelationships. This enables us to compare analogous situations, to highlight the differences in dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Generalized Planning as Heuristic Search: A new planning search-space that leverages pointers over objects
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Planning as heuristic search is one of the most successful approaches to classical planning but unfortunately, it does not extend trivially to Generalized Planning (GP). GP aims to compute algorithmic solutions that are valid for a set of classical planning instances from a given domain, even if these instances differ in the number of objects, the number of state variables, their domain size, or t…
▽ More
Planning as heuristic search is one of the most successful approaches to classical planning but unfortunately, it does not extend trivially to Generalized Planning (GP). GP aims to compute algorithmic solutions that are valid for a set of classical planning instances from a given domain, even if these instances differ in the number of objects, the number of state variables, their domain size, or their initial and goal configuration. The generalization requirements of GP make it impractical to perform the state-space search that is usually implemented by heuristic planners. This paper adapts the planning as heuristic search paradigm to the generalization requirements of GP, and presents the first native heuristic search approach to GP. First, the paper introduces a new pointer-based solution space for GP that is independent of the number of classical planning instances in a GP problem and the size of those instances (i.e. the number of objects, state variables and their domain sizes). Second, the paper defines a set of evaluation and heuristic functions for guiding a combinatorial search in our new GP solution space. The computation of these evaluation and heuristic functions does not require grounding states or actions in advance. Therefore our GP as heuristic search approach can handle large sets of state variables with large numerical domains, e.g.~integers. Lastly, the paper defines an upgraded version of our novel algorithm for GP called Best-First Generalized Planning (BFGP), that implements a best-first search in our pointer-based solution space, and that is guided by our evaluation/heuristic functions for GP.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
Understanding Multi-link Operation in Wi-Fi 7: Performance, Anomalies, and Solutions
Authors:
Marc Carrascosa-Zamacois,
Giovanni Geraci,
Lorenzo Galati-Giordano,
Anders Jonsson,
Boris Bellalta
Abstract:
Will Wi-Fi 7, conceived to support extremely high throughput, also deliver consistently low delay? The best hope seems to lie in allowing next-generation devices to access multiple channels via multi-link operation (MLO). In this paper, we aim to advance the understanding of MLO, placing the spotlight on its packet delay performance. We show that MLO devices can take advantage of multiple contenti…
▽ More
Will Wi-Fi 7, conceived to support extremely high throughput, also deliver consistently low delay? The best hope seems to lie in allowing next-generation devices to access multiple channels via multi-link operation (MLO). In this paper, we aim to advance the understanding of MLO, placing the spotlight on its packet delay performance. We show that MLO devices can take advantage of multiple contention-free links to significantly reduce their transmission time, but also that they can occasionally starve one another and surprisingly incur a higher delay than that of a well planned legacy single link operation. We next examine and explain this anomaly, also putting forth practical workarounds to circumvent it. We conclude by pointing to other disruptive features that, if successfully paired with MLO, can usher in exciting and unprecedented opportunities for Wi-Fi 8.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
Hierarchies of Reward Machines
Authors:
Daniel Furelos-Blanco,
Mark Law,
Anders Jonsson,
Krysia Broda,
Alessandra Russo
Abstract:
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism fo…
▽ More
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.
△ Less
Submitted 4 June, 2023; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Performance and Coexistence Evaluation of IEEE 802.11be Multi-link Operation
Authors:
Marc Carrascosa-Zamacois,
Lorenzo Galati-Giordano,
Anders Jonsson,
Giovanni Geraci,
Boris Bellalta
Abstract:
Wi-Fi 7 is already in the making, and Multi-Link Operation (MLO) is one of the main features proposed in its correspondent IEEE 802.11be amendment. MLO will allow devices to coordinate multiple radio interfaces to access separate channels through a single association, aiming for improved throughput, network delay, and overall spectrum reuse efficiency. In this work, we study three reference scenar…
▽ More
Wi-Fi 7 is already in the making, and Multi-Link Operation (MLO) is one of the main features proposed in its correspondent IEEE 802.11be amendment. MLO will allow devices to coordinate multiple radio interfaces to access separate channels through a single association, aiming for improved throughput, network delay, and overall spectrum reuse efficiency. In this work, we study three reference scenarios to evaluate the performance of the two main MLO implementations -- Multi-Link Multi-Radio (MLMR) and Multi-Link Single-Radio (MLSR) -- , the interplay between multiple nodes employing them, and their coexistence with legacy Single-Link devices. Importantly, our results reveal that the potential of MLMR is mainly unleashed in isolated deployments or under unloaded network conditions. Instead, in medium- to high-load scenarios, MLSR may prove more effective in reducing the latency while guaranteeing fairness with contending Single-Link nodes.
△ Less
Submitted 5 August, 2022; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Computing Programs for Generalized Planning as Heuristic Search
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Although heuristic search is one of the most successful approaches to classical planning, this planning paradigm does not apply straightforwardly to Generalized Planning (GP). This paper adapts the planning as heuristic search paradigm to the particularities of GP, and presents the first native heuristic search approach to GP. First, the paper defines a program-based solution space for GP that is…
▽ More
Although heuristic search is one of the most successful approaches to classical planning, this planning paradigm does not apply straightforwardly to Generalized Planning (GP). This paper adapts the planning as heuristic search paradigm to the particularities of GP, and presents the first native heuristic search approach to GP. First, the paper defines a program-based solution space for GP that is independent of the number of planning instances in a GP problem, and the size of these instances. Second, the paper defines the BFGP algorithm for GP, that implements a best-first search in our program-based solution space, and that is guided by different evaluation and heuristic functions.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Scaling-up Generalized Planning as Heuristic Search with Landmarks
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson,
Laura Sebastiá
Abstract:
Landmarks are one of the most effective search heuristics for classical planning, but largely ignored in generalized planning. Generalized planning (GP) is usually addressed as a combinatorial search in a given space of algorithmic solutions, where candidate solutions are evaluated w.r.t.~the instances they solve. This type of solution evaluation ignores any sub-goal information that is not explic…
▽ More
Landmarks are one of the most effective search heuristics for classical planning, but largely ignored in generalized planning. Generalized planning (GP) is usually addressed as a combinatorial search in a given space of algorithmic solutions, where candidate solutions are evaluated w.r.t.~the instances they solve. This type of solution evaluation ignores any sub-goal information that is not explicit in the representation of the planning instances, causing plateaus in the space of candidate generalized plans. Furthermore, node expansion in GP is a run-time bottleneck since it requires evaluating every child node over the entire batch of classical planning instances in a GP problem. In this paper we define a landmark counting heuristic for GP (that considers sub-goal information that is not explicitly represented in the planning instances), and a novel heuristic search algorithm for GP (that we call PGP) and that progressively processes subsets of the planning instances of a GP problem. Our two orthogonal contributions are analyzed in an ablation study, showing that both improve the state-of-the-art in GP as heuristic search, and that both benefit from each other when used in combination.
△ Less
Submitted 10 May, 2022;
originally announced May 2022.
-
State Representation Learning for Goal-Conditioned Reinforcement Learning
Authors:
Lorenzo Steccanella,
Anders Jonsson
Abstract:
This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Compared to previous methods, our approach does not require any domain knowledge, learning from offline and un…
▽ More
This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Compared to previous methods, our approach does not require any domain knowledge, learning from offline and unlabeled data. We show how this representation can be leveraged to learn goal-conditioned policies, providing a notion of similarity between states and goals and a useful heuristic distance to guide planning and reinforcement learning algorithms. Finally, we empirically validate our method in classic control domains and multi-goal environments, demonstrating that our method can successfully learn representations in large and/or continuous domains.
△ Less
Submitted 4 May, 2022;
originally announced May 2022.
-
Infinite utility: counterparts and ultimate locations
Authors:
Adam Jonsson
Abstract:
The locations problem in infinite ethics concerns the relative moral status of different categories of potential bearers of value, the primary examples of which are people and points in time. The challenge is to determine which category of value bearers are of ultimate moral significance: the ultimate locations, for short. This paper defends the view that the ultimate locations are 'people at time…
▽ More
The locations problem in infinite ethics concerns the relative moral status of different categories of potential bearers of value, the primary examples of which are people and points in time. The challenge is to determine which category of value bearers are of ultimate moral significance: the ultimate locations, for short. This paper defends the view that the ultimate locations are 'people at times'. A person at a time is not a specific person, but the person born at a specific point in time (de dicto). The main conclusion of the paper is that the unsettling implications of the time- and person-centered approaches to infinite ethics can be avoided. Most notably, a broad class of worlds that person-centered views deem incomparable can be strictly ranked.
△ Less
Submitted 10 April, 2023; v1 submitted 4 September, 2021;
originally announced September 2021.
-
Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes
Authors:
Guillermo Infante,
Anders Jonsson,
Vicenç Gómez
Abstract:
In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in ea…
▽ More
In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
△ Less
Submitted 28 April, 2022; v1 submitted 29 June, 2021;
originally announced June 2021.
-
Hierarchical Representation Learning for Markov Decision Processes
Authors:
Lorenzo Steccanella,
Simone Totaro,
Anders Jonsson
Abstract:
In this paper we present a novel method for learning hierarchical representations of Markov decision processes. Our method works by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions. We formulate the problem of partitioning the state space as an optimization problem that can be solved using gradient descent given a set of sampled traj…
▽ More
In this paper we present a novel method for learning hierarchical representations of Markov decision processes. Our method works by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions. We formulate the problem of partitioning the state space as an optimization problem that can be solved using gradient descent given a set of sampled trajectories, making our method suitable for high-dimensional problems with large state spaces. We empirically validate the method, by showing that it can successfully learn a useful hierarchical representation in a navigation domain. Once learned, the hierarchical representation can be used to solve different tasks in the given domain, thus generalizing knowledge across tasks.
△ Less
Submitted 19 December, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Polynomial Graph Parsing with Non-Structural Reentrancies
Authors:
Johanna Björklund,
Frank Drewes,
Anna Jonsson
Abstract:
Graph-based semantic representations are valuable in natural language processing, where it is often simple and effective to represent linguistic concepts as nodes, and relations as edges between them. Several attempts has been made to find a generative device that is sufficiently powerful to represent languages of semantic graphs, while at the same allowing efficient parsing. We add to this line o…
▽ More
Graph-based semantic representations are valuable in natural language processing, where it is often simple and effective to represent linguistic concepts as nodes, and relations as edges between them. Several attempts has been made to find a generative device that is sufficiently powerful to represent languages of semantic graphs, while at the same allowing efficient parsing. We add to this line of work by introducing graph extension grammar, which consists of an algebra over graphs together with a regular tree grammar that generates expressions over the operations of the algebra. Due to the design of the operations, these grammars can generate graphs with non-structural reentrancies; a type of node-sharing that is excessively common in formalisms such as abstract meaning representation, but for which existing devices offer little support. We provide a parsing algorithm for graph extension grammars, which is proved to be correct and run in polynomial time.
△ Less
Submitted 7 May, 2021; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Generalized Planning as Heuristic Search
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Although heuristic search is one of the most successful approaches to classical planning, this planning paradigm does not apply straightforwardly to Generalized Planning (GP). Planning as heuristic search traditionally addresses the computation of sequential plans by searching in a grounded state-space. On the other hand GP aims at computing algorithm-like plans, that can branch and loop, and that…
▽ More
Although heuristic search is one of the most successful approaches to classical planning, this planning paradigm does not apply straightforwardly to Generalized Planning (GP). Planning as heuristic search traditionally addresses the computation of sequential plans by searching in a grounded state-space. On the other hand GP aims at computing algorithm-like plans, that can branch and loop, and that generalize to a (possibly infinite) set of classical planning instances. This paper adapts the planning as heuristic search paradigm to the particularities of GP, and presents the first native heuristic search approach to GP. First, the paper defines a novel GP solution space that is independent of the number of planning instances in a GP problem, and the size of these instances. Second, the paper defines different evaluation and heuristic functions for guiding a combinatorial search in our GP solution space. Lastly the paper defines a GP algorithm, called Best-First Generalized Planning (BFGP), that implements a best-first search in the solution space guided by our evaluation/heuristic functions.
△ Less
Submitted 26 March, 2021;
originally announced March 2021.
-
Accelerator Development at the FREIA Laboratory
Authors:
R. Ruber,
A. K. Bhattacharyya,
D. Dancila,
T. Ekelöf,
J. Eriksson,
K. Fransson,
K. Gajewski,
V. Goryashko,
L. Hermansson,
M. Jacewicz,
M. Jobs,
Å. Jönsson,
H. Li,
T. Lofnes,
A. Miyazaki,
M. Olvegård,
E. Pehlivan,
T. Peterson,
K. Pepitone,
A. Rydberg,
R. Santiago Kern,
R. Wedberg,
A. Wiren,
R. Yogi,
V. Ziemann
Abstract:
The FREIA Laboratory at Uppsala University focuses on superconducting technology and accelerator development. It actively supports the development of the European Spallation Source, CERN, and MAX IV, among others. FREIA has developed test facilities for superconducting accelerator technology such as a double-cavity horizontal test cryostat, a vertical cryostat with a novel magnetic field compensat…
▽ More
The FREIA Laboratory at Uppsala University focuses on superconducting technology and accelerator development. It actively supports the development of the European Spallation Source, CERN, and MAX IV, among others. FREIA has developed test facilities for superconducting accelerator technology such as a double-cavity horizontal test cryostat, a vertical cryostat with a novel magnetic field compensation scheme, and a test stand for short cryomodules. Accelerating cavities have been tested in the horizontal cryostat, crab-cavities in the vertical cryostat, and cryomodules for ESS on the cryomodule test stand. High power radio-frequency amplifier prototypes based on vacuum tube technology were developed for driving spoke cavities. Solid-state amplifiers and power combiners are under development for future projects. We present the status of the FREIA Laboratory complemented with results of recent projects and future prospects.
△ Less
Submitted 9 March, 2021;
originally announced March 2021.
-
Hierarchical Width-Based Planning and Learning
Authors:
Miquel Junyent,
Vicenç Gómez,
Anders Jonsson
Abstract:
Width-based search methods have demonstrated state-of-the-art performance in a wide range of testbeds, from classical planning problems to image-based simulators such as Atari games. These methods scale independently of the size of the state-space, but exponentially in the problem width. In practice, running the algorithm with a width larger than 1 is computationally intractable, prohibiting IW fr…
▽ More
Width-based search methods have demonstrated state-of-the-art performance in a wide range of testbeds, from classical planning problems to image-based simulators such as Atari games. These methods scale independently of the size of the state-space, but exponentially in the problem width. In practice, running the algorithm with a width larger than 1 is computationally intractable, prohibiting IW from solving higher width problems. In this paper, we present a hierarchical algorithm that plans at two levels of abstraction. A high-level planner uses abstract features that are incrementally discovered from low-level pruning decisions. We illustrate this algorithm in classical planning PDDL domains as well as in pixel-based simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. For pixel-based domains, we show how in combination with a learned policy and a learned value function, the proposed hierarchical IW can outperform current flat IW-based planners in Atari games with sparse rewards.
△ Less
Submitted 1 September, 2021; v1 submitted 15 January, 2021;
originally announced January 2021.
-
Hierarchical reinforcement learning for efficient exploration and transfer
Authors:
Lorenzo Steccanella,
Simone Totaro,
Damien Allonsius,
Anders Jonsson
Abstract:
Sparse-reward domains are challenging for reinforcement learning algorithms since significant exploration is needed before encountering reward for the first time. Hierarchical reinforcement learning can facilitate exploration by reducing the number of decisions necessary before obtaining a reward. In this paper, we present a novel hierarchical reinforcement learning framework based on the compress…
▽ More
Sparse-reward domains are challenging for reinforcement learning algorithms since significant exploration is needed before encountering reward for the first time. Hierarchical reinforcement learning can facilitate exploration by reducing the number of decisions necessary before obtaining a reward. In this paper, we present a novel hierarchical reinforcement learning framework based on the compression of an invariant state space that is common to a range of tasks. The algorithm introduces subtasks which consist of moving between the state partitions induced by the compression. Results indicate that the algorithm can successfully solve complex sparse-reward domains, and transfer knowledge to solve new, previously unseen tasks more quickly.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Improved Exploration in Factored Average-Reward MDPs
Authors:
Mohammad Sadegh Talebi,
Anders Jonsson,
Odalric-Ambrym Maillard
Abstract:
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the state-action space $\mathcal X$ and the state-space $\mathcal S$ admit the respective factored forms of $\mathcal X = \otimes_{i=1}^n \mathcal X_i$ and $\mathcal S=\otimes_{i=1}^m \mathcal S_i$, and the transition and rewa…
▽ More
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the state-action space $\mathcal X$ and the state-space $\mathcal S$ admit the respective factored forms of $\mathcal X = \otimes_{i=1}^n \mathcal X_i$ and $\mathcal S=\otimes_{i=1}^m \mathcal S_i$, and the transition and reward functions are factored over $\mathcal X$ and $\mathcal S$. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of $\mathcal S_i$'s and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys substantially improved regret empirically over existing algorithms that have frequentist regret guarantees.
△ Less
Submitted 11 March, 2021; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Induction and Exploitation of Subgoal Automata for Reinforcement Learning
Authors:
Daniel Furelos-Blanco,
Mark Law,
Anders Jonsson,
Krysia Broda,
Alessandra Russo
Abstract:
In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states:…
▽ More
In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning performance in terms of the traces, the symmetry breaking and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.
△ Less
Submitted 16 March, 2021; v1 submitted 8 September, 2020;
originally announced September 2020.
-
Fast active learning for pure exploration in reinforcement learning
Authors:
Pierre Ménard,
Omar Darwiche Domingues,
Anders Jonsson,
Emilie Kaufmann,
Edouard Leurent,
Michal Valko
Abstract:
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on exploring efficiently. The exploration remains a challenge while it has been addressed with many hand-tuned heuristics with different levels of generality on one sid…
▽ More
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on exploring efficiently. The exploration remains a challenge while it has been addressed with many hand-tuned heuristics with different levels of generality on one side, and a few theoretically-backed exploration strategies on the other. Many of them are incarnated by intrinsic motivation and in particular explorations bonuses. A common rule of thumb for exploration bonuses is to use $1/\sqrt{n}$ bonus that is added to the empirical estimates of the reward, where $n$ is a number of times this particular state (or a state-action pair) was visited. We show that, surprisingly, for a pure-exploration objective of reward-free exploration, bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon $H$. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting, which is another pure-exploration objective, where the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
△ Less
Submitted 10 October, 2020; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Adaptive Reward-Free Exploration
Authors:
Emilie Kaufmann,
Pierre Ménard,
Omar Darwiche Domingues,
Anders Jonsson,
Edouard Leurent,
Michal Valko
Abstract:
Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our work, we instead give a more natural adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be…
▽ More
Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our work, we instead give a more natural adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994, originally proposed for a different objective that we call best-policy identification. We prove that RF-UCRL needs of order $({SAH^4}/{\varepsilon^2})(\log(1/δ) + S)$ episodes to output, with probability $1-δ$, an $\varepsilon$-approximation of the optimal policy for any reward function. This bound improves over existing sample-complexity bounds in both the small $\varepsilon$ and the small $δ$ regimes. We further investigate the relative complexities of reward-free exploration and best-policy identification.
△ Less
Submitted 7 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Authors:
Anders Jonsson,
Emilie Kaufmann,
Pierre Ménard,
Omar Darwiche Domingues,
Edouard Leurent,
Michal Valko
Abstract:
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optima…
▽ More
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, that are mostly theoretical.
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
Usage of Network Simulators in Machine-Learning-Assisted 5G/6G Networks
Authors:
Francesc Wilhelmi,
Marc Carrascosa,
Cristina Cano,
Anders Jonsson,
Vishnu Ram,
Boris Bellalta
Abstract:
Without any doubt, Machine Learning (ML) will be an important driver of future communications due to its foreseen performance when applied to complex problems. However, the application of ML to networking systems raises concerns among network operators and other stakeholders, especially regarding trustworthiness and reliability. In this paper, we devise the role of network simulators for bridging…
▽ More
Without any doubt, Machine Learning (ML) will be an important driver of future communications due to its foreseen performance when applied to complex problems. However, the application of ML to networking systems raises concerns among network operators and other stakeholders, especially regarding trustworthiness and reliability. In this paper, we devise the role of network simulators for bridging the gap between ML and communications systems. In particular, we present an architectural integration of simulators in ML-aware networks for training, testing, and validating ML models before being applied to the operative network. Moreover, we provide insights on the main challenges resulting from this integration, and then give hints discussing how they can be overcome. Finally, we illustrate the integration of network simulators into ML-assisted communications through a proof-of-concept testbed implementation of a residential Wi-Fi network.
△ Less
Submitted 2 March, 2021; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Lifelong Control of Off-grid Microgrid with Model Based Reinforcement Learning
Authors:
Simone Totaro,
Ioannis Boukas,
Anders Jonsson,
Bertrand Cornélusse
Abstract:
The lifelong control problem of an off-grid microgrid is composed of two tasks, namely estimation of the condition of the microgrid devices and operational planning accounting for the uncertainties by forecasting the future consumption and the renewable production. The main challenge for the effective control arises from the various changes that take place over time. In this paper, we present an o…
▽ More
The lifelong control problem of an off-grid microgrid is composed of two tasks, namely estimation of the condition of the microgrid devices and operational planning accounting for the uncertainties by forecasting the future consumption and the renewable production. The main challenge for the effective control arises from the various changes that take place over time. In this paper, we present an open-source reinforcement framework for the modeling of an off-grid microgrid for rural electrification. The lifelong control problem of an isolated microgrid is formulated as a Markov Decision Process (MDP). We categorize the set of changes that can occur in progressive and abrupt changes. We propose a novel model based reinforcement learning algorithm that is able to address both types of changes. In particular the proposed algorithm demonstrates generalisation properties, transfer capabilities and better robustness in case of fast-changing system dynamics. The proposed algorithm is compared against a rule-based policy and a model predictive controller with look-ahead. The results show that the trained agent is able to outperform both benchmarks in the lifelong setting where the system dynamics are changing over time.
△ Less
Submitted 16 May, 2020;
originally announced May 2020.
-
Induction of Subgoal Automata for Reinforcement Learning
Authors:
Daniel Furelos-Blanco,
Mark Law,
Alessandra Russo,
Krysia Broda,
Anders Jonsson
Abstract:
In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The re…
▽ More
In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automata, and the impact that the number of observable events has on the learner's performance.
△ Less
Submitted 29 November, 2019;
originally announced November 2019.
-
Generalized Planning with Positive and Negative Examples
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Generalized planning aims at computing an algorithm-like structure (generalized plan) that solves a set of multiple planning instances. In this paper we define negative examples for generalized planning as planning instances that must not be solved by a generalized plan. With this regard the paper extends the notion of validation of a generalized plan as the problem of verifying that a given gener…
▽ More
Generalized planning aims at computing an algorithm-like structure (generalized plan) that solves a set of multiple planning instances. In this paper we define negative examples for generalized planning as planning instances that must not be solved by a generalized plan. With this regard the paper extends the notion of validation of a generalized plan as the problem of verifying that a given generalized plan solves the set of input positives instances while it fails to solve a given input set of negative examples. This notion of plan validation allows us to define quantitative metrics to asses the generalization capacity of generalized plans. The paper also shows how to incorporate this new notion of plan validation into a compilation for plan synthesis that takes both positive and negative instances as input. Experiments show that incorporating negative examples can accelerate plan synthesis in several domains and leverage quantitative metrics to evaluate the generalization capacity of the synthesized plans.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
Hierarchical Finite State Controllers for Generalized Planning
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Finite State Controllers (FSCs) are an effective way to represent sequential plans compactly. By imposing appropriate conditions on transitions, FSCs can also represent generalized plans that solve a range of planning problems from a given domain. In this paper we introduce the concept of {\it hierarchical FSCs} for planning by allowing controllers to call other controllers. We show that hierarchi…
▽ More
Finite State Controllers (FSCs) are an effective way to represent sequential plans compactly. By imposing appropriate conditions on transitions, FSCs can also represent generalized plans that solve a range of planning problems from a given domain. In this paper we introduce the concept of {\it hierarchical FSCs} for planning by allowing controllers to call other controllers. We show that hierarchical FSCs can represent generalized plans more compactly than individual FSCs. Moreover, our call mechanism makes it possible to generate hierarchical FSCs in a modular fashion, or even to apply recursion. We also introduce a compilation that enables a classical planner to generate hierarchical FSCs that solve challenging generalized planning problems. The compilation takes as input a set of planning problems from a given domain and outputs a single classical planning problem, whose solution corresponds to a hierarchical FSC.
△ Less
Submitted 7 November, 2019;
originally announced November 2019.
-
Generalized Planning With Procedural Domain Control Knowledge
Authors:
Javier Segovia-Aguas,
Sergio Jiménez,
Anders Jonsson
Abstract:
Generalized planning is the task of generating a single solution that is valid for a set of planning problems. In this paper we show how to represent and compute generalized plans using procedural Domain Control Knowledge (DCK). We define a {\it divide and conquer} approach that first generates the procedural DCK solving a set of planning problems representative of certain subtasks and then compil…
▽ More
Generalized planning is the task of generating a single solution that is valid for a set of planning problems. In this paper we show how to represent and compute generalized plans using procedural Domain Control Knowledge (DCK). We define a {\it divide and conquer} approach that first generates the procedural DCK solving a set of planning problems representative of certain subtasks and then compile it as callable procedures of the overall generalized planning problem. Our procedure calling mechanism allows nested and recursive procedure calls and is implemented in PDDL so that classical planners can compute and exploit procedural DCK. Experiments show that an off-the-shelf classical planner, using procedural DCK as callable procedures, can compute generalized plans in a wide range of domains including non-trivial ones, such as sorting variable-size lists or DFS traversal of binary trees with variable size.
△ Less
Submitted 11 October, 2019;
originally announced October 2019.
-
A Flexible Machine Learning-Aware Architecture for Future WLANs
Authors:
Francesc Wilhelmi,
Sergio Barrachina-Muñoz,
Boris Bellalta,
Cristina Cano,
Anders Jonsson,
Vishnu Ram
Abstract:
Lots of hopes have been placed on Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of large volumes of data, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networks are not yet prepared to support the ensuing requirements of ML-based applications in terms of data collection, processing, and output dis…
▽ More
Lots of hopes have been placed on Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of large volumes of data, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networks are not yet prepared to support the ensuing requirements of ML-based applications in terms of data collection, processing, and output distribution. This article points out the architectural requirements that are needed to pervasively include ML as part of future wireless networks operation. Specifically, we look into Wireless Local Area Networks (WLANs), which, due to their nature can be found in multiple forms, ranging from cloud-based to edge-computing-like deployments. In particular, we propose to adopt the International Telecommunications Union (ITU) unified architecture for 5G and beyond. Based on ITU's architecture, we provide insights on the main requirements and the major challenges of introducing ML to the multiple modalities of WLANs. Finally, we showcase the superiority of the architecture through an ML-enabled use case for future networks.
△ Less
Submitted 17 February, 2020; v1 submitted 8 October, 2019;
originally announced October 2019.
-
Decision Tree Learning for Uncertain Clinical Measurements
Authors:
Cecília Nunes,
Hélène Langet,
Mathieu De Craene,
Oscar Camara,
Bart Bijnens,
Anders Jonsson
Abstract:
Clinical decision requires reasoning in the presence of imperfect data. DTs are a well-known decision support tool, owing to their interpretability, fundamental in safety-critical contexts such as medical diagnosis. However, learning DTs from uncertain data leads to poor generalization, and generating predictions for uncertain data hinders prediction accuracy. Several methods have suggested the po…
▽ More
Clinical decision requires reasoning in the presence of imperfect data. DTs are a well-known decision support tool, owing to their interpretability, fundamental in safety-critical contexts such as medical diagnosis. However, learning DTs from uncertain data leads to poor generalization, and generating predictions for uncertain data hinders prediction accuracy. Several methods have suggested the potential of probabilistic decisions at the internal nodes in making DTs robust to uncertainty. Some approaches only employ probabilistic thresholds during evaluation. Others also consider the uncertainty in the learning phase, at the expense of increased computational complexity or reduced interpretability. The existing methods have not clarified the merit of a probabilistic approach in the distinct phases of DT learning, nor when the uncertainty is present in the training or the test data. We present a probabilistic DT approach that models measurement uncertainty as a noise distribution, independently realized: (1) when searching for the split thresholds, (2) when splitting the training instances, and (3) when generating predictions for unseen data. The soft training approaches (1, 2) achieved a regularizing effect, leading to significant reductions in DT size, while maintaining accuracy, for increased noise. Soft evaluation (3) showed no benefit in handling noise.
△ Less
Submitted 25 July, 2019;
originally announced July 2019.
-
Solving Multiagent Planning Problems with Concurrent Conditional Effects
Authors:
Daniel Furelos-Blanco,
Anders Jonsson
Abstract:
In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfie…
▽ More
In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Our planner is the first to handle action effects that are conditional on what other agents are doing. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.
△ Less
Submitted 19 June, 2019;
originally announced June 2019.
-
Deep Policies for Width-Based Planning in Pixel Domains
Authors:
Miquel Junyent,
Anders Jonsson,
Vicenç Gómez
Abstract:
Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done on-line using th…
▽ More
Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done on-line using the "screen" states and selecting actions by looking ahead into the future. However, this algorithm is purely exploratory and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task, e.g., the B-PROST pixel features. In this work, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called $π$-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Surprisingly, we observe that the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. We compare $π$-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that $π$-IW has superior performance. We also show that $π$-IW algorithm outperforms previous width-based methods in the pixel setting of Atari games suite.
△ Less
Submitted 5 October, 2021; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Modeling Text Complexity using a Multi-Scale Probit
Authors:
Johan Falkenjack,
Mattias Villani,
Arne Jönsson
Abstract:
We present a novel model for text complexity analysis which can be fitted to ordered categorical data measured on multiple scales, e.g. a corpus with binary responses mixed with a corpus with more than two ordered outcomes. The multiple scales are assumed to be driven by the same underlying latent variable describing the complexity of the text. We propose an easily implemented Gibbs sampler to sam…
▽ More
We present a novel model for text complexity analysis which can be fitted to ordered categorical data measured on multiple scales, e.g. a corpus with binary responses mixed with a corpus with more than two ordered outcomes. The multiple scales are assumed to be driven by the same underlying latent variable describing the complexity of the text. We propose an easily implemented Gibbs sampler to sample from the posterior distribution by a direct extension of established data augmentation schemes. By being able to combine multiple corpora with different annotation schemes we can get around the common problem of having more text features than annotated documents, i.e. an example of the $p>n$ problem. The predictive performance of the model is evaluated using both simulated and real world readability data with very promising results.
△ Less
Submitted 12 November, 2018;
originally announced November 2018.
-
Power regulation and electromigration in platinum microwires
Authors:
Ottó Elíasson,
Gabriel Vasile,
Sigurður Ægir Jónsson,
G. I. Gudjonsson,
Mustafa Arikan,
Snorri Ingvarsson
Abstract:
We introduce a new experimental setup with a biasing circuit and computer control for electrical power regulation under reversing polarity in Pt microwires with dimensions of $1\times10$ μm$^2$. The circuit is computer controlled via a data acquisition board. It amplifies a control signal from the computer and drives current of alternating polarity through the sample in question. Time-to-failure i…
▽ More
We introduce a new experimental setup with a biasing circuit and computer control for electrical power regulation under reversing polarity in Pt microwires with dimensions of $1\times10$ μm$^2$. The circuit is computer controlled via a data acquisition board. It amplifies a control signal from the computer and drives current of alternating polarity through the sample in question. Time-to-failure investigations under DC and AC current stress are performed. We confirm that AC current stress can improve the life time of microwires at least by a factor of $10^3$ compared to the corresponding time-to-failure under DC current stress.
△ Less
Submitted 4 November, 2018;
originally announced November 2018.
-
Improving width-based planning with compact policies
Authors:
Miquel Junyent,
Anders Jonsson,
Vicenç Gómez
Abstract:
Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Ite…
▽ More
Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state of the art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different than the standard random exploration used in RL. We evaluate our method in simple problems where we show it to have superior performance than the state-of-the-art reinforcement learning algorithms A2C and Alpha Zero. Finally, we present preliminary results in a subset of the Atari games suite.
△ Less
Submitted 15 June, 2018;
originally announced June 2018.
-
Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour
Authors:
Emilia Gómez,
Carlos Castillo,
Vicky Charisi,
Verónica Dahl,
Gustavo Deco,
Blagoj Delipetrev,
Nicole Dewandre,
Miguel Ángel González-Ballester,
Fabien Gouyon,
José Hernández-Orallo,
Perfecto Herrera,
Anders Jonsson,
Ansgar Koene,
Martha Larson,
Ramón López de Mántaras,
Bertin Martens,
Marius Miron,
Rubén Moreno-Bote,
Nuria Oliver,
Antonio Puertas Gallardo,
Heike Schweitzer,
Nuria Sebastian,
Xavier Serra,
Joan Serrà,
Songül Tolan
, et al. (1 additional authors not shown)
Abstract:
This document contains the outcome of the first Human behaviour and machine intelligence (HUMAINT) workshop that took place 5-6 March 2018 in Barcelona, Spain. The workshop was organized in the context of a new research programme at the Centre for Advanced Studies, Joint Research Centre of the European Commission, which focuses on studying the potential impact of artificial intelligence on human b…
▽ More
This document contains the outcome of the first Human behaviour and machine intelligence (HUMAINT) workshop that took place 5-6 March 2018 in Barcelona, Spain. The workshop was organized in the context of a new research programme at the Centre for Advanced Studies, Joint Research Centre of the European Commission, which focuses on studying the potential impact of artificial intelligence on human behaviour. The workshop gathered an interdisciplinary group of experts to establish the state of the art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithm's potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs. The document is made of short position statements and identification of challenges provided by each expert, and incorporates the result of the discussions carried out during the workshop. In the conclusion section, we provide a list of emerging research topics and strategies to be addressed in the near future.
△ Less
Submitted 7 June, 2018;
originally announced June 2018.
-
Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs
Authors:
Francesc Wilhelmi,
Sergio Barrachina-Muñoz,
Cristina Cano,
Boris Bellalta,
Anders Jonsson,
Gergely Neu
Abstract:
Spatial Reuse (SR) has recently gained attention to maximize the performance of IEEE 802.11 Wireless Local Area Networks (WLANs). Decentralized mechanisms are expected to be key in the development of SR solutions for next-generation WLANs, since many deployments are characterized by being uncoordinated by nature. However, the potential of decentralized mechanisms is limited by the significant lack…
▽ More
Spatial Reuse (SR) has recently gained attention to maximize the performance of IEEE 802.11 Wireless Local Area Networks (WLANs). Decentralized mechanisms are expected to be key in the development of SR solutions for next-generation WLANs, since many deployments are characterized by being uncoordinated by nature. However, the potential of decentralized mechanisms is limited by the significant lack of knowledge with respect to the overall wireless environment. To shed some light on this subject, we show the main considerations and possibilities of applying online learning to address the SR problem in uncoordinated WLANs. In particular, we provide a solution based on Multi-Armed Bandits (MABs) whereby independent WLANs dynamically adjust their frequency channel, transmit power and sensitivity threshold. To that purpose, we provide two different strategies, which refer to selfish and environment-aware learning. While the former stands for pure individual behavior, the second one considers the performance experienced by surrounding networks, thus taking into account the impact of individual actions on the environment. Through these two strategies we delve into practical issues of applying MABs in wireless networks, such as convergence guarantees or adversarial effects. Our simulation results illustrate the potential of the proposed solutions for enabling SR in future WLANs. We show that substantial improvements on network performance can be achieved regarding throughput and fairness.
△ Less
Submitted 14 December, 2018; v1 submitted 28 May, 2018;
originally announced May 2018.
-
A New Model for the Distribution of Observable Earthquake Magnitudes and Applications to $b$-value Estimation
Authors:
Jesper Martinsson,
Adam Jonsson
Abstract:
The $b$-value in the Gutenberg-Richter (GR) law contains information that is essential for evaluating earthquake hazard and predicting the occurrence of large earthquakes. Estimates of $b$ are often based on seismic events whose magnitude exceed a certain threshold, the so called magnitude of completeness. Such estimates are sensitive to the choice of threshold and often ignore a substantial porti…
▽ More
The $b$-value in the Gutenberg-Richter (GR) law contains information that is essential for evaluating earthquake hazard and predicting the occurrence of large earthquakes. Estimates of $b$ are often based on seismic events whose magnitude exceed a certain threshold, the so called magnitude of completeness. Such estimates are sensitive to the choice of threshold and often ignore a substantial portion of available data. We present a general model for the distribution of observable earthquake magnitudes and an estimation procedure that takes all measurements into account. The model is obtained by generalizing previous probabilistic descriptions of sensor network limitations and by using a generalization of the GR law. We show that our model is flexible enough to handle spatio-temporal variations in the seismic environment and captures valuable information about sensor network coverage. We also show that the model leads to significantly improved $b$-value estimates compared with established methods relying on the magnitude of completeness.
△ Less
Submitted 15 March, 2018;
originally announced March 2018.
-
Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits
Authors:
Francesc Wilhelmi,
Cristina Cano,
Gergely Neu,
Boris Bellalta,
Anders Jonsson,
Sergio Barrachina-Muñoz
Abstract:
Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms to enable Spatial Reuse (SR). In particular, we focus on dynamic channel selection and Transmission Power Control (TPC). We rely on Reinforcement Learning (R…
▽ More
Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms to enable Spatial Reuse (SR). In particular, we focus on dynamic channel selection and Transmission Power Control (TPC). We rely on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to learn their best configuration. In this work, we study the exploration-exploitation trade-off by means of the $\varepsilon$-greedy, EXP3, UCB and Thompson sampling action-selection, and compare their performance. In addition, we study the implications of selecting actions simultaneously in an adversarial setting (i.e., concurrently), and compare it with a sequential approach. Our results show that optimal proportional fairness can be achieved, even when no information about neighboring networks is available to the learners and Wireless Networks (WNs) operate selfishly. However, there is high temporal variability in the throughput experienced by the individual networks, specially for $\varepsilon$-greedy and EXP3. These strategies, contrary to UCB and Thompson sampling, base their operation on the absolute experienced reward, rather than on its distribution. We identify the cause of this variability to be the adversarial setting of our setup in which the set of most played actions provide intermittent good/poor performance depending on the neighboring decisions. We also show that learning sequentially, even if using a selfish strategy, contributes to minimize this variability. The sequential approach is therefore shown to effectively deal with the challenges posed by the adversarial settings that are typically found in decentralized WNs.
△ Less
Submitted 13 November, 2018; v1 submitted 31 October, 2017;
originally announced October 2017.
-
Implications of Decentralized Q-learning Resource Allocation in Wireless Networks
Authors:
Francesc Wilhelmi,
Boris Bellalta,
Cristina Cano,
Anders Jonsson
Abstract:
Reinforcement Learning is gaining attention by the wireless networking community due to its potential to learn good-performing configurations only from the observed results. In this work we propose a stateless variation of Q-learning, which we apply to exploit spatial reuse in a wireless network. In particular, we allow networks to modify both their transmission power and the channel used solely b…
▽ More
Reinforcement Learning is gaining attention by the wireless networking community due to its potential to learn good-performing configurations only from the observed results. In this work we propose a stateless variation of Q-learning, which we apply to exploit spatial reuse in a wireless network. In particular, we allow networks to modify both their transmission power and the channel used solely based on the experienced throughput. We concentrate in a completely decentralized scenario in which no information about neighbouring nodes is available to the learners. Our results show that although the algorithm is able to find the best-performing actions to enhance aggregate throughput, there is high variability in the throughput experienced by the individual networks. We identify the cause of this variability as the adversarial setting of our setup, in which the most played actions provide intermittent good/poor performance depending on the neighbouring decisions. We also evaluate the effect of the intrinsic learning parameters of the algorithm on this variability.
△ Less
Submitted 29 August, 2017; v1 submitted 30 May, 2017;
originally announced May 2017.
-
A unified view of entropy-regularized Markov decision processes
Authors:
Gergely Neu,
Anders Jonsson,
Vicenç Gómez
Abstract:
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yi…
▽ More
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.
△ Less
Submitted 22 May, 2017;
originally announced May 2017.
-
An axiomatic approach to Markov decision processes
Authors:
Adam Jonsson
Abstract:
This paper presents an axiomatic approach to finite Markov decision processes where the discount rate is zero. One of the principal difficulties in the no discounting case is that, even if attention is restricted to stationary policies, a strong overtaking optimal policy need not exists. We provide preference foundations for two criteria that do admit optimal policies: $0$-discount optimality and…
▽ More
This paper presents an axiomatic approach to finite Markov decision processes where the discount rate is zero. One of the principal difficulties in the no discounting case is that, even if attention is restricted to stationary policies, a strong overtaking optimal policy need not exists. We provide preference foundations for two criteria that do admit optimal policies: $0$-discount optimality and average overtaking optimality. As a corollary of our results, we obtain conditions on a decision maker's preferences which ensure that an optimal policy exists. These results have implications for disciplines where stochastic dynamic programming problems arise, including automatic control, dynamic games, and economic development.
△ Less
Submitted 22 November, 2022; v1 submitted 11 January, 2017;
originally announced January 2017.