subscribe to arXiv mailings

Reinforcement Learning and Regret Bounds for Admission Control

Authors: Lucas Weber, Ana Bušić, Jiamin Zhu

Abstract: The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific… ▽ More The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2402.07525 [pdf, other]

Reinforcement learning based demand charge minimization using energy storage

Authors: Lucas Weber, Ana Bušić, Jiamin Zhu

Abstract: Utilities have introduced demand charges to encourage customers to reduce their demand peaks, since a high peak may cause very high costs for both the utility and the consumer. We herein study the bill minimization problem for customers equipped with an energy storage device and a self-owned renewable energy production. A model-free reinforcement learning algorithm is carefully designed to reduce… ▽ More Utilities have introduced demand charges to encourage customers to reduce their demand peaks, since a high peak may cause very high costs for both the utility and the consumer. We herein study the bill minimization problem for customers equipped with an energy storage device and a self-owned renewable energy production. A model-free reinforcement learning algorithm is carefully designed to reduce both the energy charge and the demand charge of the consumer. The proposed algorithm does not need forecasting models for the energy demand and the renewable energy production. The resulting controller can be used online, and progressively improved with newly gathered data. The algorithm is validated on real data from an office building of IFPEN Solaize site. Numerical results show that our algorithm can reduce electricity bills with both daily and monthly demand charges. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Journal ref: 2023 IEEE 62nd Conference on Decision and Control (CDC), 2023

arXiv:2305.00187 [pdf, other]

On the sub-additivity of stochastic matching

Authors: Pascal Moyal, Ana Busic, Jean Mairesse

Abstract: We consider a stochastic matching model with a general compatibility graph, as introduced in \cite{MaiMoy16}. We prove that most common matching policies (including FCFM, priorities and random) satisfy a particular sub-additive property, which we exploit to show in many cases, the coupling-from-the-past to the steady state, using a backwards scheme {\em à la} Loynes. We then use these results to e… ▽ More We consider a stochastic matching model with a general compatibility graph, as introduced in \cite{MaiMoy16}. We prove that most common matching policies (including FCFM, priorities and random) satisfy a particular sub-additive property, which we exploit to show in many cases, the coupling-from-the-past to the steady state, using a backwards scheme {\em à la} Loynes. We then use these results to explicitly construct perfect bi-infinite matchings, and to build a perfect simulation algorithm in the case where the buffer of the system is finite. △ Less

Submitted 6 May, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.02620

MSC Class: 60J10; 05C70; 05C90

arXiv:2208.01958 [pdf, other]

Feature Projection for Optimal Transport

Authors: Thomas Le Corre, Ana Busic, Sean Meyn

Abstract: Optimal transport is now a standard tool for solving many problems in statistics and machine learning. The optimal "transport of probability measures" is also a recurring theme in stochastic control and distributed control, where in the latter application the probability measure corresponds to an empirical distribution associated with a large collection of distributed agents, subject to local and… ▽ More Optimal transport is now a standard tool for solving many problems in statistics and machine learning. The optimal "transport of probability measures" is also a recurring theme in stochastic control and distributed control, where in the latter application the probability measure corresponds to an empirical distribution associated with a large collection of distributed agents, subject to local and global control. The goal of this paper is to make precise these connections, which inspires new relaxations of optimal transport for application in new and traditional domains. The proposed relaxation replaces a target measure with a "moment class": a set of probability measures defined by generalized moment constraints. This is motivated by applications to control, outlier detection, and to address computational complexity. The main conclusions are (i) A characterization of the solution is obtained, similar to Kantorovich duality, in which one of the dual functions in the classical theory is replaced by a linear combination of the features defining the generalized moments. Hence the dimension of the optimization problem coincides with the number of constraints, even with an uncountable state space; (ii) By introducing regularization in the form of relative entropy, the solution can be interpreted as replacing a maximum with a softmax in the dual; (iii) In applications such as control for which it is not known a-priori if the moment class is non-empty, a relaxation is proposed whose solution admits a similar characterization; (iv) The gradient of the dual function can be expressed in terms of the expectation of the features under a tilted probability measure, which motivates Monte-Carlo techniques for computation. △ Less

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2207.10248 [pdf, other]

Can locational disparity of prosumer energy optimization due to inverter rules be limited?

Authors: Md Umar Hashmi, Deepjyoti Deka, Ana Bušić, Dirk Van Hertem

Abstract: To mitigate issues related to the growth of variable smart loads and distributed generation, distribution system operators (DSO) now make it binding for prosumers with inverters to operate under pre-set rules. In particular, the maximum active and reactive power set points for prosumers are based on local voltage measurements to ensure that inverter output does not cause voltage violations. Howeve… ▽ More To mitigate issues related to the growth of variable smart loads and distributed generation, distribution system operators (DSO) now make it binding for prosumers with inverters to operate under pre-set rules. In particular, the maximum active and reactive power set points for prosumers are based on local voltage measurements to ensure that inverter output does not cause voltage violations. However, such actions, as observed in this work, restrict the range available for local energy management, with more adverse losses on arbitrage profits for prosumers located farther away from the substation. The goal of the paper is three-fold: (a) to develop an optimal local energy optimization algorithm for activation of load flexibility and inverter-interfaced solar PV and energy storage under time-varying electricity prices; (b) to quantify the locational impact on prosumer arbitrage gains due to inverter injection rules prevalent in different energy markets; (c) to propose a computationally efficient hybrid inverter control policy which provides voltage regulation while substantially reducing locational disparity. Using numerical simulations on three identical prosumers located at different parts of a radial feeder, we show that our control policy is able to minimize locational disparity in arbitrage gains between customers at the beginning and end of the feeder to 1.4%, while PV curtailment is reduced by 91.7% compared to the base case with restrictive volt-Var and volt-watt policy. △ Less

Submitted 18 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

arXiv:2112.14457 [pdf, other]

Stochastic dynamic matching: A mixed graph-theory and linear-algebra approach

Authors: Céline Comte, Fabien Mathieu, Ana Bušić

Abstract: The stochastic dynamic matching problem has recently drawn attention in the stochastic-modeling community due to its numerous applications, ranging from supply-chain management to kidney exchange programs. In this paper, we consider a matching problem in which items of different classes arrive according to independent Poisson processes. Unmatched items are stored in a queue, and compatibility cons… ▽ More The stochastic dynamic matching problem has recently drawn attention in the stochastic-modeling community due to its numerous applications, ranging from supply-chain management to kidney exchange programs. In this paper, we consider a matching problem in which items of different classes arrive according to independent Poisson processes. Unmatched items are stored in a queue, and compatibility constraints are described by a simple graph on the classes, so that two items can be matched if their classes are neighbors in the graph. We analyze the efficiency of matching policies, not only in terms of system stability, but also in terms of matching rates between different classes. Our results rely on the observation that, under any stable policy, the matching rates satisfy a conservation equation that equates the arrival and departure rates of each item class. Our main contributions are threefold. We first introduce a mapping between the dimension of the solution set of this conservation equation, the structure of the compatibility graph, and the existence of a stable policy. In particular, this allows us to derive a necessary and sufficient stability condition that is verifiable in polynomial time. Secondly, we describe the convex polytope of non-negative solutions of the conservation equation. When this polytope is reduced to a single point, we give a closed-form expression of the solution; in general, we characterize the vertices of this polytope using again the graph structure. Lastly, we show that greedy policies cannot, in general, achieve every point in the polytope. In contrast, non-greedy policies can reach any point of the interior of this polytope, and we give a condition for these policies to also reach the boundary of the polytope. △ Less

Submitted 26 June, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

arXiv:2108.05840 [pdf, other]

A unified framework for coordination of thermostatically controlled loads

Authors: Austin Coffman, Ana Bušić, Prabir Barooah

Abstract: A collection of thermostatically controlled loads (TCLs) -- such as air conditioners and water heaters -- can vary their power consumption within limits to help the balancing authority of a power grid maintain demand supply balance. Doing so requires loads to coordinate their on/off decisions so that the aggregate power consumption profile tracks a grid-supplied reference. At the same time, each c… ▽ More A collection of thermostatically controlled loads (TCLs) -- such as air conditioners and water heaters -- can vary their power consumption within limits to help the balancing authority of a power grid maintain demand supply balance. Doing so requires loads to coordinate their on/off decisions so that the aggregate power consumption profile tracks a grid-supplied reference. At the same time, each consumer's quality of service (QoS) must be maintained. While there is a large body of work on TCL coordination, there are several limitations. One is that they do not provide guarantees on the reference tracking performance and QoS maintenance. A second limitation of past work is that they do not provide a means to compute a suitable reference signal for power demand of a collection of TCLs. In this work we provide a framework that addresses these weaknesses. The framework enables coordination of an arbitrary number of TCLs that: (i) is computationally efficient, (ii) is implementable at the TCLs with local feedback and low communication, and (iii) enables reference tracking by the collection while ensuring that temperature and cycling constraints are satisfied at every TCL at all times. The framework is based on a Markov model obtained by discretizing a pair of Fokker-Planck equations derived in earlier work by Malhame and Chong [21]. We then use this model to design randomized policies for TCLs. The balancing authority broadcasts the same policy to all TCLs, and each TCL implements this policy which requires only local measurement to make on/off decisions. Simulation results are provided to support these claims. △ Less

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2101.06922 [pdf, other]

Privacy Impact on Generalized Nash Equilibrium in Peer-to-Peer Electricity Market

Authors: Ilia Shilov, Hélène Le Cadre, Ana Bušic

Abstract: We consider a peer-to-peer electricity market, where agents hold private information that they might not want to share. The problem is modeled as a noncooperative communication game, which takes the form of a Generalized Nash Equilibrium Problem, where the agents determine their randomized reports to share with the other market players, while anticipating the form of the peer-to-peer market equili… ▽ More We consider a peer-to-peer electricity market, where agents hold private information that they might not want to share. The problem is modeled as a noncooperative communication game, which takes the form of a Generalized Nash Equilibrium Problem, where the agents determine their randomized reports to share with the other market players, while anticipating the form of the peer-to-peer market equilibrium. In the noncooperative game, each agent decides on the deterministic and random parts of the report, such that (a) the distance between the deterministic part of the report and the truthful private information is bounded and (b) the expectation of the privacy loss random variable is bounded. This allows each agent to change her privacy level. We characterize the equilibrium of the game, prove the uniqueness of the Variational Equilibria and provide a closed form expression of the privacy price. In addition, we provide a closed form expression to measure the impact of the privacy preservation caused by inclusion of random noise and deterministic deviation from agents' true values. Numerical illustrations are presented on the 14-bus IEEE network. △ Less

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2009.12960 [pdf, other]

Control oriented modeling of TCLs

Authors: Austin R. Coffman, Ana Bušić, Prabir Barooah

Abstract: Thermostatically controlled loads (TCLs) have the potential to be a valuable resource for the Balancing Authority (BA) of the future. Examples of TCLs include household appliances such as air conditioners, water heaters, and refrigerators. Since the rated power of each TCL is on the order of kilowatts, to provide meaningful service for the BA, it is necessary to control large collections of TCLs.… ▽ More Thermostatically controlled loads (TCLs) have the potential to be a valuable resource for the Balancing Authority (BA) of the future. Examples of TCLs include household appliances such as air conditioners, water heaters, and refrigerators. Since the rated power of each TCL is on the order of kilowatts, to provide meaningful service for the BA, it is necessary to control large collections of TCLs. To perform design of a distributed coordination/control algorithm, the BA requires a control oriented model that describes the relevant dynamics of an ensemble. Works focusing on solely modeling the ensemble date back to the 1980's, while works focusing on control oriented modeling are more recent. In this work, we contribute to the control oriented modeling literature. We leverage techniques from computational fluid dynamics (CFD) to discretize a pair of Fokker-Planck equations derived in earlier work [1]. The discretized equations are shown to admit a certain factorization, which makes the developed model useful for control design. In particular, the effects of weather and control are shown to independently effect the system dynamics. △ Less

Submitted 27 September, 2020; originally announced September 2020.

Comments: 9 pages, 5 figures

arXiv:2009.10009 [pdf, other]

Flexibility can hurt dynamic matching system performance

Authors: Arnaud Cadas, Josu Doncel, Jean-Michel Fourneau, Ana Bušić

Abstract: We study the performance of general dynamic matching models. This model is defined by a connected graph, where nodes represent the class of items and the edges the compatibilities between items. Items of different classes arrive one by one to the system according to a given probability distribution. Upon arrival, an item is matched with a compatible item according to the First Come First Served di… ▽ More We study the performance of general dynamic matching models. This model is defined by a connected graph, where nodes represent the class of items and the edges the compatibilities between items. Items of different classes arrive one by one to the system according to a given probability distribution. Upon arrival, an item is matched with a compatible item according to the First Come First Served discipline and leave the system immediately, whereas it is enqueued with other items of the same class, if any. We show that such a model may exhibit a non intuitive behavior: increasing the services ability by adding new edges in the matching graph may lead to a larger average population. This is similar to a Braess paradox. We first consider a quasicomplete graph with four nodes and we provide values of the probability distribution of the arrivals such that when we add an edge the mean number of items is larger. Then, we consider an arbitrary matching graph and we show sufficient conditions for the existence or non-existence of this paradox. We conclude that the analog to the Braess paradox in matching models is given when specific independent sets are in saturation, i.e., the system is close to the stability condition. △ Less

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2004.08770 [pdf, other]

Energy Storage Optimization for Grid Reliability

Authors: Md Umar Hashmi, Deepjyoti Deka, Lucas Pereira, Ana Busic

Abstract: Large scale renewable energy integration is being planned for multiple power grids around the world. To achieve secure and stable grid operations, additional resources/reserves are needed to mitigate the inherent intermittency of renewable energy sources (RES). In this paper, we present formulations to understand the effect of fast storage reserves in improving grid reliability under different cos… ▽ More Large scale renewable energy integration is being planned for multiple power grids around the world. To achieve secure and stable grid operations, additional resources/reserves are needed to mitigate the inherent intermittency of renewable energy sources (RES). In this paper, we present formulations to understand the effect of fast storage reserves in improving grid reliability under different cost functions. Our formulations and solution schemes not only aim to minimize imbalance but also maintain state-of-charge (SoC) of storage. In particular, we show that accounting for system response due to inertia and local governor response enables a more realistic quantification of storage requirements for damping net load fluctuations. The storage requirement is significantly lower than values determined when such traditional response are not accounted for. We demonstrate the performance of our designed policies through studies using real data from the Elia TSO in Belgium and BPA agency in the USA. The numerical results enable us to benchmark the marginal effect on reliability due to increasing storage size under different system responses and associated cost functions. △ Less

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2004.02470 [pdf, other]

Risk-Averse Equilibrium Analysis and Computation

Authors: Ilia Shilov, Hélène Le Cadre, Ana Busic

Abstract: We consider two market designs for a network of prosumers, trading energy: (i) a centralized design which acts as a benchmark, and (ii) a peer-to-peer market design. High renewable energy penetration requires that the energy market design properly handles uncertainty. To that purpose, we consider risk neutral models for market designs (i), (ii), and their risk-averse interpretations in which prosu… ▽ More We consider two market designs for a network of prosumers, trading energy: (i) a centralized design which acts as a benchmark, and (ii) a peer-to-peer market design. High renewable energy penetration requires that the energy market design properly handles uncertainty. To that purpose, we consider risk neutral models for market designs (i), (ii), and their risk-averse interpretations in which prosumers are endowed with coherent risk measures reflecting heterogeneity in their risk attitudes. We characterize analytically risk-neutral and risk-averse equilibrium in terms of existence and uniqueness , relying on Generalized Nash Equilibrium and Variational Equilibrium as solution concepts. To hedge their risk towards uncertainty and complete the market, prosumers can trade financial contracts. We provide closed form characterisations of the risk-adjusted probabilities under different market regimes and a distributed algorithm for risk trading mechanism relying on the Generalized potential game structure of the problem. The impact of risk heterogeneity and financial contracts on the prosumers' expected costs are analysed numerically in a three node network and the IEEE 14-bus network. △ Less

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2004.01798 [pdf, other]

Kullback-Leibler-Quadratic Optimal Control

Authors: Neil Cammardella, Ana Bušić, Sean Meyn

Abstract: This paper presents approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem, whose domain is a convex set of probability mass functions (pmfs). The main contributions follow: 1. Kullback-Leibler-Quadratic (KLQ) optimal control is a special case, in which the objective function is composed of a control c… ▽ More This paper presents approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem, whose domain is a convex set of probability mass functions (pmfs). The main contributions follow: 1. Kullback-Leibler-Quadratic (KLQ) optimal control is a special case, in which the objective function is composed of a control cost in the form of Kullback-Leibler divergence between a candidate pmf and the nominal, plus a quadratic cost on the sequence of marginals. Theory in this paper extends prior work on deterministic control systems, establishing that the optimal solution is an exponential tilting of the nominal pmf. Transform techniques are introduced to reduce complexity of the KLQ solution, motivated by the need to consider time horizons that are much longer than the inter-sampling times required for reliable control. 2. Infinite-horizon KLQ leads to a state feedback control solution with attractive properties. It can be expressed as either state feedback, in which the state is the sequence of marginal pmfs, or an open loop solution is obtained that is more easily computed. 3. Numerical experiments are surveyed in an application of distributed control of residential loads to provide grid services, similar to utility-scale battery storage. The results show that KLQ optimal control enables the aggregate power consumption of a collection of flexible loads to track a time-varying reference signal, while simultaneously ensuring each individual load satisfies its own quality of service constraints. Keywords: Mean field games, distributed control, Markov decision processes, Demand Dispatch. △ Less

Submitted 8 June, 2023; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: Extended version of paper to appear in SIAM Journal on Control and Optimization (SICON)

MSC Class: 90C40; 93E20; 90C46; 93E35; 60J20

arXiv:2002.04192 [pdf, other]

Energy storage applications for low voltage consumers in Uruguay

Authors: Md Umar Hashmi, Jose Horta, Diego Kiedanski, Ana Bušić, Daniel Kofman

Abstract: Energy storage can be used for many applications in the Smart Grid such as energy arbitrage, peak demand shaving, power factor correction, energy backup to name a few, and can play a major role at increasing the capacity of power networks to host renewable energy sources. Often, storage control algorithms will need to be \textit{tailored} according to power networks billing structure, reliability… ▽ More Energy storage can be used for many applications in the Smart Grid such as energy arbitrage, peak demand shaving, power factor correction, energy backup to name a few, and can play a major role at increasing the capacity of power networks to host renewable energy sources. Often, storage control algorithms will need to be \textit{tailored} according to power networks billing structure, reliability restrictions, and other local power networks norms. In this paper we explore residential energy storage applications in Uruguay, one of the global leaders in renewable energies, where new low-voltage consumer contracts were recently introduced. Based on these billing mechanisms, we focus on energy arbitrage and reactive energy compensation with the aim of minimizing the cost of consumption of an end-user. Given that in the new contacts the buying and selling price of electricity are equal and that reactive power compensation is primarily governed by the installed converter, the storage operation is not sensitive to parameter uncertainties and, therefore, no lookahead is required for decision making. A threshold-based \textit{hierarchical} controller is proposed which decides on the optimal active energy for arbitrage and uses the remaining converter capacity for reactive power compensation, which is shown to increase end-user profit. Numerical results indicate that storage could be profitable, even considering battery degradation, under some but not all of the studied contracts. For the cases in which it is not, we propose the best-suited contract. Results presented here can be naturally applied whenever the tariff structure satisfies the hypothesis considered in this work. △ Less

Submitted 10 February, 2020; originally announced February 2020.

arXiv:2002.04177 [pdf, other]

Towards Phase Balancing using Energy Storage

Authors: Md Umar Hashmi, José Horta, Lucas Pereira, Zachary Lee, Ana Bušić, Daniel Kofman

Abstract: Ad-hoc growth of single-phase-connected distributed energy resources, such as solar generation and electric vehicles, can lead to network unbalance with negative consequences on the quality and efficiency of electricity supply. Case-studies are presented for a substation in Madeira, Portugal and an EV charging facility in Pasadena, California. These case studies show that phase imbalance can happe… ▽ More Ad-hoc growth of single-phase-connected distributed energy resources, such as solar generation and electric vehicles, can lead to network unbalance with negative consequences on the quality and efficiency of electricity supply. Case-studies are presented for a substation in Madeira, Portugal and an EV charging facility in Pasadena, California. These case studies show that phase imbalance can happen due to a large amount of distributed generation (DG) and electric vehicle (EV) integration. We conducted stylized load-flow analysis on a radial distribution network using an openDSS-based simulator to understand such negative effects of phase imbalance on neutral and phase conductor losses, and in voltage drop/rise. We evaluate the integration of storage in the distribution network as a possible solution for mitigating effects caused by imbalance. We present control architectures of storage operation for phase balancing. Numerically we show that relatively small-sized storage (compared to unbalance magnitude) can significantly reduce network imbalance. We identify the end node of the feeder as the best location to install storage. △ Less

Submitted 10 February, 2020; originally announced February 2020.

arXiv:2002.02584 [pdf, other]

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Authors: Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn

Abstract: This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequenc… ▽ More This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design. △ Less

Submitted 6 February, 2020; originally announced February 2020.

arXiv:2002.01524 [pdf, other]

Storage Optimal Control under Net Metering Policies

Authors: Md Umar Hashmi, Arpan Mukhopadhyay, Ana Bušić, Jocelyne Elias

Abstract: Electricity prices and the end user net load vary with time. Electricity consumers equipped with energy storage devices can perform energy arbitrage, i.e., buy when energy is cheap or when there is a deficit of energy, and sell it when it is expensive or in excess, taking into account future variations in price and net load. Net metering policies indicate that many of the utilities apply a {custom… ▽ More Electricity prices and the end user net load vary with time. Electricity consumers equipped with energy storage devices can perform energy arbitrage, i.e., buy when energy is cheap or when there is a deficit of energy, and sell it when it is expensive or in excess, taking into account future variations in price and net load. Net metering policies indicate that many of the utilities apply a {customer selling} rate lower than or equal to the retail {customer buying rate} in order to compensate excess energy generated by end users. In this paper, we formulate the optimal control problem for an end user energy storage device in presence of net metering. We propose a computationally efficient algorithm, with worst case run time complexity of quadratic in terms of number of samples in lookahead horizon, that computes the optimal energy ramping rates in a time horizon. The proposed algorithm exploits the problem's piecewise linear structure and convexity properties for the \textit{discretization} of optimal Lagrange multipliers. The solution has a \textit{threshold-based structure} in which optimal control decisions are independent of past or future price as well as of net load values beyond a certain time horizon, defined as a \textit{sub-horizon}. Numerical results show the effectiveness of the proposed model and algorithm. Furthermore, we investigate the impact of forecasting errors on the proposed technique. We consider an Auto-Regressive Moving Average (ARMA) based forecasting of net load together with the Model Predictive Control (MPC). We numerically show that adaptive forecasting and MPC significantly mitigate the effects of forecast error on energy arbitrage gains. △ Less

Submitted 4 February, 2020; originally announced February 2020.

arXiv:1911.10458 [pdf, other]

Sizing and Profitability of Energy Storage for Prosumers in Madeira, Portugal

Authors: Md Umar Hashmi, Jonathan Cavaleiro, Lucas Pereira, Ana Bušić

Abstract: This paper proposes a framework to select the best-suited battery for co-optimizing for peak demand shaving, energy arbitrage and increase self-sufficiency in the context of power network in Madeira, Portugal. Feed-in-tariff for electricity network in Madeira is zero, which implies consumers with excess production should locally consume the excess generation rather than wasting it. Further, the po… ▽ More This paper proposes a framework to select the best-suited battery for co-optimizing for peak demand shaving, energy arbitrage and increase self-sufficiency in the context of power network in Madeira, Portugal. Feed-in-tariff for electricity network in Madeira is zero, which implies consumers with excess production should locally consume the excess generation rather than wasting it. Further, the power network {operator} applies a peak power contract for consumers which imposes an upper bound on the peak power seen by the power grid interfaced by energy meter. We investigate the value of storage in Madeira, using four different types of prosumers, categorized based on the relationship between their inelastic load and renewable generation. We observe that the marginal increase in the value of storage deteriorates with increase in size and ramping capabilities. We propose the use of profit per cycle per unit of battery capacity and expected payback period as indices for selecting the best-suited storage parameters to ensure profitability. This mechanism takes into account the consumption and generation patterns, profit, storage degradation, and cycle and calendar life of the battery. We also propose the inclusion of a friction coefficient in the original co-optimization formulation to increase the value of storage by reducing the operational cycles and eliminate low returning transactions. △ Less

Submitted 23 November, 2019; originally announced November 2019.

arXiv:1910.05405 [pdf, other]

Zap Q-Learning With Nonlinear Function Approximation

Authors: Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn

Abstract: Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general the… ▽ More Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture. △ Less

Submitted 15 July, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1905.00418 [pdf, other]

Optimal Storage Arbitrage under Net Metering using Linear Programming

Authors: Md Umar Hashmi, Arpan Mukhopadhyay, Ana Bušić, Jocelyne Elias, Diego Kiedanski

Abstract: We formulate the optimal energy arbitrage problem for a piecewise linear cost function for energy storage devices using linear programming (LP). The LP formulation is based on the equivalent minimization of the epigraph. This formulation considers ramping and capacity constraints, charging and discharging efficiency losses of the storage, inelastic consumer load and local renewable generation in p… ▽ More We formulate the optimal energy arbitrage problem for a piecewise linear cost function for energy storage devices using linear programming (LP). The LP formulation is based on the equivalent minimization of the epigraph. This formulation considers ramping and capacity constraints, charging and discharging efficiency losses of the storage, inelastic consumer load and local renewable generation in presence of net-metering which facilitates selling of energy to the grid and incentivizes consumers to install renewable generation and energy storage. We consider the case where the consumer loads, electricity prices, and renewable generations at different instances are uncertain. These uncertain quantities are predicted using an Auto-Regressive Moving Average (ARMA) model and used in a model predictive control (MPC) framework to obtain the arbitrage decision at each instance. In numerical results we present the sensitivity analysis of storage performing arbitrage with varying ramping batteries and different ratio of selling and buying price of electricity. △ Less

Submitted 15 August, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

arXiv:1904.11538 [pdf, other]

Zap Q-Learning for Optimal Stopping Time Problems

Authors: Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn

Abstract: The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$. We build on the dynamic programming approach taken by Tsitsikilis and Van Roy, wherein they propose a Q-learning algorithm to estimate… ▽ More The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$. We build on the dynamic programming approach taken by Tsitsikilis and Van Roy, wherein they propose a Q-learning algorithm to estimate the optimal state-action value function, which then defines an optimal stopping rule. We provide insights as to why the convergence rate of this algorithm can be slow, and propose a fast-converging alternative, the "Zap-Q-learning" algorithm, designed to achieve optimal rate of convergence. For the first time, we prove the convergence of the Zap-Q-learning algorithm under the assumption of linear function approximation setting. We use ODE analysis for the proof, and the optimal asymptotic variance property of the algorithm is reflected via fast convergence in a finance example. △ Less

Submitted 30 September, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

arXiv:1904.00463 [pdf, other]

Energy Storage in Madeira, Portugal: Co-optimizing for Arbitrage, Self-Sufficiency, Peak Shaving and Energy Backup

Authors: Md Umar Hashmi, Lucas Pereira, Ana Bušić

Abstract: Energy storage applications are explored from a prosumer (consumers with generation) perspective for the island of Madeira in Portugal. These applications could also be relevant to other power networks. We formulate a convex co-optimization problem for performing arbitrage under zero feed-in tariff, increasing self-sufficiency by increasing self-consumption of locally generated renewable energy, p… ▽ More Energy storage applications are explored from a prosumer (consumers with generation) perspective for the island of Madeira in Portugal. These applications could also be relevant to other power networks. We formulate a convex co-optimization problem for performing arbitrage under zero feed-in tariff, increasing self-sufficiency by increasing self-consumption of locally generated renewable energy, provide peak shaving and act as a backup power source during anticipated and scheduled power outages. Using real data from Madeira we perform short and long time-scale simulations in order to select end-user contract which maximizes their gains considering storage degradation based on operational cycles. We observe energy storage ramping capability decides peak shaving potential, fast ramping batteries can significantly reduce peak demand charge. The numerical experiment indicates that storage providing backup does not significantly reduce gains performing arbitrage and peak demand shaving. Furthermore, we also use AutoRegressive Moving Average (ARMA) forecasting along with Model Predictive Control (MPC) for real-time implementation of the proposed optimization problem in the presence of uncertainty. △ Less

Submitted 19 August, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

arXiv:1903.06132 [pdf, other]

Arbitrage with Power Factor Correction using Energy Storage

Authors: Md Umar Hashmi, Deepjyoti Deka, Ana Busic, Lucas Pereira, Scott Backhaus

Abstract: The importance of reactive power compensation for power factor (PF) correction will significantly increase with the large-scale integration of distributed generation interfaced via inverters producing only active power. In this work, we focus on co-optimizing energy storage for performing energy arbitrage as well as local power factor correction. The joint optimization problem is non-convex, but c… ▽ More The importance of reactive power compensation for power factor (PF) correction will significantly increase with the large-scale integration of distributed generation interfaced via inverters producing only active power. In this work, we focus on co-optimizing energy storage for performing energy arbitrage as well as local power factor correction. The joint optimization problem is non-convex, but can be solved efficiently using a McCormick relaxation along with penalty-based schemes. Using numerical simulations on real data and realistic storage profiles, we show that energy storage can correct PF locally without reducing arbitrage profit. It is observed that active and reactive power control is largely decoupled in nature for performing arbitrage and PF correction (PFC). Furthermore, we consider a real-time implementation of the problem with uncertain load, renewable and pricing profiles. We develop a model predictive control based storage control policy using auto-regressive forecast for the uncertainty. We observe that PFC is primarily governed by the size of the converter and therefore, look-ahead in time in the online setting does not affect PFC noticeably. However, arbitrage profit are more sensitive to uncertainty for batteries with faster ramp rates compared to slow ramping batteries. △ Less

Submitted 11 January, 2020; v1 submitted 14 March, 2019; originally announced March 2019.

Comments: 10 pages, 8 figures

arXiv:1903.01803 [pdf, other]

The power disaggregation algorithms and their applications to demand dispatch

Authors: Arnaud Cadas, Ana Busic

Abstract: We were interested in solving a power disaggregation problem which comes down to estimating the power consumption of each device given the total power consumption of the whole house. We started by looking at the Factorial Hierarchical Dirichlet Process - Hidden Semi-Markov Model. However, the inference method had a complexity which scales withthe number of observations. Thus, we developed an onlin… ▽ More We were interested in solving a power disaggregation problem which comes down to estimating the power consumption of each device given the total power consumption of the whole house. We started by looking at the Factorial Hierarchical Dirichlet Process - Hidden Semi-Markov Model. However, the inference method had a complexity which scales withthe number of observations. Thus, we developed an online algorithm based on particle filters. We applied the method to data from Pecan Street https://dataport.cloud/ using Python. We applied the disaggregation algorithm to the control techniques used in Demand Dispatch. △ Less

Submitted 5 March, 2019; originally announced March 2019.

arXiv:1810.08541 [pdf, other]

Optimal Control of Dynamic Bipartite Matching Models

Authors: Arnaud Cadas, Ana Bušić, Josu Doncel

Abstract: A dynamic bipartite matching model is given by a bipartite matching graph which determines the possible matchings between the various types of supply and demand items. Both supply and demand items arrive to the system according to a stochastic process. Matched pairs leave the system and the others wait in the queues, which induces a holding cost. We model this problem as a Markov Decision Process… ▽ More A dynamic bipartite matching model is given by a bipartite matching graph which determines the possible matchings between the various types of supply and demand items. Both supply and demand items arrive to the system according to a stochastic process. Matched pairs leave the system and the others wait in the queues, which induces a holding cost. We model this problem as a Markov Decision Process and study the discounted cost and the average cost problem. We fully characterize the optimal matching policy for complete matching graphs and for the N -shaped matching graph. In the former case, the optimal policy consists of matching everything and, in the latter case, it prioritizes the matchings in the extreme edges and is of threshold type for the diagonal edge. In addition, for the average cost problem, we compute the optimal threshold value. For more general graphs, we need to consider some assumptions on the cost of the nodes. For complete graphs minus one edge, we provide conditions on the cost of the nodes such that the optimal policy of the N-shaped matching graph extends to this case. For acyclic graphs, we show that, when the cost of the extreme edges is large, the optimal matching policy prioritizes the matchings in the extreme edges. We also study the W-shaped matching graph and, using simulations, we show that there are cases where it is not optimal to prioritize to matchings in the extreme edges. △ Less

Submitted 10 September, 2020; v1 submitted 19 October, 2018; originally announced October 2018.

MSC Class: 93E20; 91B68 (Primary); 60J10; 68M20 (Secondary)

arXiv:1809.06277 [pdf, other]

Optimal Matrix Momentum Stochastic Approximation and Applications to Q-learning

Authors: Adithya M. Devraj, Ana Bušić, Sean Meyn

Abstract: Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method, and Polyak's momentum technique. In this paper two new algorithms are introduced for root finding problems: 1) PolSA is a root finding algorithm with specially designed matrix momentum, and 2) NeSA can be regarded as a variant of Nesterov's algorithm, or a sim… ▽ More Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method, and Polyak's momentum technique. In this paper two new algorithms are introduced for root finding problems: 1) PolSA is a root finding algorithm with specially designed matrix momentum, and 2) NeSA can be regarded as a variant of Nesterov's algorithm, or a simplification of PolSA. The PolSA algorithm is new even in the context of optimization (when cast as a root finding problem). The research surveyed in this paper is motivated by applications to reinforcement learning. It is well known that most variants of TD- and Q-learning may be cast as SA (stochastic approximation) algorithms, and the tools from general SA theory can be used to investigate convergence and bounds on convergence rate. In particular, the asymptotic variance is a common metric of performance for SA algorithms, and is also one among many metrics used in assessing the performance of stochastic optimization algorithms. There are two well known SA techniques that are known to have optimal asymptotic variance: the Ruppert-Polyak averaging technique, and stochastic Newton-Raphson (SNR). The former algorithm can have extremely bad transient performance, and the latter can be computationally expensive. It is demonstrated here that parameter estimates from the new PolSA algorithm couple with those of the ideal (but more complex) SNR algorithm. The new algorithm is thus a third approach to obtain optimal asymptotic covariance. These strong results require assumptions on the model. A linearized model is considered, and the noise is assumed to be a martingale difference sequence. Numerical results are obtained in a non-linear setting that is the motivation for this work: In PolSA implementations of Q-learning it is observed that coupling occurs with SNR in this non-ideal setting. △ Less

Submitted 5 February, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

arXiv:1807.10244 [pdf, other]

Action-Constrained Markov Decision Processes With Kullback-Leibler Cost

Authors: Ana Bušić, Sean Meyn

Abstract: This paper concerns computation of optimal policies in which the one-step reward function contains a cost term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-reward optimality equations reduce to a simple eigenvector problem. Since then many auth… ▽ More This paper concerns computation of optimal policies in which the one-step reward function contains a cost term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-reward optimality equations reduce to a simple eigenvector problem. Since then many authors have sought to apply this technique to control problems and models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting the one-step reward function. The approach is new and practical even in the original unconstrained formulation. △ Less

Submitted 26 July, 2018; originally announced July 2018.

Journal ref: Proceedings of the 31st Conference On Learning Theory (COLT), PMLR 75:1431-1444, 2018

arXiv:1803.02788 [pdf, other]

Loynes construction for the extended bipartite matching

Authors: Pascal Moyal, Ana Busic, Jean Mairesse

Abstract: We propose an explicit construction of the stationary state of Extended Bipartite Matching (EBM) models, as defined in (Busic et. al., 2013). We use a Loynes-type backwards scheme similar in flavor to that in (Moyal et al., 2017), allowing to show the existence and uniqueness of a bi-infinite perfect matching under various conditions, for a large class of matching policies and of bipartite matchin… ▽ More We propose an explicit construction of the stationary state of Extended Bipartite Matching (EBM) models, as defined in (Busic et. al., 2013). We use a Loynes-type backwards scheme similar in flavor to that in (Moyal et al., 2017), allowing to show the existence and uniqueness of a bi-infinite perfect matching under various conditions, for a large class of matching policies and of bipartite matching structures. The key algebraic element of our construction is the sub-additivity of a suitable stochastic recursive representation of the model, satisfied under most usual matching policies. By doing so, we also derive stability conditions for the system under general stationary ergodic assumptions, subsuming the classical markovian settings. △ Less

Submitted 7 March, 2018; originally announced March 2018.

MSC Class: Primary 60J10; 60G10; Secondary 60K25; 05C38; 05C70; 90B15

arXiv:1711.02620 [pdf, other]

doi 10.1017/jpr.2020.100

A product form for the general stochastic matching model

Authors: Pascal Moyal, Ana Busic, Jean Mairesse

Abstract: We consider a stochastic matching model with a general compatibility graph, as introduced in \cite{MaiMoy16}. We show that the natural necessary condition of stability of the system is also sufficient for the natural matching policy 'First Come, First Matched' (FCFM). For doing so, we derive the stationary distribution under a remarkable product form, by using an original dynamic reversibility pro… ▽ More We consider a stochastic matching model with a general compatibility graph, as introduced in \cite{MaiMoy16}. We show that the natural necessary condition of stability of the system is also sufficient for the natural matching policy 'First Come, First Matched' (FCFM). For doing so, we derive the stationary distribution under a remarkable product form, by using an original dynamic reversibility property related to that of \cite{ABMW17} for the bipartite matching model. △ Less

Submitted 5 February, 2020; v1 submitted 7 November, 2017; originally announced November 2017.

MSC Class: 60J10; 60K25; 68R10; 47A35

Journal ref: J. Appl. Probab. 58 (2021) 449-468

arXiv:1610.00813 [pdf, other]

doi 10.24251/HICSS.2017.380

Demand Dispatch with Heterogeneous Intelligent Loads

Authors: Joel Mathias, Ana Bušić, Sean Meyn

Abstract: A distributed control architecture is presented that is intended to make a collection of heterogeneous loads appear to the grid operator as a nearly perfect battery. Local control is based on randomized decision rules advocated in prior research, and extended in this paper to any load with a discrete number of power states. Additional linear filtering at the load ensures that the input-output dyna… ▽ More A distributed control architecture is presented that is intended to make a collection of heterogeneous loads appear to the grid operator as a nearly perfect battery. Local control is based on randomized decision rules advocated in prior research, and extended in this paper to any load with a discrete number of power states. Additional linear filtering at the load ensures that the input-output dynamics of the aggregate has a nearly flat input-output response: the behavior of an ideal, multi-GW battery system. △ Less

Submitted 23 October, 2019; v1 submitted 3 October, 2016; originally announced October 2016.

Comments: Extended version of the paper that was published in Proc. 50th Annual Hawaii International Conference on System Sciences (HICSS), 2017. This version contains an extended appendix that provides details relevant to the simulations, including: (i) design of the optimal linear inverse filter in Appendix A2, and (ii) the creation of nominal transition matrices for TCLs using Monte Carlo in Appendix A3

MSC Class: 90C40; 93E20; 60J22

Journal ref: Proc. 50th Annual Hawaii International Conference on System Sciences (HICSS), 2017

arXiv:1609.00051 [pdf, other]

Estimation and Control of Quality of Service in Demand Dispatch

Authors: Yue Chen, Ana Bušić, Sean Meyn

Abstract: It is now well known that flexibility of energy consumption can be harnessed for the purposes of grid-level ancillary services. In particular, through distributed control of a collection of loads, a balancing authority regulation signal can be tracked accurately, while ensuring that the quality of service (QoS) for each load is acceptable {\it on average}. In this paper it is argued that a histogr… ▽ More It is now well known that flexibility of energy consumption can be harnessed for the purposes of grid-level ancillary services. In particular, through distributed control of a collection of loads, a balancing authority regulation signal can be tracked accurately, while ensuring that the quality of service (QoS) for each load is acceptable {\it on average}. In this paper it is argued that a histogram of QoS is approximately Gaussian, and consequently each load will eventually receive poor service. Statistical techniques are developed to estimate the mean and variance of QoS as a function of the power spectral density of the regulation signal. It is also shown that additional local control can eliminate risk: The histogram of QoS is {\it truncated} through this local control, so that strict bounds on service quality are guaranteed. While there is a tradeoff between the grid-level tracking performance (capacity and accuracy) and the bounds imposed on QoS, it is found that the loss of capacity is minor in typical cases. △ Less

Submitted 31 August, 2016; originally announced September 2016.

Comments: Submitted for publication, August 2016. arXiv admin note: text overlap with arXiv:1409.6941

MSC Class: 60J20; 68M20

arXiv:1605.04591 [pdf, other]

doi 10.1137/16M1100204

Ordinary Differential Equation Methods For Markov Decision Processes and Application to Kullback-Leibler Control Cost

Authors: Ana Bušić, Sean Meyn

Abstract: A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a weighting factor $ζ$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{ h^*_ζ: ζ\in\Re\}$ is the solution to an ODE,… ▽ More A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a weighting factor $ζ$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{ h^*_ζ: ζ\in\Re\}$ is the solution to an ODE, $$ \frac{d}{dζ} h^*_ζ= {\cal V}(h^*_ζ) $$ where the vector field ${\cal V}\colon\Re^d\to\Re^d$ has a simple form, based on a matrix inverse. This general methodology is applied to a family of average-cost optimal control models in which the one-step reward function is defined by Kullback-Leibler divergence. The motivation for this reward function in prior work is computation: The solution to the MDP can be expressed in terms of the Perron-Frobenius eigenvector for an associated positive matrix. The drawback with this approach is that no hard constraints on the control are permitted. It is shown here that it is possible to extend this framework to model randomness from nature that cannot be modified by the controller. Perron-Frobenius theory is no longer applicable -- the resulting dynamic programming equations appear as complex as a completely unstructured MDP model. Despite this apparent complexity, it is shown that this class of MDPs admits a solution via this new ODE technique. This approach is new and practical even for the simpler problem in which randomness from nature is absent. △ Less

Submitted 22 October, 2016; v1 submitted 15 May, 2016; originally announced May 2016.

Comments: Submitted to SICON

MSC Class: 90C40; 93E20; 60J22; 93E35; 60J20; 90C46

arXiv:1604.04013 [pdf, other]

doi 10.1214/17-AAP1300

Ergodic Theory for Controlled Markov Chains with Stationary Inputs

Authors: Yue Chen, Ana Bušić, Sean Meyn

Abstract: Consider a stochastic process $\{X(t)\}$ on a finite state space $ {\sf X}=\{1,\dots, d\}$. It is conditionally Markov, given a real-valued `input process' $\{ζ(t)\}$. This is assumed to be small, which is modeled through the scaling, \[ ζ_t = \varepsilon ζ^1_t, \qquad 0\le \varepsilon \le 1\,, \] where $\{ζ^1(t)\}$ is a bounded stationary process. The following conclusions are obtained, subject t… ▽ More Consider a stochastic process $\{X(t)\}$ on a finite state space $ {\sf X}=\{1,\dots, d\}$. It is conditionally Markov, given a real-valued `input process' $\{ζ(t)\}$. This is assumed to be small, which is modeled through the scaling, \[ ζ_t = \varepsilon ζ^1_t, \qquad 0\le \varepsilon \le 1\,, \] where $\{ζ^1(t)\}$ is a bounded stationary process. The following conclusions are obtained, subject to smoothness assumptions on the controlled transition matrix and a mixing condition on $\{ζ(t)\}$: (i) A stationary version of the process is constructed, that is coupled with a stationary version of the Markov chain $\{X^\bullet$(t)\}obtained with $\{ζ(t)\}\equiv 0$. The triple $(\{X(t)\}, \{X^\bullet(t)\},\{ζ(t)\})$ is a jointly stationary process satisfying \[ {\sf P}\{X(t) \neq X^\bullet(t)\} = O(\varepsilon) \] Moreover, a second-order Taylor-series approximation is obtained: \[ {\sf P}\{X(t) =i \} ={\sf P}\{X^\bullet(t) =i \} + \varepsilon^2 \varrho(i) + o(\varepsilon^2),\quad 1\le i\le d, \] with an explicit formula for the vector $\varrho\in\mathbb{R}^d$. (ii) For any $m\ge 1$ and any function $f\colon \{1,\dots,d\}\times \mathbb{R}\to\mathbb{R}^m$, the stationary stochastic process $Y(t) = f(X(t),ζ(t))$ has a power spectral density $\text{S}_f$ that admits a second order Taylor series expansion: A function $\text{S}^{(2)}_f\colon [-π,π] \to \mathbb{C}^{ m\times m}$ is constructed such that \[ \text{S}_f(θ) = \text{S}^\bullet_f(θ) + \varepsilon^2 \text{S}_f^{(2)}(θ) + o(\varepsilon^2),\quad θ\in [-π,π] . \] An explicit formula for the function $\text{S}_f^{(2)}$ is obtained, based in part on the bounds in (i). The results are illustrated using a version of the timing channel of Anantharam and Verdu. △ Less

Submitted 18 June, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

MSC Class: 60J20; 60G10; 68M20; 94A15

arXiv:1603.05966 [pdf, other]

Distributed Randomized Control for Demand Dispatch

Authors: Ana Bušić, Sean Meyn

Abstract: The paper concerns design of control systems for Demand Dispatch to obtain ancillary services to the power grid by harnessing inherent flexibility in many loads. The role of "local intelligence" at the load has been advocated in prior work, randomized local controllers that manifest this intelligence are convenient for loads with a finite number of states. The present work introduces two new desig… ▽ More The paper concerns design of control systems for Demand Dispatch to obtain ancillary services to the power grid by harnessing inherent flexibility in many loads. The role of "local intelligence" at the load has been advocated in prior work, randomized local controllers that manifest this intelligence are convenient for loads with a finite number of states. The present work introduces two new design techniques for these randomized controllers: (i) The Individual Perspective Design (IPD) is based on the solution to a one-dimensional family of Markov Decision Processes, whose objective function is formulated from the point of view of a single load. The family of dynamic programming equation appears complex, but it is shown that it is obtained through the solution of a single ordinary differential equation. (ii) The System Perspective Design (SPD) is motivated by a single objective of the grid operator: Passivity of any linearization of the aggregate input-output model. A solution is obtained that can again be computed through the solution of a single ordinary differential equation. Numerical results complement these theoretical results. △ Less

Submitted 18 March, 2016; originally announced March 2016.

Comments: Submitted to IEEE CDC, March 2016

MSC Class: 90C40; 93E20; 60J22

arXiv:1509.01531 [pdf, other]

doi 10.1109/HICSS.2016.312

Smart Fridge / Dumb Grid? Demand Dispatch for the Power Grid of 2020

Authors: Joel Mathias, Rim Kaddah, Ana Bušić, Sean Meyn

Abstract: In discussions at the 2015 HICSS meeting, it was argued that loads can provide most of the ancillary services required today and in the future. Through load-level and grid-level control design, high-quality ancillary service for the grid is obtained without impacting quality of service delivered to the consumer. This approach to grid regulation is called demand dispatch: loads are providing servic… ▽ More In discussions at the 2015 HICSS meeting, it was argued that loads can provide most of the ancillary services required today and in the future. Through load-level and grid-level control design, high-quality ancillary service for the grid is obtained without impacting quality of service delivered to the consumer. This approach to grid regulation is called demand dispatch: loads are providing service continuously and automatically, without consumer interference. In this paper we ask, what intelligence is required at the grid-level? In particular, does the grid-operator require more than one-way communication to the loads? Our main conclusion: risk is not great in lower frequency ranges, e.g., PJM's RegA or BPA's balancing reserves. In particular, ancillary services from refrigerators and pool-pumps can be obtained successfully with only one-way communication. This requires intelligence at the loads, and much less intelligence at the grid level. △ Less

Submitted 4 September, 2015; originally announced September 2015.

MSC Class: 93E20; 47N70

Journal ref: Proc. 49th Annual Hawaii International Conference on System Sciences (HICSS), 2016

arXiv:1507.05939 [pdf, other]

Reversibility and further properties of FCFS infinite bipartite matching

Authors: Ivo Adan, Ana Busic, Jean Mairesse, Gideon Weiss

Abstract: The model of FCFS infinite bipartite matching was introduced in caldentey-kaplan-weiss 2009. In this model there is a sequence of items that are chosen i.i.d. from $\mathcal{C}=\{c_1,\ldots,c_I\}$ and an independent sequence of items that are chosen i.i.d. from $\mathcal{S}=\{s_1,\ldots,s_J\}$, and a bipartite compatibility graph $G$ between $\mathcal{C}$ and $\mathcal{S}$. Items of the two sequen… ▽ More The model of FCFS infinite bipartite matching was introduced in caldentey-kaplan-weiss 2009. In this model there is a sequence of items that are chosen i.i.d. from $\mathcal{C}=\{c_1,\ldots,c_I\}$ and an independent sequence of items that are chosen i.i.d. from $\mathcal{S}=\{s_1,\ldots,s_J\}$, and a bipartite compatibility graph $G$ between $\mathcal{C}$ and $\mathcal{S}$. Items of the two sequences are matched according to the compatibility graph, and the matching is FCFS, each item in the one sequence is matched to the earliest compatible unmatched item in the other sequence. In adan-weiss 2011 a Markov chain associated with the matching was analyzed, a condition for stability was verified, a product form stationary distribution was derived and the rates $r_{c_i,s_j}$ of matches between compatible types $c_i$ and $s_j$ were calculated. In the current paper, we present several new results that unveil the fundamental structure of the model. First, we provide a pathwise Loynes' type construction which enables to prove the existence of a unique matching for the model defined over all the integers. Second, we prove that the model is dynamically reversible: we define an exchange transformation in which we interchange the positions of each matched pair, and show that the items in the resulting permuted sequences are again independent and i.i.d., and the matching between them is FCFS in reversed time. Third, we obtain product form stationary distributions of several new Markov chains associated with the model. As a by product, we compute useful performance measures, for instance the link lengths between matched items. △ Less

Submitted 22 June, 2017; v1 submitted 21 July, 2015; originally announced July 2015.

Comments: 33 pages, 12 figures

MSC Class: Primary 60J10; Secondary 60K25; 68M20; 90B22

arXiv:1504.04517 [pdf, other]

Speeding up Glauber Dynamics for Random Generation of Independent Sets

Authors: Rémi Varloot, Ana Bušić, Anne Bouillard

Abstract: The maximum independent set (MIS) problem is a well-studied combinatorial optimization problem that naturally arises in many applications, such as wireless communication, information theory and statistical mechanics. MIS problem is NP-hard, thus many results in the literature focus on fast generation of maximal independent sets of high cardinality. One possibility is to combine Gibbs sampling wi… ▽ More The maximum independent set (MIS) problem is a well-studied combinatorial optimization problem that naturally arises in many applications, such as wireless communication, information theory and statistical mechanics. MIS problem is NP-hard, thus many results in the literature focus on fast generation of maximal independent sets of high cardinality. One possibility is to combine Gibbs sampling with coupling from the past arguments to detect convergence to the stationary regime. This results in a sampling procedure with time complexity that depends on the mixing time of the Glauber dynamics Markov chain. We propose an adaptive method for random event generation in the Glauber dynamics that considers only the events that are effective in the coupling from the past scheme, accelerating the convergence time of the Gibbs sampling algorithm. △ Less

Submitted 17 April, 2015; originally announced April 2015.

ACM Class: G.3; G.2.2

arXiv:1504.00088 [pdf, other]

doi 10.1109/TAC.2016.2572880

State Estimation for the Individual and the Population in Mean Field Control with Application to Demand Dispatch

Authors: Yue Chen, Ana Bušić, Sean Meyn

Abstract: This paper concerns state estimation problems in a mean field control setting. In a finite population model, the goal is to estimate the joint distribution of the population state and the state of a typical individual. The observation equations are a noisy measurement of the population. The general results are applied to demand dispatch for regulation of the power grid, based on randomized local… ▽ More This paper concerns state estimation problems in a mean field control setting. In a finite population model, the goal is to estimate the joint distribution of the population state and the state of a typical individual. The observation equations are a noisy measurement of the population. The general results are applied to demand dispatch for regulation of the power grid, based on randomized local control algorithms. In prior work by the authors it has been shown that local control can be carefully designed so that the aggregate of loads behaves as a controllable resource with accuracy matching or exceeding traditional sources of frequency regulation. The operational cost is nearly zero in many cases. The information exchange between grid and load is minimal, but it is assumed in the overall control architecture that the aggregate power consumption of loads is available to the grid operator. It is shown that the Kalman filter can be constructed to reduce these communication requirements, △ Less

Submitted 30 May, 2016; v1 submitted 31 March, 2015; originally announced April 2015.

Comments: To appear, IEEE Trans. Auto. Control. Preliminary version appeared in the 54rd IEEE Conference on Decision and Control, 2015

MSC Class: 93E20; 62M05; 93E10; 68W15

arXiv:1411.1044 [pdf, other]

Approximate optimality with bounded regret in dynamic matching models

Authors: Ana Bušić, Sean Meyn

Abstract: We consider a discrete-time bipartite matching model with random arrivals of units of supply and demand that can wait in queues located at the nodes in the network. A control policy determines which are matched at each time. The focus is on the infinite-horizon average-cost optimal control problem. A relaxation of the stochastic control problem is proposed, which is found to be a special case of a… ▽ More We consider a discrete-time bipartite matching model with random arrivals of units of supply and demand that can wait in queues located at the nodes in the network. A control policy determines which are matched at each time. The focus is on the infinite-horizon average-cost optimal control problem. A relaxation of the stochastic control problem is proposed, which is found to be a special case of an inventory model, as treated in the classical theory of Clark and Scarf. The optimal policy for the relaxation admits a closed-form expression. Based on the policy for this relaxation, a new matching policy is proposed. For a parameterized family of models in which the network load approaches capacity, this policy is shown to be approximately optimal, with bounded regret, even though the average cost grows without bound. △ Less

Submitted 24 June, 2016; v1 submitted 4 November, 2014; originally announced November 2014.

MSC Class: 90B15; 93E20; 91B68; 68M20

arXiv:1409.6941 [pdf, other]

doi 10.1109/CDC.2014.7040397

Individual risk in mean-field control models for decentralized control, with application to automated demand response

Authors: Yue Chen, Ana Bušić, Sean Meyn

Abstract: Flexibility of energy consumption can be harnessed for the purposes of ancillary services in a large power grid. In prior work by the authors a randomized control architecture is introduced for individual loads for this purpose. In examples it is shown that the control architecture can be designed so that control of the loads is easy at the grid level: Tracking of a balancing authority reference s… ▽ More Flexibility of energy consumption can be harnessed for the purposes of ancillary services in a large power grid. In prior work by the authors a randomized control architecture is introduced for individual loads for this purpose. In examples it is shown that the control architecture can be designed so that control of the loads is easy at the grid level: Tracking of a balancing authority reference signal is possible, while ensuring that the quality of service (QoS) for each load is acceptable on average. The analysis was based on a mean field limit (as the number of loads approaches infinity), combined with an LTI-system approximation of the aggregate nonlinear model. This paper examines in depth the issue of individual risk in these systems. The main contributions of the paper are of two kinds: Risk is modeled and quantified: (i) The average performance is not an adequate measure of success. It is found empirically that a histogram of QoS is approximately Gaussian, and consequently each load will eventually receive poor service. (ii) The variance can be estimated from a refinement of the LTI model that includes a white-noise disturbance; variance is a function of the randomized policy, as well as the power spectral density of the reference signal. Additional local control can eliminate risk: (iii) The histogram of QoS is truncated through this local control, so that strict bounds on service quality are guaranteed. (iv) This has insignificant impact on the grid-level performance, beyond a modest reduction in capacity of ancillary service. △ Less

Submitted 24 September, 2014; originally announced September 2014.

Comments: Publication without appendix to appear in the 53rd IEEE Conf. on Decision and Control, December, 2014

Journal ref: Proc. of the 53rd IEEE Conference on Decision and Control, 2014

arXiv:1402.5391 [pdf, other]

Exact Simulation for Assemble-To-Order Systems

Authors: Ana Bušić, Emilie Coupechoux

Abstract: We develop exact simulation (also known as perfect sampling) algorithms for a family of assemble-to-order systems. Due to the finite capacity, and coupling in demands and replenishments, known solving techniques are inefficient for larger problem instances. We first consider the case with individual replenishments of items, and derive an event based representation of the Markov chain that allows a… ▽ More We develop exact simulation (also known as perfect sampling) algorithms for a family of assemble-to-order systems. Due to the finite capacity, and coupling in demands and replenishments, known solving techniques are inefficient for larger problem instances. We first consider the case with individual replenishments of items, and derive an event based representation of the Markov chain that allows applying existing exact simulation techniques, using the monotonicity properties or bounding chains. In the case of joint replenishments, the state space becomes intractable for the existing methods. We propose new exact simulation algorithms, based on aggregation and bounding chains, that allow a significant reduction of the state space of the Markov chain. We also discuss the coupling times of considered models and provide sufficient conditions for linear (in the single server replenishment case) or quadratic (many server case) complexity of our algorithms in terms of the total capacity in the system. △ Less

Submitted 21 February, 2014; originally announced February 2014.

arXiv:1402.4618 [pdf, other]

doi 10.1109/CDC.2014.7039805

Passive Dynamics in Mean Field Control

Authors: Ana Bušić, Sean Meyn

Abstract: Mean-field models are a popular tool in a variety of fields. They provide an understanding of the impact of interactions among a large number of particles or people or other "self-interested agents", and are an increasingly popular tool in distributed control. This paper considers a particular randomized distributed control architecture introduced in our own recent work. In numerical results it… ▽ More Mean-field models are a popular tool in a variety of fields. They provide an understanding of the impact of interactions among a large number of particles or people or other "self-interested agents", and are an increasingly popular tool in distributed control. This paper considers a particular randomized distributed control architecture introduced in our own recent work. In numerical results it was found that the associated mean-field model had attractive properties for purposes of control. In particular, when viewed as an input-output system, its linearization was found to be minimum phase. In this paper we take a closer look at the control model. The results are summarized as follows: (i) The Markov Decision Process framework of Todorov is extended to continuous time models, in which the "control cost" is based on relative entropy. This is the basis of the construction of a family of controlled Markovian generators. (ii) A decentralized control architecture is proposed in which each agent evolves as a controlled Markov process. A central authority broadcasts a common control signal to each agent. The central authority chooses this signal based on an aggregate scalar output of the Markovian agents. (iii) Provided the control-free system is a reversible Markov process, the following identity holds for the linearization, \[ \text{Real} (G(jω)) = \text{PSD}_Y(ω)\ge 0, \quad ω\in\Re, \] where the right hand side denotes the power spectral density for the output of any one of the individual (control-free) Markov processes. △ Less

Submitted 24 September, 2014; v1 submitted 19 February, 2014; originally announced February 2014.

Comments: To appear IEEE CDC, 2014

Journal ref: Proc. 53rd IEEE Conference on Decision and Control, 2014

arXiv:1402.4600 [pdf, other]

doi 10.1109/TAC.2015.2414772

Ancillary Service to the Grid Using Intelligent Deferrable Loads

Authors: Sean Meyn, Prabir Barooah, Ana Bušić, Yue Chen, Jordan Ehren

Abstract: Renewable energy sources such as wind and solar power have a high degree of unpredictability and time-variation, which makes balancing demand and supply challenging. One possible way to address this challenge is to harness the inherent flexibility in demand of many types of loads. Introduced in this paper is a technique for decentralized control for automated demand response that can be used by gr… ▽ More Renewable energy sources such as wind and solar power have a high degree of unpredictability and time-variation, which makes balancing demand and supply challenging. One possible way to address this challenge is to harness the inherent flexibility in demand of many types of loads. Introduced in this paper is a technique for decentralized control for automated demand response that can be used by grid operators as ancillary service for maintaining demand-supply balance. A Markovian Decision Process (MDP) model is introduced for an individual load. A randomized control architecture is proposed, motivated by the need for decentralized decision making, and the need to avoid synchronization that can lead to large and detrimental spikes in demand. An aggregate model for a large number of loads is then developed by examining the mean field limit. A key innovation is an LTI-system approximation of the aggregate nonlinear model, with a scalar signal as the input and a measure of the aggregate demand as the output. This makes the approximation particularly convenient for control design at the grid level. The second half of the paper contains a detailed application of these results to a network of residential pools. Simulations are provided to illustrate the accuracy of the approximations and effectiveness of the proposed control approach. △ Less

Submitted 19 February, 2014; originally announced February 2014.

MSC Class: 93E20; 90C40;

arXiv:1111.4582 [pdf, other]

Density classification on infinite lattices and trees

Authors: Ana Busic, Nazim Fates, Jean Mairesse, Irene Marcovici

Abstract: Consider an infinite graph with nodes initially labeled by independent Bernoulli random variables of parameter p. We address the density classification problem, that is, we want to design a (probabilistic or deterministic) cellular automaton or a finite-range interacting particle system that evolves on this graph and decides whether p is smaller or larger than 1/2. Precisely, the trajectories shou… ▽ More Consider an infinite graph with nodes initially labeled by independent Bernoulli random variables of parameter p. We address the density classification problem, that is, we want to design a (probabilistic or deterministic) cellular automaton or a finite-range interacting particle system that evolves on this graph and decides whether p is smaller or larger than 1/2. Precisely, the trajectories should converge to the uniform configuration with only 0's if p<1/2, and only 1's if p>1/2. We present solutions to that problem on the d-dimensional lattice, for any d>1, and on the regular infinite trees. For Z, we propose some candidates that we back up with numerical simulations. △ Less

Submitted 19 November, 2011; originally announced November 2011.

arXiv:1012.2910 [pdf, other]

Perfect Sampling of Markov Chains with Piecewise Homogeneous Events

Authors: Ana Bušić, Bruno Gaujal, Furcy Pin

Abstract: Perfect sampling is a technique that uses coupling arguments to provide a sample from the stationary distribution of a Markov chain in a finite time without ever computing the distribution. This technique is very efficient if all the events in the system have monotonicity property. However, in the general (non-monotone) case, this technique needs to consider the whole state space, which limits i… ▽ More Perfect sampling is a technique that uses coupling arguments to provide a sample from the stationary distribution of a Markov chain in a finite time without ever computing the distribution. This technique is very efficient if all the events in the system have monotonicity property. However, in the general (non-monotone) case, this technique needs to consider the whole state space, which limits its application only to chains with a state space of small cardinality. We propose here a new approach for the general case that only needs to consider two trajectories. Instead of the original chain, we use two bounding processes (envelopes) and we show that, whenever they couple, one obtains a sample under the stationary distribution of the original chain. We show that this new approach is particularly effective when the state space can be partitioned into pieces where envelopes can be easily computed. We further show that most Markovian queueing networks have this property and we propose efficient algorithms for some of them. △ Less

Submitted 18 January, 2011; v1 submitted 13 December, 2010; originally announced December 2010.

MSC Class: Primary: 60J10; 65C05. Secondary: 68M20

arXiv:1010.3133 [pdf, other]

Probabilistic cellular automata, invariant measures, and perfect sampling

Authors: Ana Busic, Jean Mairesse, Irene Marcovici

Abstract: A probabilistic cellular automaton (PCA) can be viewed as a Markov chain. The cells are updated synchronously and independently, according to a distribution depending on a finite neighborhood. We investigate the ergodicity of this Markov chain. A classical cellular automaton is a particular case of PCA. For a 1-dimensional cellular automaton, we prove that ergodicity is equivalent to nilpotency, a… ▽ More A probabilistic cellular automaton (PCA) can be viewed as a Markov chain. The cells are updated synchronously and independently, according to a distribution depending on a finite neighborhood. We investigate the ergodicity of this Markov chain. A classical cellular automaton is a particular case of PCA. For a 1-dimensional cellular automaton, we prove that ergodicity is equivalent to nilpotency, and is therefore undecidable. We then propose an efficient perfect sampling algorithm for the invariant measure of an ergodic PCA. Our algorithm does not assume any monotonicity property of the local rule. It is based on a bounding process which is shown to be also a PCA. Last, we focus on the PCA Majority, whose asymptotic behavior is unknown, and perform numerical experiments using the perfect sampling procedure. △ Less

Submitted 16 March, 2011; v1 submitted 15 October, 2010; originally announced October 2010.

MSC Class: Primary: 37B15; 60J05; 60J22. Secondary: 37A25; 60K35; 68Q80 ACM Class: G.3; F.1.2

arXiv:1003.3477 [pdf, other]

Stability of the bipartite matching model

Authors: Ana Bušić, Varun Gupta, Jean Mairesse

Abstract: We consider the bipartite matching model of customers and servers introduced by Caldentey, Kaplan, and Weiss (Adv. Appl. Probab., 2009). Customers and servers play symmetrical roles. There is a finite set C resp. S, of customer, resp. server, classes. Time is discrete and at each time step, one customer and one server arrive in the system according to a joint probability measure on CxS, independen… ▽ More We consider the bipartite matching model of customers and servers introduced by Caldentey, Kaplan, and Weiss (Adv. Appl. Probab., 2009). Customers and servers play symmetrical roles. There is a finite set C resp. S, of customer, resp. server, classes. Time is discrete and at each time step, one customer and one server arrive in the system according to a joint probability measure on CxS, independently of the past. Also, at each time step, pairs of matched customer and server, if they exist, depart from the system. Authorized matchings are given by a fixed bipartite graph. A matching policy is chosen, which decides how to match when there are several possibilities. Customers/servers that cannot be matched are stored in a buffer. The evolution of the model can be described by a discrete time Markov chain. We study its stability under various admissible matching policies including: ML (Match the Longest), MS (Match the Shortest), FIFO (match the oldest), priorities. There exist natural necessary conditions for stability (independent of the matching policy) defining the maximal possible stability region. For some bipartite graphs, we prove that the stability region is indeed maximal for any admissible matching policy. For the ML policy, we prove that the stability region is maximal for any bipartite graph. For the MS and priority policies, we exhibit a bipartite graph with a non-maximal stability region. △ Less

Submitted 17 March, 2010; originally announced March 2010.

MSC Class: 60J10; 60K25; 68M20; 05C21

Showing 1–47 of 47 results for author: Bušić, A