-
SARC: Soft Actor Retrospective Critic
Authors:
Sukriti Verma,
Ayush Chopra,
Jayakumar Subramanian,
Mausoom Sarkar,
Nikaash Puri,
Piyush Gupta,
Balaji Krishnamurthy
Abstract:
SAC is an actor-critic algorithm with a two-time-scale nature: at any given time the critic estimate has not converged for the current actor, but because the critic learns faster than the actor, the two eventually become consistent. Various strategies have been introduced in the literature to learn better gradient estimates and thereby achieve better convergence. Since gradient estimates depend on the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time step. Building on this, we propose Soft Actor Retrospective Critic (SARC), which augments the SAC critic loss with an additional term, the retrospective loss, leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.
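A minimal sketch of how a retrospective term might be added to a critic's regression loss. It assumes the retrospective-loss form (kappa+1)·|Q_now - y| - kappa·|Q_now - Q_past| from earlier retrospective-loss work, with y the TD target and Q_past a stale snapshot of the critic. The L1 norm, the weights `kappa` and `weight`, and all function names are illustrative assumptions. This is not SARC's published implementation.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, the usual soft Q-learning regression loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def retrospective_term(q_now, q_past, target, kappa=1.0):
    """Hypothetical retrospective term (assumed form, mean L1 distances).

    Minimising it pulls current Q-values toward the TD target and pushes
    them away from a stale snapshot of the critic.
    """
    n = len(q_now)
    to_target = sum(abs(q - y) for q, y in zip(q_now, target)) / n
    to_past = sum(abs(q - p) for q, p in zip(q_now, q_past)) / n
    return (kappa + 1.0) * to_target - kappa * to_past


def augmented_critic_loss(q_now, q_past, target, kappa=1.0, weight=1.0):
    """SAC-style critic loss plus a weighted retrospective term (illustrative)."""
    return mse(q_now, target) + weight * retrospective_term(q_now, q_past, target, kappa)
```

When the snapshot equals the current critic, the term reduces to (kappa+1) times the mean absolute TD error. As the critic drifts from its snapshot, the subtracted part grows, so the combined loss can become negative.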
Submitted 28 June, 2023;
originally announced June 2023.
-
HyHTM: Hyperbolic Geometry based Hierarchical Topic Models
Authors:
Simra Shahid,
Tanay Anand,
Nikitha Srikanth,
Sumit Bhatia,
Balaji Krishnamurthy,
Nikaash Puri
Abstract:
Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies where lower-level topics are unrelated and not specific enough to their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM, a Hyperbolic geometry based Hierarchical Topic Model, that addresses these limitations by incorporating hierarchical information from hyperbolic geometry to explicitly model hierarchies in topic models. Experimental results against four baselines show that HyHTM can better attend to parent-child relationships among topics. HyHTM produces coherent topic hierarchies that specialise in granularity from generic higher-level topics to specific lower-level topics. Further, our model is significantly faster and has a much smaller memory footprint than our best-performing baseline. We have made the source code for our algorithm publicly accessible.
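To illustrate why hyperbolic geometry suits hierarchies, the sketch below computes the standard Poincaré-ball distance, d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). Points near the origin behave like generic (parent) concepts and points near the boundary like specific (child) ones. This is a generic illustration with made-up 2-D embeddings, not HyHTM's model.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Hypothetical embeddings: a generic topic at the origin and two specific
# subtopics near the boundary of the ball.
root = [0.0, 0.0]
child_a = [0.9, 0.0]
child_b = [0.0, 0.9]
```

Here `poincare_distance(root, child_a)` equals ln(19), about 2.94. The two children, although only about 1.27 apart in Euclidean terms, are about 5.2 apart hyperbolically. Siblings near the boundary are thus pushed much farther apart than either is from the root, which resembles distances in a tree.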
Submitted 16 May, 2023;
originally announced May 2023.
-
Q-means using variational quantum feature embedding
Authors:
Arvind S Menon,
Nikaash Puri
Abstract:
This paper proposes a hybrid quantum-classical algorithm that learns a quantum feature map separating unlabelled data that is not linearly separable in the original classical space, using a variational quantum feature map and q-means as a subroutine for unsupervised learning. The objective of the variational circuit is to maximally separate the clusters in the quantum feature Hilbert space. The first part of the circuit embeds the classical data into quantum states. The second part performs unsupervised learning on the quantum states in the quantum feature Hilbert space using the q-means quantum circuit. The output of the quantum circuit is a set of characteristic cluster quantum states, each representing a superposition of all quantum states belonging to a particular cluster. The final part of the quantum circuit performs measurements on the characteristic cluster quantum states to output the inter-cluster overlap based on fidelity. The output of the complete quantum circuit is used to compute the value of a cost function based on the Hilbert-Schmidt distance between the density matrices of the characteristic cluster quantum states. The gradient of the expectation value is used to optimize the parameters of the variational circuit to learn a better quantum feature map.
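The quantities named above can be illustrated with plain linear algebra. For two pure states |psi> and |phi>, the density matrices are rho = |psi><psi| and sigma = |phi><phi|, the fidelity overlap is |<psi|phi>|^2, and the Hilbert-Schmidt distance is Tr[(rho - sigma)^2]. The sketch below computes both classically for single-qubit states. It does not simulate the paper's circuit, and the example states are hypothetical.

```python
import math


def outer(psi):
    """Density matrix |psi><psi| of a normalised pure state (list of complex amplitudes)."""
    return [[a * b.conjugate() for b in psi] for a in psi]


def overlap(psi, phi):
    """Fidelity |<psi|phi>|^2 between two pure states."""
    inner = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(inner) ** 2


def hilbert_schmidt_distance(rho, sigma):
    """Tr[(rho - sigma)^2]; for a Hermitian difference this is the sum of |entries|^2."""
    return sum(
        abs(r - s) ** 2
        for row_r, row_s in zip(rho, sigma)
        for r, s in zip(row_r, row_s)
    )


# Hypothetical single-qubit "cluster" states |0> and |+>.
zero = [complex(1.0), complex(0.0)]
plus = [complex(1.0 / math.sqrt(2.0)), complex(1.0 / math.sqrt(2.0))]
```

For pure states, Tr[(rho - sigma)^2] = 2 - 2|<psi|phi>|^2. Maximising the Hilbert-Schmidt distance is therefore equivalent to minimising the fidelity overlap. For |0> and |+> the overlap is 0.5 and the distance is 1.0.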
Submitted 11 December, 2021;
originally announced December 2021.
-
Status-quo policy gradient in Multi-Agent Reinforcement Learning
Authors:
Pinkesh Badjatiya,
Mausoom Sarkar,
Nikaash Puri,
Jayakumar Subramanian,
Abhishek Sinha,
Siddharth Singh,
Balaji Krishnamurthy
Abstract:
Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility mutually harmful equilibrium. In contrast, humans evolve useful strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss learn high-utility policies in several social dilemma matrix games (Prisoner's Dilemma, Stag Hunt matrix variant, Chicken Game). We show how SQLoss outperforms existing state-of-the-art methods to obtain high-utility policies in visual input non-matrix games (Coin Game and Stag Hunt visual input variant) using pre-trained cooperation and defection oracles. Finally, we show that SQLoss extends to a 4-agent setting by demonstrating the emergence of cooperative behavior in the popular Braess' paradox.
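The claim that individually rational learners settle on a low-utility outcome can be checked directly on a one-shot Prisoner's Dilemma. The payoff values below are a common choice assumed for illustration. The sketch only enumerates pure-strategy best responses and does not implement SQLoss.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect
# Assumed payoffs as (row player, column player); higher is better.
PAYOFF = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}


def best_response(opponent_action, player):
    """Action that maximises `player`'s own payoff against a fixed opponent action."""
    def own_payoff(action):
        joint = (action, opponent_action) if player == 0 else (opponent_action, action)
        return PAYOFF[joint][player]

    return max(ACTIONS, key=own_payoff)


def pure_nash_equilibria():
    """Joint actions in which each player is already best-responding to the other."""
    return [
        (a0, a1)
        for a0, a1 in product(ACTIONS, repeat=2)
        if best_response(a1, 0) == a0 and best_response(a0, 1) == a1
    ]


equilibria = pure_nash_equilibria()
welfare = {joint: sum(payoffs) for joint, payoffs in PAYOFF.items()}
```

With these payoffs, mutual defection is the only pure equilibrium, with total payoff -4. Mutual cooperation would yield -2. This is the gap between individually rational play and group utility that the abstract describes.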
Submitted 23 November, 2021;
originally announced November 2021.
-
Mixed Reality using Illumination-aware Gradient Mixing in Surgical Telepresence: Enhanced Multi-layer Visualization
Authors:
Nirakar Puri,
Abeer Alsadoon,
P. W. C. Prasad,
Nada Alsalami,
Tarik A. Rashid
Abstract:
Background and aim: Surgical telepresence using augmented perception has been applied, but mixed reality is still being researched and remains largely theoretical. The aim of this work is to improve visualization in the final merged video by producing globally consistent videos when the illumination intensity of the input source and target videos differs. Methodology: The proposed system uses enhanced multi-layer visualization with illumination-aware gradient mixing based on the Illumination Aware Video Composition algorithm. A Particle Swarm Optimization algorithm, together with image pixel correlation, is used to find the best sample pair from the foreground and background regions and to estimate the alpha matte. Particle Swarm Optimization helps recover the original colour and depth of the unknown pixels in the unknown region. Result: Our results showed improved accuracy resulting from a reduced Mean Squared Error when selecting the best sample pair for the unknown region, on 10 samples each of bowel, jaw and breast. This is a 16.48% reduction compared with the state-of-the-art system. As a result, visibility accuracy improved from 89.4% to 97.7%, which helps keep the hand clearly visible even under differing lighting. Conclusion: Incorporating the illumination effect and alpha pixel correlation improves visualization accuracy, produces globally consistent composition results, and maintains temporal coherency when compositing two videos with high and inverse illumination effects. In addition, this paper provides a solution for selecting the best sample pair for the unknown region to obtain the original colour and depth.
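Alpha estimation from a foreground/background sample pair follows the standard compositing equation C = alpha·F + (1 - alpha)·B. The sketch below estimates alpha for one pixel by projecting C onto the line from B to F, then scores each candidate pair by the remaining colour error. For simplicity it searches candidate pairs exhaustively instead of using Particle Swarm Optimization. The colour values are hypothetical.

```python
import math
from itertools import product


def estimate_alpha(c, f, b):
    """Project pixel colour c onto the segment from background b to foreground f."""
    fb = [fi - bi for fi, bi in zip(f, b)]
    denom = sum(x * x for x in fb)
    if denom == 0.0:
        return 0.0  # degenerate pair: F == B carries no alpha information
    alpha = sum((ci - bi) * x for ci, bi, x in zip(c, b, fb)) / denom
    return min(1.0, max(0.0, alpha))


def chromatic_error(c, f, b, alpha):
    """Distance between the observed colour and the colour this pair would composite to."""
    composite = [alpha * fi + (1.0 - alpha) * bi for fi, bi in zip(f, b)]
    return math.dist(c, composite)


def best_pair(c, fg_samples, bg_samples):
    """Exhaustive stand-in for PSO: return (error, alpha, F, B) with the lowest error."""
    scored = []
    for f, b in product(fg_samples, bg_samples):
        a = estimate_alpha(c, f, b)
        scored.append((chromatic_error(c, f, b, a), a, f, b))
    return min(scored)


fg = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
bg = [(0.0, 0.0, 1.0), (0.2, 0.8, 0.0)]
pixel = (0.75, 0.0, 0.25)  # hypothetical: 0.75 * fg[0] + 0.25 * bg[0]
err, alpha, f, b = best_pair(pixel, fg, bg)
```

Exhaustive search is quadratic in the number of samples. A population-based optimizer such as PSO is one way to avoid scoring every pair when the sample sets are large.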
Submitted 21 August, 2021;
originally announced October 2021.
-
Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms
Authors:
Sumedh A Sontakke,
Sumegh Roychowdhury,
Mausoom Sarkar,
Nikaash Puri,
Balaji Krishnamurthy,
Laurent Itti
Abstract:
Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into two distinct subtasks: first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence autoencoder-style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm controlled by an expert) to adapt these events into actionable representations, i.e., skills. Through experiments, we demonstrate that our approach results in self-supervised analogy learning, where the agent learns to draw analogies between motions in human demonstration data and behaviors in the robotic environment. We also demonstrate the efficacy of our approach on model learning, showing how Video2Skill uses prior knowledge from human demonstrations to outperform traditional model learning of long-horizon dynamics. Finally, we demonstrate the utility of our approach for non-tabula rasa decision-making, i.e., using video demonstrations for zero-shot skill generation.
Submitted 9 September, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Information-theoretic Evolution of Model Agnostic Global Explanations
Authors:
Sukriti Verma,
Nikaash Puri,
Piyush Gupta,
Balaji Krishnamurthy
Abstract:
Explaining the behavior of black box machine learning models through human-interpretable rules is an important research area. Recent work has focused on explaining model behavior both locally, i.e., for specific predictions, and globally, across the fields of vision, natural language, reinforcement learning and data science. We present a novel model-agnostic approach that derives rules to globally explain the behavior of classification models trained on numerical and/or categorical data. Our approach builds on existing local model explanation methods to extract conditions that are important for explaining model behavior on specific instances, followed by an evolutionary algorithm that optimizes an information-theory-based fitness function to construct rules that explain global model behavior. We show that our approach outperforms existing approaches on a variety of datasets. Further, we introduce a parameter to evaluate the quality of an interpretation under distributional shift. This parameter evaluates how well the interpretation can predict model behavior on previously unseen data distributions. We show that existing approaches for interpreting models globally lack distributional robustness. Finally, we show how the quality of the interpretation under distributional shift can be improved by adding out-of-distribution samples to the dataset used to learn the interpretation, thereby increasing robustness. All of the datasets used in our paper are open and publicly available. Our approach has been deployed in a leading digital marketing suite of products.
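One way to make an information-theory-based fitness concrete is to score a candidate if-then rule by the mutual information between whether the rule fires and the class the model predicts. The sketch below does this on a toy dataset. The rule format (a list of conditions joined by AND), the mutual-information fitness, and the data are all hypothetical. This is not the paper's fitness function.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rule_fitness(rule, rows, predictions):
    """Hypothetical fitness: how much the rule firing tells us about the model's prediction."""
    fires = [all(condition(row) for condition in rule) for row in rows]
    return mutual_information(fires, predictions)


# Toy data: the black-box model happens to predict "buy" when visits > 3.
rows = [
    {"age": 25, "visits": 5},
    {"age": 30, "visits": 1},
    {"age": 45, "visits": 4},
    {"age": 52, "visits": 0},
]
predictions = ["buy", "no", "buy", "no"]
good_rule = [lambda r: r["visits"] > 3]
weak_rule = [lambda r: r["age"] < 35]
```

Here `good_rule` scores 1 bit because it perfectly tracks the predictions, while `weak_rule` scores 0. An evolutionary search would mutate and recombine such condition lists and keep the highest-scoring rules.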
Submitted 14 May, 2021;
originally announced May 2021.
-
SHERLock: Self-Supervised Hierarchical Event Representation Learning
Authors:
Sumegh Roychowdhury,
Sumedh A. Sontakke,
Nikaash Puri,
Mausoom Sarkar,
Milan Aggarwal,
Pinkesh Badjatiya,
Balaji Krishnamurthy,
Laurent Itti
Abstract:
Temporal event representations are an essential aspect of learning among humans. They allow for succinct encoding of the experiences we have through a variety of sensory inputs. Also, they are believed to be arranged hierarchically, allowing for an efficient representation of complex long-horizon experiences. Additionally, these representations are acquired in a self-supervised manner. Analogously, here we propose a model that learns temporal representations from long-horizon visual demonstration data and associated textual descriptions, without explicit temporal supervision. Our method produces a hierarchy of representations that align more closely with ground-truth human-annotated events (+15.3) than state-of-the-art unsupervised baselines.
Our results are comparable to heavily-supervised baselines in complex visual domains such as Chess Openings, YouCook2 and TutorialVQA datasets. Finally, we perform ablation studies illustrating the robustness of our approach. We release our code and demo visualizations in the Supplementary Material.
Submitted 22 August, 2022; v1 submitted 6 October, 2020;
originally announced October 2020.
-
MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance
Authors:
Anubha Kabra,
Ayush Chopra,
Nikaash Puri,
Pinkesh Badjatiya,
Sukriti Verma,
Piyush Gupta,
Balaji K
Abstract:
Training a classification model on a dataset where the instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are common in real-world situations such as fraud detection, medical diagnosis, and computational advertising. We propose an iterative data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have characteristics of both classes. We evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing approaches, and confirm its efficacy through significance testing. We also present ablation studies to analyze the impact of the different components of MixBoost.
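The "Mix" step can be pictured as mixup between one majority and one minority instance: a weight lambda drawn from a Beta distribution interpolates both the features and the one-hot labels. The sketch below shows only this interpolation for randomly chosen pairs. The Beta parameter is an assumption, and MixBoost's "Boost" selection of which instances to combine is not reproduced.

```python
import random


def mix_pair(x_major, x_minor, y_major, y_minor, alpha=0.5, rng=random):
    """Blend two instances and their label vectors with lambda ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_major, x_minor)]
    y_new = [lam * a + (1.0 - lam) * b for a, b in zip(y_major, y_minor)]
    return x_new, y_new, lam


def augment(majority, minority, n_new, seed=0):
    """Generate n_new hybrid instances from randomly chosen majority/minority pairs."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        xa = rng.choice(majority)
        xb = rng.choice(minority)
        # One-hot labels: [1, 0] for the majority class, [0, 1] for the minority class.
        out.append(mix_pair(xa, xb, [1.0, 0.0], [0.0, 1.0], rng=rng))
    return out


samples = augment([[0.0, 0.0], [1.0, 0.0]], [[5.0, 5.0]], n_new=10, seed=1)
```

Each synthetic instance carries a soft label whose minority share equals 1 - lambda. This is how a hybrid instance can have "characteristics of both classes" in the label space as well as the feature space.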
Submitted 3 September, 2020;
originally announced September 2020.
-
Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss
Authors:
Pinkesh Badjatiya,
Mausoom Sarkar,
Abhishek Sinha,
Siddharth Singh,
Nikaash Puri,
Jayakumar Subramanian,
Balaji Krishnamurthy
Abstract:
In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
Submitted 13 February, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution
Authors:
Nikaash Puri,
Sukriti Verma,
Piyush Gupta,
Dhruv Kayastha,
Shripad Deshmukh,
Balaji Krishnamurthy,
Sameer Singh
Abstract:
As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second down-weights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see https://nikaashpuri.github.io/sarfa-saliency/.
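The balance between the two aspects can be written as a harmonic mean. The following definitions reflect our reading of the SARFA formulation and should be treated as assumptions:

- Action values are turned into probabilities with a softmax.
- Specificity is the drop dP in the explained action's probability after perturbing a feature.
- Relevance is K = 1 / (1 + KL), where KL compares the softmax over the remaining actions after and before the perturbation.

Clamping negative drops to zero is also an assumption, and the Q-values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of action values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """D_KL(p || q) for strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sarfa_style_saliency(q_orig, q_pert, action):
    """Harmonic mean of specificity (dP) and relevance (K); assumed formulation."""
    p_orig, p_pert = softmax(q_orig), softmax(q_pert)
    delta_p = max(0.0, p_orig[action] - p_pert[action])  # specificity, clamped at 0
    rest = [i for i in range(len(q_orig)) if i != action]
    rem_orig = softmax([q_orig[i] for i in rest])
    rem_pert = softmax([q_pert[i] for i in rest])
    k = 1.0 / (1.0 + kl_divergence(rem_pert, rem_orig))  # relevance, in (0, 1]
    return 2.0 * k * delta_p / (k + delta_p)


q = [2.0, 1.0, 0.0]  # hypothetical Q-values; action 0 is explained
sal_a = sarfa_style_saliency(q, [0.0, 1.0, 0.0], 0)  # only action 0 changes
sal_b = sarfa_style_saliency(q, [1.5, 1.0, 4.0], 0)  # other actions reshuffled too
```

Perturbation B lowers the explained action's probability slightly more than A, but it also reshuffles the other actions. The relevance factor therefore reduces its score (about 0.53) below A's (about 0.62).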
Submitted 3 April, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
-
OpticalGAN : Generative Adversarial Networks for Continuous Variable Quantum Computation
Authors:
Nilay Shrivastava,
Nikaash Puri,
Piyush Gupta,
Balaji Krishnamurthy,
Sukriti Verma
Abstract:
We present OpticalGAN, an extension of quantum generative adversarial networks for continuous-variable quantum computation. OpticalGAN consists of photonic variational circuits comprising optical Gaussian and Kerr gates. Photonic quantum computation is a realization of continuous-variable quantum computing, which involves encoding and processing information in the continuous quadrature amplitudes of a quantized electromagnetic field such as light. Information processing in photonic quantum computers is performed using optical gates on squeezed light. Both the generator and discriminator of OpticalGAN are short-depth variational circuits composed of Gaussian and non-Gaussian gates. We demonstrate our approach by using OpticalGAN to generate energy eigenstates and coherent states. All of our code is available at https://github.com/abcd1729/opticalgan.
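For reference, a coherent state |alpha> has Fock-basis amplitudes <n|alpha> = exp(-|alpha|^2 / 2) · alpha^n / sqrt(n!). These are the amplitudes a generator targeting coherent states would need to reproduce once the Fock space is truncated. The sketch below computes them for a hypothetical alpha and cutoff. It does not simulate OpticalGAN's circuits.

```python
import math


def coherent_amplitudes(alpha, cutoff):
    """Amplitudes <n|alpha> for n = 0 .. cutoff-1 (Fock space truncated at `cutoff`)."""
    prefactor = math.exp(-abs(alpha) ** 2 / 2.0)
    return [prefactor * alpha ** n / math.sqrt(math.factorial(n)) for n in range(cutoff)]


def mean_photon_number(amps):
    """<n> computed from the (possibly truncated) photon-number distribution."""
    return sum(n * abs(a) ** 2 for n, a in enumerate(amps))


alpha = 1.0 + 0.5j  # hypothetical displacement
amps = coherent_amplitudes(alpha, cutoff=20)
norm = sum(abs(a) ** 2 for a in amps)
```

The photon-number distribution is Poisson with mean |alpha|^2 = 1.25. A cutoff of 20 therefore keeps the norm and mean photon number equal to their exact values up to negligible truncation error. Larger |alpha| needs a larger cutoff.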
Submitted 15 September, 2019;
originally announced September 2019.
-
MAGIX: Model Agnostic Globally Interpretable Explanations
Authors:
Nikaash Puri,
Piyush Gupta,
Pratiksha Agarwal,
Sukriti Verma,
Balaji Krishnamurthy
Abstract:
Explaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, it is also important to understand how the model behaves globally. Such an understanding provides insight into both the data on which the model was trained and the patterns that it learned. We present here an approach that learns if-then rules to globally explain the behavior of black box machine learning models that have been used to solve classification problems. The approach works by first extracting conditions that were important at the instance level and then evolving rules through a genetic algorithm with an appropriate fitness function. Collectively, these rules represent the patterns followed by the model for decisioning and are useful for understanding its behavior. We demonstrate the validity and usefulness of the approach by interpreting black box models created using publicly available data sets as well as a private digital marketing data set.
Submitted 15 June, 2018; v1 submitted 21 June, 2017;
originally announced June 2017.
-
Surface wake field model of beam-foil circular Rydberg states
Authors:
Gaurav Sharma,
Nitin Kumar Puri,
Adya Prasad Mishra,
Tapan Nandi
Abstract:
Production of projectile Rydberg states in fast ion-solid collisions in H-like ions exhibits a pronounced target-thickness dependence, even though these states form in the last layers. This occurs because of the important role of the surface wake field, which varies with the target foil thickness. Further, according to the proposed model, Rydberg states with low angular momentum are transformed into circular Rydberg states while passing through the field. The transfer occurs by a single multiphoton process with a high probability that depends on the projectile ion velocity relative to the Fermi velocity of the target electrons.
Submitted 29 December, 2015;
originally announced December 2015.
-
X-ray spectroscopy technique for the pile-up region
Authors:
Gaurav Sharma,
Deepak Swami,
Basu Kumar,
Nitin Kumar Puri,
Tapan Nandi
Abstract:
We report a pile-up rejection technique based on the Beer-Lambert law of X-ray absorption for measuring true events in the pile-up region. We have detected a peak 10^4 times weaker in the pile-up region. This technique also enables one to resolve weak peaks adjacent to an intense peak, provided the latter lies on the lower-energy side and the peaks are at least theoretically resolvable by the detector used. We have resolved such peaks by reducing the intensity ratios in our experiment. The technique also allows us to obtain the actual intensities that the observed peaks would have had if measured without any attenuator. Possible applications of this technique include studying the physics of two-electron one-photon transitions as well as the properties of projectile-like or target-like ions.
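The correction back to attenuator-free intensities follows from the Beer-Lambert law, I = I0 · exp(-(mu/rho) · rho · t). Away from absorption edges, the mass attenuation coefficient mu/rho generally falls with photon energy. An absorber therefore suppresses an intense lower-energy peak more strongly than a weaker peak above it. The sketch below divides measured counts by the transmission. The coefficients, density, and thickness are hypothetical placeholders, not values from the experiment. In practice mu/rho would come from tabulated data.

```python
import math


def transmission(mass_atten_cm2_per_g, density_g_per_cm3, thickness_cm):
    """Beer-Lambert transmitted fraction I/I0 through a uniform absorber."""
    return math.exp(-mass_atten_cm2_per_g * density_g_per_cm3 * thickness_cm)


def unattenuated_counts(measured_counts, mass_atten_cm2_per_g, density_g_per_cm3, thickness_cm):
    """Counts the peak would have had without the absorber in the beam path."""
    return measured_counts / transmission(mass_atten_cm2_per_g, density_g_per_cm3, thickness_cm)


# Hypothetical absorber (density 2.7 g/cm^3, 0.01 cm thick). The intense
# low-energy peak sees a larger attenuation coefficient than the weak
# high-energy peak.
RHO, T = 2.7, 0.01
intense = {"measured": 1.0e3, "mu_over_rho": 300.0}
weak = {"measured": 5.0e2, "mu_over_rho": 20.0}
true_intense = unattenuated_counts(intense["measured"], intense["mu_over_rho"], RHO, T)
true_weak = unattenuated_counts(weak["measured"], weak["mu_over_rho"], RHO, T)
```

With these placeholder numbers, the measured intensity ratio is only 2:1, while the reconstructed attenuator-free ratio is several thousand to one. This is the compression of intensity ratios that makes the weak peak observable next to the intense one.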
Submitted 19 January, 2016; v1 submitted 28 December, 2015;
originally announced December 2015.