
SARC: Soft Actor Retrospective Critic

Authors: Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

Abstract: The two-timescale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in the literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted at RLDM 2022

arXiv:2305.09258 [pdf, other]

HyHTM: Hyperbolic Geometry based Hierarchical Topic Models

Authors: Simra Shahid, Tanay Anand, Nikitha Srikanth, Sumit Bhatia, Balaji Krishnamurthy, Nikaash Puri

Abstract: Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies where lower-level topics are unrelated and not specific enough to their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM - a Hyperbolic geometry based Hierarchical Topic Model - that addresses these limitations by incorporating hierarchical information from hyperbolic geometry to explicitly model hierarchies in topic models. Experimental results with four baselines show that HyHTM can better attend to parent-child relationships among topics. HyHTM produces coherent topic hierarchies that specialise in granularity from generic higher-level topics to specific lower-level topics. Further, our model is significantly faster and leaves a much smaller memory footprint than our best-performing baseline. We have made the source code for our algorithm publicly accessible.

Submitted 16 May, 2023; originally announced May 2023.

Comments: This paper is accepted in Findings of the Association for Computational Linguistics (2023)

arXiv:2112.05969 [pdf, other]

Q-means using variational quantum feature embedding

Authors: Arvind S Menon, Nikaash Puri

Abstract: This paper proposes a hybrid quantum-classical algorithm that learns a suitable quantum feature map to separate unlabelled data that is not linearly separable in the classical space, using a variational quantum feature map and q-means as a subroutine for unsupervised learning. The objective of the variational circuit is to maximally separate the clusters in the quantum feature Hilbert space. The first part of the circuit embeds the classical data into quantum states. The second part performs unsupervised learning on the quantum states in the quantum feature Hilbert space using the q-means quantum circuit. The outputs of the quantum circuit are characteristic cluster quantum states that represent a superposition of all quantum states belonging to a particular cluster. The final part of the quantum circuit performs measurements on the characteristic cluster quantum states to output the inter-cluster overlap based on fidelity. The output of the complete quantum circuit is used to compute the value of the cost function, which is based on the Hilbert-Schmidt distance between the density matrices of the characteristic cluster quantum states. The gradient of the expectation value is used to optimize the parameters of the variational circuit to learn a better quantum feature map.

Submitted 11 December, 2021; originally announced December 2021.

arXiv:2111.11692 [pdf, other]

Status-quo policy gradient in Multi-Agent Reinforcement Learning

Authors: Pinkesh Badjatiya, Mausoom Sarkar, Nikaash Puri, Jayakumar Subramanian, Abhishek Sinha, Siddharth Singh, Balaji Krishnamurthy

Abstract: Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility mutually harmful equilibrium. In contrast, humans evolve useful strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss learn high-utility policies in several social dilemma matrix games (Prisoner's Dilemma, Stag Hunt matrix variant, Chicken Game). We show how SQLoss outperforms existing state-of-the-art methods to obtain high-utility policies in visual input non-matrix games (Coin Game and Stag Hunt visual input variant) using pre-trained cooperation and defection oracles. Finally, we show that SQLoss extends to a 4-agent setting by demonstrating the emergence of cooperative behavior in the popular Braess' paradox.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2110.09318 [pdf]

Mixed Reality using Illumination-aware Gradient Mixing in Surgical Telepresence: Enhanced Multi-layer Visualization

Authors: Nirakar Puri, Abeer Alsadoon, P. W. C. Prasad, Nada Alsalami, Tarik A. Rashid

Abstract: Background and aim: Surgical telepresence using augmented perception has been applied, but mixed reality is still being researched and is only theoretical. The aim of this work is to propose a solution to improve the visualization in the final merged video by producing globally consistent videos when the intensity of illumination in the input source and target video varies. Methodology: The proposed system uses an enhanced multi-layer visualization with illumination-aware gradient mixing using the Illumination Aware Video Composition algorithm. The Particle Swarm Optimization algorithm is used to find the best sample pair from the foreground and background regions, and image pixel correlation is used to estimate the alpha matte. The Particle Swarm Optimization algorithm helps to recover the original colour and depth of the unknown pixel in the unknown region. Result: Our results showed improved accuracy, achieved by reducing the mean squared error when selecting the best sample pair for the unknown region, in 10 samples each for bowel, jaw and breast. The reduction is 16.48% relative to the state-of-the-art system. As a result, the visibility accuracy is improved from 89.4% to 97.7%, which helped to clear the view of the hand even under differing light. Conclusion: Illumination effect and alpha pixel correlation improve the visualization accuracy, produce globally consistent composition results and maintain temporal coherency when compositing two videos with high and inverse illumination effects. In addition, this paper provides a solution for selecting the best sampling pair for the unknown region to obtain the original colour and depth.

Submitted 21 August, 2021; originally announced October 2021.

Comments: 24 pages

Journal ref: Multimedia Tools and Applications, 2021

arXiv:2109.03813 [pdf, other]

Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms

Authors: Sumedh A Sontakke, Sumegh Roychowdhury, Mausoom Sarkar, Nikaash Puri, Balaji Krishnamurthy, Laurent Itti

Abstract: Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into two distinct subtasks - first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm controlled by an expert) to adapt these events into actionable representations, i.e., skills. Through experiments, we demonstrate that our approach results in self-supervised analogy learning, where the agent learns to draw analogies between motions in human demonstration data and behaviors in the robotic environment. We also demonstrate the efficacy of our approach on model learning - demonstrating how Video2Skill utilizes prior knowledge from human demonstration to outperform traditional model learning of long-horizon dynamics. Finally, we demonstrate the utility of our approach for non-tabula rasa decision-making, i.e., utilizing video demonstration for zero-shot skill generation.

Submitted 9 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

arXiv:2105.06956 [pdf, other]

Information-theoretic Evolution of Model Agnostic Global Explanations

Authors: Sukriti Verma, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

Abstract: Explaining the behavior of black box machine learning models through human interpretable rules is an important research area. Recent work has focused on explaining model behavior locally, i.e., for specific predictions, as well as globally across the fields of vision, natural language, reinforcement learning and data science. We present a novel model-agnostic approach that derives rules to globally explain the behavior of classification models trained on numerical and/or categorical data. Our approach builds on top of existing local model explanation methods to extract conditions important for explaining model behavior for specific instances, followed by an evolutionary algorithm that optimizes an information theory based fitness function to construct rules that explain global model behavior. We show how our approach outperforms existing approaches on a variety of datasets. Further, we introduce a parameter to evaluate the quality of interpretation under the scenario of distributional shift. This parameter evaluates how well the interpretation can predict model behavior for previously unseen data distributions. We show how existing approaches for interpreting models globally lack distributional robustness. Finally, we show how the quality of the interpretation can be improved under the scenario of distributional shift by adding out of distribution samples to the dataset used to learn the interpretation, thereby increasing robustness. All of the datasets used in our paper are open and publicly available. Our approach has been deployed in a leading digital marketing suite of products.

Submitted 14 May, 2021; originally announced May 2021.

arXiv:2010.02556 [pdf, other]

SHERLock: Self-Supervised Hierarchical Event Representation Learning

Authors: Sumegh Roychowdhury, Sumedh A. Sontakke, Nikaash Puri, Mausoom Sarkar, Milan Aggarwal, Pinkesh Badjatiya, Balaji Krishnamurthy, Laurent Itti

Abstract: Temporal event representations are an essential aspect of learning among humans. They allow for succinct encoding of the experiences we have through a variety of sensory inputs. Also, they are believed to be arranged hierarchically, allowing for an efficient representation of complex long-horizon experiences. Additionally, these representations are acquired in a self-supervised manner. Analogously, here we propose a model that learns temporal representations from long-horizon visual demonstration data and associated textual descriptions, without explicit temporal supervision. Our method produces a hierarchy of representations that align more closely with ground-truth human-annotated events (+15.3) than state-of-the-art unsupervised baselines. Our results are comparable to heavily-supervised baselines in complex visual domains such as Chess Openings, YouCook2 and TutorialVQA datasets. Finally, we perform ablation studies illustrating the robustness of our approach. We release our code and demo visualizations in the Supplementary Material.

Submitted 22 August, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

Comments: Accepted at ICPR '22

arXiv:2009.01571 [pdf, other]

MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance

Authors: Anubha Kabra, Ayush Chopra, Nikaash Puri, Pinkesh Badjatiya, Sukriti Verma, Piyush Gupta, Balaji K

Abstract: Training a classification model on a dataset where the instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are standard in real-world situations such as fraud detection, medical diagnosis, and computational advertising. We propose an iterative data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have characteristics of both classes. We evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing approaches, and test its efficacy through significance testing. We also present ablation studies to analyze the impact of the different components of MixBoost.

Submitted 3 September, 2020; originally announced September 2020.

Comments: Work done as part of internship at MDSR
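
Illustrative note on the MixBoost entry above: the "Mix" step it describes resembles standard mixup, in which two instances are combined by a convex weight drawn from a Beta distribution. The sketch below is a minimal illustration under that assumption only. The function name `mixup_pairs`, the `alpha` parameter, and the uniform random pairing are hypothetical; the paper's "Boost" selection step is not modeled here.

```python
import random

def mixup_pairs(majority, minority, alpha=0.5, n_samples=4, seed=0):
    """Generate hybrid instances by convexly mixing one majority-class and
    one minority-class feature vector (plain mixup; NOT the paper's exact
    procedure, and pairs are chosen uniformly at random as an assumption)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x_maj = rng.choice(majority)          # random majority-class instance
        x_min = rng.choice(minority)          # random minority-class instance
        lam = rng.betavariate(alpha, alpha)   # mixing weight in [0, 1]
        x = [lam * a + (1.0 - lam) * b for a, b in zip(x_maj, x_min)]
        soft_label = 1.0 - lam                # weight given to the minority class
        samples.append((x, soft_label))
    return samples

if __name__ == "__main__":
    majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    minority = [[1.0, 1.0], [0.9, 1.1]]
    for features, label in mixup_pairs(majority, minority):
        print(features, label)
```

Each synthetic instance lies on the line segment between its two parents, and the soft label records how much of the minority instance it contains.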

arXiv:2001.05458 [pdf, other]

Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss

Authors: Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy

Abstract: In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.

Submitted 13 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

arXiv:1912.12191 [pdf, other]

Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

Authors: Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh

Abstract: As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweighs irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see https://nikaashpuri.github.io/sarfa-saliency/.

Submitted 3 April, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

Comments: Accepted at the International Conference on Learning Representations (ICLR) 2020

arXiv:1909.07806 [pdf, other]

OpticalGAN : Generative Adversarial Networks for Continuous Variable Quantum Computation

Authors: Nilay Shrivastava, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy, Sukriti Verma

Abstract: We present OpticalGAN, an extension of quantum generative adversarial networks for continuous-variable quantum computation. OpticalGAN consists of photonic variational circuits comprising optical Gaussian and Kerr gates. Photonic quantum computation is a realization of continuous-variable quantum computing which involves encoding and processing information in the continuous quadrature amplitudes of a quantized electromagnetic field such as light. Information processing in photonic quantum computers is performed using optical gates on squeezed light. Both the generator and discriminator of OpticalGAN are short-depth variational circuits composed of Gaussian and non-Gaussian gates. We demonstrate our approach by using OpticalGAN to generate energy eigenstates and coherent states. All of our code is available at https://github.com/abcd1729/opticalgan.

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1706.07160 [pdf, other]

MAGIX: Model Agnostic Globally Interpretable Explanations

Authors: Nikaash Puri, Piyush Gupta, Pratiksha Agarwal, Sukriti Verma, Balaji Krishnamurthy

Abstract: Explaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, it is also important to understand how the model behaves globally. Such an understanding provides insight into both the data on which the model was trained and the patterns that it learned. We present here an approach that learns if-then rules to globally explain the behavior of black box machine learning models that have been used to solve classification problems. The approach works by first extracting conditions that were important at the instance level and then evolving rules through a genetic algorithm with an appropriate fitness function. Collectively, these rules represent the patterns followed by the model for decisioning and are useful for understanding its behavior. We demonstrate the validity and usefulness of the approach by interpreting black box models created using publicly available data sets as well as a private digital marketing data set.

Submitted 15 June, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

arXiv:1512.08611 [pdf]

Surface wake field model of beam-foil circular Rydberg states

Authors: Gaurav Sharma, Nitin Kumar Puri, Adya Prasad Mishra, Tapan Nandi

Abstract: Production of projectile Rydberg states in fast ion-solid collisions in H-like ions exhibits a pronounced target thickness dependence in spite of these states forming at the last layers. This occurs due to the important role of the surface wake field, which varies with the target foil thickness. Further, according to the proposed model, Rydberg states with low angular momentum are transformed into circular Rydberg states while passing through the field. The transfer occurs by a single multiphoton process with high probability depending upon the projectile ion velocity with respect to the Fermi velocity of the target electrons.

Submitted 29 December, 2015; originally announced December 2015.

arXiv:1512.08399 [pdf]

X-ray spectroscopy technique for the pile-up region

Authors: Gaurav Sharma, Deepak Swami, Basu Kumar, Nitin Kumar Puri, Tapan Nandi

Abstract: We report a pile-up rejection technique based on the X-ray absorption concept of the Beer-Lambert law for measuring true events in the pile-up region. We have detected a peak 10^4 times weaker in the pile-up region. This technique also enables one to resolve weak peaks adjacent to an intense peak, provided the latter lies on the lower-energy side and the peaks are at least theoretically resolvable by the detector used. We have resolved such peaks by reducing the intensity ratios in our experiment. The technique allows us to obtain the actual intensities of the observed peaks as they would have been measured without any attenuator. Possible applications of this technique include studying the physics of two-electron one-photon transitions as well as the properties of projectile-like or target-like ions.

Submitted 19 January, 2016; v1 submitted 28 December, 2015; originally announced December 2015.
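
Illustrative note on the entry above: the Beer-Lambert law it refers to can be written out as follows. The symbols are assumptions chosen for illustration and are not taken from the paper: I_0(E) is the unattenuated intensity at photon energy E, \mu(E) the attenuation coefficient of the absorber, and x its thickness.

```latex
% Beer-Lambert attenuation of a line at photon energy E
I(E) = I_0(E)\, e^{-\mu(E)\, x}

% Ratio of two lines after the same absorber
\frac{I(E_1)}{I(E_2)} = \frac{I_0(E_1)}{I_0(E_2)}\, e^{-\left[\mu(E_1) - \mu(E_2)\right] x}

% Recovering the unattenuated intensity from a measured one
I_0(E) = I(E)\, e^{\mu(E)\, x}
```

Away from absorption edges, \mu(E) generally decreases with photon energy, so for E_1 < E_2 the ratio I(E_1)/I(E_2) shrinks as x grows. This is consistent with the abstract's condition that the intense peak lie on the lower-energy side, and the last relation shows how intensities without an attenuator could be inferred from attenuated measurements.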

Showing 1–15 of 15 results for author: Puri, N