-
The Frobenius number for the triple of the 2-step star numbers
Authors:
Takao Komatsu,
Ritika Goel,
Neha Gupta
Abstract:
In this paper, we give closed-form expressions of the Frobenius number for the triple of the $2$-step star numbers $an(n-2) + 1$ for an integer $a \geq 4$. These numbers have been studied from different aspects for some values of $a$. They can also be considered as variations of the well-known star numbers of the form $6n(n-1) + 1$. We also give closed-form expressions of the Sylvester number (genus) for the triple of the $2$-step star numbers.
Submitted 23 September, 2024;
originally announced September 2024.
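The two quantities named in this abstract can be checked numerically for any coprime set of generators. The sketch below is a brute-force computation via the Apéry set, not the paper's closed form. The function names are hypothetical, and taking three consecutive values of $n$ as "the triple" is an assumption, since the abstract does not say which triple is meant.

```python
import heapq
from functools import reduce
from math import gcd


def star2(a, n):
    """2-step star number a*n*(n-2) + 1 (formula from the abstract)."""
    return a * n * (n - 2) + 1


def apery_set(gens):
    """Smallest representable integer in each residue class mod min(gens)."""
    m = min(gens)
    dist = [float("inf")] * m
    dist[0] = 0
    heap = [(0, 0)]
    # Dijkstra over residues: adding a generator moves between classes.
    while heap:
        d, r = heapq.heappop(heap)
        if d > dist[r]:
            continue
        for g in gens:
            nd = d + g
            nr = nd % m
            if nd < dist[nr]:
                dist[nr] = nd
                heapq.heappush(heap, (nd, nr))
    return dist


def frobenius_and_genus(gens):
    """Return (Frobenius number, Sylvester number) of coprime generators."""
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must have gcd 1")
    m = min(gens)
    w = apery_set(gens)
    frobenius = max(w) - m
    # Selmer's formula: genus = sum(w)/m - (m-1)/2, kept in integers.
    genus = (2 * sum(w) - m * (m - 1)) // (2 * m)
    return frobenius, genus


if __name__ == "__main__":
    a = 4
    # Assumption: the triple is three consecutive 2-step star numbers.
    triple = [star2(a, n) for n in (3, 4, 5)]
    print(triple, frobenius_and_genus(triple))
```

A numerical check like this can be compared against a closed-form expression for small values of $a$ and $n$.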
-
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation
Authors:
Utkarsh Nath,
Rajeev Goel,
Eun Som Jeon,
Changhoon Kim,
Kyle Min,
Yezhou Yang,
Yingzhen Yang,
Pavan Turaga
Abstract:
To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines. However, the diffusion models used in these techniques are prone to viewpoint bias and thus lead to geometric inconsistencies such as the Janus problem. To counter this, we introduce MT3D, a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias and explicitly infuse geometric understanding into the generation pipeline. Firstly, we employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure, thereby reducing the inherent viewpoint bias. Next, we utilize deep geometric moments to ensure geometric consistency in the 3D representation explicitly. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects, thereby improving the quality and usability of our 3D representations.
Submitted 12 August, 2024;
originally announced August 2024.
-
IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection
Authors:
Hong Guan,
Yancheng Wang,
Lulu Xie,
Soham Nag,
Rajeev Goel,
Niranjan Erappa Narayana Swamy,
Yingzhen Yang,
Chaowei Xiao,
Jonathan Prisby,
Ross Maciejewski,
Jia Zou
Abstract:
Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark datasets for identity document analysis, including MIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a limited number of samples, cover insufficient varieties of fraud patterns, and seldom include alterations in critical personal identifying fields like portrait images, limiting their utility in training models capable of detecting realistic frauds while preserving privacy.
In response to these shortcomings, our research introduces a new benchmark dataset, IDNet, designed to advance privacy-preserving fraud detection efforts. The IDNet dataset comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes, categorized into 20 types from 10 U.S. states and 10 European countries. We evaluate the utility and present use cases of the dataset, illustrating how it can aid in training privacy-preserving fraud detection methods, facilitating the generation of camera and video capturing of identity documents, and testing schema unification and other identity document management functionalities.
Submitted 3 September, 2024; v1 submitted 3 August, 2024;
originally announced August 2024.
-
Taming 3DGS: High-Quality Radiance Fields with Limited Resources
Authors:
Saswat Subhajyoti Mallick,
Rahul Goel,
Bernhard Kerbl,
Francisco Vicente Carrasco,
Markus Steinberger,
Fernando De La Torre
Abstract:
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. The method converges with an indefinite number of Gaussians -- many of them redundant -- making rendering unnecessarily slow and preventing its usage in downstream tasks that expect fixed-size inputs. To address these issues, we tackle the challenges of training and rendering 3DGS models on a budget. We use a guided, purely constructive densification process that steers densification toward Gaussians that raise the reconstruction quality. Model size continuously increases in a controlled manner towards an exact budget, using score-based densification of Gaussians with training-time priors that measure their contribution. We further address training speed obstacles: following a careful analysis of 3DGS' original pipeline, we derive faster, numerically equivalent solutions for gradient computation and attribute updates, including an alternative parallelization for efficient backpropagation. We also propose quality-preserving approximations where suitable to reduce training time even further. Taken together, these enhancements yield a robust, scalable solution with reduced training times, lower compute and memory requirements, and high quality. Our evaluation shows that in a budgeted setting, we obtain competitive quality metrics with 3DGS while achieving a 4--5x reduction in both model size and training time. With more generous budgets, our measured quality surpasses theirs. These advances open the door for novel-view synthesis in constrained environments, e.g., mobile devices.
Submitted 21 June, 2024;
originally announced June 2024.
-
Number of Independent Sets in Regular and Irregular Graphs: A 31 Year Journey
Authors:
Dev Chheda,
Ram Goel,
Eddie Qiao
Abstract:
We review the progress made on bounding the number of independent sets in $d$-regular and irregular graphs over the last 31 years. We particularly focus on contributions from Kahn, Zhao, and Sah et al. in incrementally proving stronger and more general versions of the upper bound. We reproduce the main results of these works, particularly focusing on the unweighted special case (with fugacity $\lambda = 1$), which allows us to provide more intuitive and clear explanations of the key ideas that have been developed in the field over three decades.
Submitted 30 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
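To make the unweighted case concrete, the sketch below counts independent sets of a small graph by brute force and compares the count with the bound $(2^{d+1} - 1)^{n/(2d)}$ for $d$-regular graphs on $n$ vertices, the form usually attributed to Kahn (bipartite case) and Zhao (general case). The function names and the example cycles are illustrative choices, not taken from the survey.

```python
def count_independent_sets(n, edges):
    """Count all independent sets (including the empty set) of a graph
    on vertices 0..n-1 by enumerating vertex subsets as bitmasks."""
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    count = 0
    for mask in range(1 << n):
        # A subset is independent if no chosen vertex has a chosen neighbour.
        if all(not (mask >> v & 1 and adj[v] & mask) for v in range(n)):
            count += 1
    return count


def regular_graph_bound(n, d):
    """Upper bound (2^(d+1) - 1)^(n / (2d)) for d-regular graphs."""
    return (2 ** (d + 1) - 1) ** (n / (2 * d))


def cycle_edges(n):
    """Edges of the cycle C_n, a 2-regular graph."""
    return [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    for n in (4, 6):
        print(n, count_independent_sets(n, cycle_edges(n)),
              regular_graph_bound(n, 2))
```

The 4-cycle is the complete bipartite graph $K_{2,2}$, for which the bound holds with equality.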
-
Transformer models classify random numbers
Authors:
Rishabh Goel,
YiZi Xiao,
Ramin Ramezani
Abstract:
Random numbers are important in a variety of fields, and the need to validate them remains. A Quantum Random Number Generator (QRNG) can theoretically generate truly random numbers; however, this does not remove the need to thoroughly test their randomness. Generally, the task of validating random numbers has been delegated to statistical tests such as those from the NIST Statistical Test Suite (STS), which are often slow and only perform one task at a time. Our work presents a deep learning model that utilizes the transformer architecture to encode some of the tests from the NIST STS in a single model that also runs much faster. This model performs multi-label classification on these tests and outputs the probability of passing each statistical test that it encodes. We perform a thorough hyper-parameter optimization to converge on the best possible model and, as a result, achieve a high degree of accuracy with a sample F1 score above 0.9.
Submitted 6 May, 2024;
originally announced May 2024.
-
Learning Low-Rank Feature for Thorax Disease Classification
Authors:
Rajeev Goel,
Utkarsh Nath,
Yancheng Wang,
Alvin C. Silva,
Teresa Wu,
Yingzhen Yang
Abstract:
Deep neural networks, including Convolutional Neural Networks (CNNs) and Visual Transformers (ViT), have achieved stunning success in the medical image domain. We study thorax disease classification in this paper. Effective extraction of features for the disease areas is crucial for disease classification on radiographic images. While various neural architectures and training techniques, such as self-supervised learning with contrastive/restorative learning, have been employed for disease classification on radiographic images, there are no principled methods that can effectively reduce the adverse effect of noise and background, or non-disease areas, on the radiographic images for disease classification. To address this challenge, we propose a novel Low-Rank Feature Learning (LRFL) method in this paper, which is universally applicable to the training of all neural networks. The LRFL method is both empirically motivated by the low frequency property observed on all the medical datasets in this paper, and theoretically motivated by our sharp generalization bound for neural networks with low-rank features. In the empirical study, using a neural network such as a ViT or a CNN pre-trained on unlabeled chest X-rays by Masked Autoencoders (MAE), our novel LRFL method is applied to the pre-trained neural network and demonstrates better classification results in terms of both multiclass area under the receiver operating curve (mAUC) and classification accuracy.
Submitted 14 February, 2024;
originally announced April 2024.
-
On Speculative Decoding for Multimodal Large Language Models
Authors:
Mukul Gagrani,
Raghavv Goel,
Wonseok Jeon,
Junyoung Park,
Mingu Lee,
Christopher Lott
Abstract:
Inference with Multimodal Large Language Models (MLLMs) is slow due to their large-language-model backbone, which suffers from a memory-bandwidth bottleneck and generates tokens auto-regressively. In this paper, we explore the application of speculative decoding to enhance the inference efficiency of MLLMs, specifically the LLaVA 7B model. We show that a language-only model can serve as a good draft model for speculative decoding with LLaVA 7B, bypassing the need for image tokens and their associated processing components in the draft model. Our experiments across three different tasks show that speculative decoding can achieve a memory-bound speedup of up to 2.37$\times$ using a 115M parameter language model that we trained from scratch. Additionally, we introduce a compact LLaVA draft model incorporating an image adapter, which shows marginal performance gains in image captioning while maintaining comparable results in other tasks.
Submitted 12 April, 2024;
originally announced April 2024.
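The verification step that speculative decoding relies on can be sketched for a single draft token. This is the standard accept/reject rule for speculative sampling, written for plain probability lists. It is not code from the paper, and the function name and toy distributions are hypothetical.

```python
import random


def verify_draft_token(token, p_target, q_draft, rng):
    """Accept a draft token with probability min(1, p/q); on rejection,
    resample from the normalized residual max(0, p - q).

    Returns (token_to_emit, accepted_flag). Assumes q_draft[token] > 0,
    which holds whenever the token was actually sampled from q_draft.
    """
    accept_prob = min(1.0, p_target[token] / q_draft[token])
    if rng.random() < accept_prob:
        return token, True
    # Rejection implies p != q, so the residual has positive total mass.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    replacement = rng.choices(range(len(residual)), weights=residual, k=1)[0]
    return replacement, False


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]   # toy target-model distribution
    q = [0.2, 0.5, 0.3]   # toy draft-model distribution
    draft = rng.choices(range(3), weights=q, k=1)[0]
    print(verify_draft_token(draft, p, q, rng))
```

With this rule the emitted token is distributed according to the target model, which is why a weaker draft model changes speed but not output quality.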
-
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
Authors:
Raghavv Goel,
Mukul Gagrani,
Wonseok Jeon,
Junyoung Park,
Mingu Lee,
Christopher Lott
Abstract:
Text generation with Large Language Models (LLMs) is known to be memory bound due to the combination of their auto-regressive nature, huge parameter counts, and limited memory bandwidths, often resulting in low token rates. Speculative decoding has been proposed as a solution for LLM inference acceleration. However, since draft models are often unavailable in the modern open-source LLM families, e.g., for Llama 2 7B, training a high-quality draft model is required to enable inference acceleration via speculative decoding. In this paper, we propose a simple draft model training framework for direct alignment to chat-capable target models. With the proposed framework, we train Llama 2 Chat Drafter 115M, a draft model for Llama 2 Chat 7B or larger, with only 1.64\% of the original size. Our training framework consists only of pretraining, distillation dataset generation, and finetuning with knowledge distillation, with no additional alignment procedure. For the finetuning step, we use instruction-response pairs generated by the target model for distillation in a plausible data distribution, and propose a new Total Variation Distance++ (TVD++) loss that incorporates variance reduction techniques inspired by the policy gradient method in reinforcement learning. Our empirical results show that Llama 2 Chat Drafter 115M with speculative decoding achieves up to 2.3 block efficiency and 2.4$\times$ speed-up relative to autoregressive decoding on various tasks with no further task-specific fine-tuning.
Submitted 13 May, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
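For orientation, the plain total variation distance between a target and a draft next-token distribution is sketched below. This is not the TVD++ loss proposed in the paper, whose variance-reduction terms the abstract does not spell out. The function names and toy distributions are hypothetical. Under the standard speculative sampling accept/reject rule, the probability that a draft token is accepted equals one minus this distance, which is one way to see why it is a natural quantity to minimize.

```python
def total_variation_distance(p, q):
    """TVD(p, q) = 0.5 * sum_x |p(x) - q(x)| for two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def expected_acceptance(p_target, q_draft):
    """Expected acceptance rate of a draft token under the standard
    speculative sampling rule: sum_x min(p(x), q(x)) = 1 - TVD(p, q)."""
    return sum(min(a, b) for a, b in zip(p_target, q_draft))


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]   # toy target distribution
    q = [0.5, 0.3, 0.2]   # toy draft distribution
    print(total_variation_distance(p, q), expected_acceptance(p, q))
```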
-
Using text embedding models and vector databases as text classifiers with the example of medical data
Authors:
Rishabh Goel
Abstract:
The advent of Large Language Models (LLMs) is promising and has found application in numerous fields, but as it often is with the medical field, the bar is typically quite high [5]. In tandem with LLMs, vector embedding models and vector databases provide a robust way of expressing numerous modes of data that are easily digestible by typical machine learning models. Along with the ease of adding information, knowledge, and data to these vector databases, they provide a compelling reason to apply them in numerous fields where the task of retrieving information is typically done by humans. Researchers at Google have developed a clear alternative model, Med-PaLM [6] specifically designed to match a clinician's level of accuracy when it comes to medical knowledge. When training classifiers, and developing models, it is imperative to maintain factuality and reduce bias [4]. Here, we explore the use of vector databases and embedding models as a means of encoding, and classifying text with the example and application in the field of medicine. We show the robustness of these tools depends heavily on the sparsity of the data presented, and even with low amounts of data in the vector database itself, the vector database does a good job at classifying data [9]. Using various LLMs to generate the medical data, we also understand the limitations of the medical knowledge of these models and encourage further expert medical review of our testing data. By using vector databases to classify a clinician's notes on a patient presented with a certain ailment, we understand the limitations of such methods, but also the promise of their prospective use and with continued testing and experimentation, hope to explore a unique use case of vector databases and embedding models.
Submitted 7 February, 2024;
originally announced February 2024.
-
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
Authors:
Wonseok Jeon,
Mukul Gagrani,
Raghavv Goel,
Junyoung Park,
Mingu Lee,
Christopher Lott
Abstract:
Speculative decoding is an inference-acceleration method for large language models (LLMs) where a small language model generates a draft-token sequence which is further verified by the target LLM in parallel. Recent works have advanced this method by establishing a draft-token tree, achieving superior performance over single-sequence speculative decoding. However, those works independently generate tokens at each level of the tree, not leveraging the tree's entire diversifiability. Besides, their empirical superiority has been shown for a fixed sequence length, implicitly granting more computational resources to the LLM for the tree-based methods. None of the existing works has conducted empirical studies with fixed target computational budgets despite their importance to resource-bounded devices. We present Recursive Speculative Decoding (RSD), a novel tree-based method that samples draft tokens without replacement and maximizes the diversity of the tree. During RSD's drafting, the tree is built by either the Gumbel-Top-$k$ trick, which draws tokens without replacement in parallel, or Stochastic Beam Search, which samples sequences without replacement while early-truncating unlikely draft sequences and reducing the computational cost of the LLM. We empirically evaluate RSD with Llama 2 and OPT models, showing that RSD outperforms the baseline methods, consistently for fixed draft sequence length and in most cases for fixed computational budgets at the LLM.
Submitted 5 March, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
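The Gumbel-Top-$k$ trick mentioned in the abstract can be sketched in a few lines: perturb each token's log-probability with independent Gumbel(0, 1) noise and keep the $k$ largest, which yields $k$ distinct tokens sampled without replacement. This is a generic, standard-library sketch of the trick for one distribution, not the RSD tree-construction code. The function name and the toy distribution are hypothetical.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng):
    """Sample k distinct indices without replacement by adding
    Gumbel(0, 1) noise to each log-probability and taking the top k."""
    perturbed = []
    for index, log_p in enumerate(log_probs):
        u = rng.random()
        while u == 0.0:          # avoid log(0); random() lies in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((log_p + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.25, 0.15, 0.10]   # toy draft distribution
    log_probs = [math.log(p) for p in probs]
    print(gumbel_top_k(log_probs, k=2, rng=rng))
```

Because all perturbations are independent, the noise for a whole vocabulary can be drawn at once, which is what makes the trick suitable for parallel drafting.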
-
GSN: Generalisable Segmentation in Neural Radiance Field
Authors:
Vinayak Gupta,
Rahul Goel,
Sirikonda Dhawal,
P. J. Narayanan
Abstract:
Traditional Radiance Field (RF) representations capture details of a specific scene and must be trained afresh on each scene. Semantic feature fields have been added to RFs to facilitate several segmentation tasks. Generalised RF representations learn the principles of view interpolation. A generalised RF can render new views of an unknown and untrained scene, given a few views. We present a way to distil feature fields into the generalised GNT representation. Our GSN representation generates new views of unseen scenes on the fly along with consistent, per-pixel semantic features. This enables multi-view segmentation of arbitrary new scenes. We show different semantic features being distilled into generalised RFs. Our multi-view segmentation results are on par with methods that use traditional RFs. GSN closes the gap between standard and generalisable RF methods significantly. Project Page: https://vinayak-vg.github.io/GSN/
Submitted 7 February, 2024;
originally announced February 2024.
-
Quantum Kernel Machine Learning With Continuous Variables
Authors:
Laura J. Henderson,
Rishi Goel,
Sally Shrapnel
Abstract:
The popular qubit framework has dominated recent work on quantum kernel machine learning, with results characterising expressivity, learnability and generalisation. As yet, there is no comparative framework to understand these concepts for continuous variable (CV) quantum computing platforms. In this paper we represent CV quantum kernels as closed form functions and use this representation to provide several important theoretical insights. We derive a general closed form solution for all CV quantum kernels and show every such kernel can be expressed as the product of a Gaussian and an algebraic function of the parameters of the feature map. Furthermore, in the multi-mode case, we present quantification of a quantum-classical separation for all quantum kernels via a hierarchical notion of the "stellar rank" of the quantum kernel feature map. We then prove kernels defined by feature maps of infinite stellar rank, such as GKP-state encodings, can be approximated arbitrarily well by kernels defined by feature maps of finite stellar rank. Finally, we simulate learning with a single-mode displaced Fock state encoding and show that (i) accuracy on our specific task (an annular data set) increases with stellar rank, (ii) for underfit models, accuracy can be improved by increasing a bandwidth hyperparameter, and (iii) for noisy data that is overfit, decreasing the bandwidth will improve generalisation but does so at the cost of effective stellar rank.
Submitted 9 July, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Motion Informed Needle Segmentation in Ultrasound Images
Authors:
Raghavv Goel,
Cecilia Morales,
Manpreet Singh,
Artur Dubrawski,
John Galeotti,
Howie Choset
Abstract:
Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both needle features and needle motion. Our method offers three key contributions. First, we propose a compatible framework that seamlessly integrates into commonly used encoder-decoder style architectures. Second, we demonstrate superior performance compared to recent state-of-the-art needle segmentation models using our novel convolutional neural network (CNN) based KF-inspired block, achieving a 15\% reduction in pixel-wise needle tip error and an 8\% reduction in length error. Third, to our knowledge we are the first to implement a learnable filter to incorporate non-linear needle motion for improving needle segmentation.
Submitted 3 May, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
RSM-NLP at BLP-2023 Task 2: Bangla Sentiment Analysis using Weighted and Majority Voted Fine-Tuned Transformers
Authors:
Pratinav Seth,
Rashi Goel,
Komal Mathur,
Swetha Vemulapalli
Abstract:
This paper describes our approach to submissions made at Shared Task 2 at BLP Workshop - Sentiment Analysis of Bangla Social Media Posts. Sentiment Analysis is an active research area in the digital age. With the rapid and constant growth of online social media sites and services and the increasing amount of textual data, the application of automatic Sentiment Analysis is on the rise. However, most of the research in this domain is based on the English language. Although Bangla is the world's sixth most widely spoken language, little work has been done on it. This task aims to promote work on Bangla Sentiment Analysis while identifying the polarity of social media content by determining whether the sentiment expressed in the text is Positive, Negative, or Neutral. Our approach consists of experimenting with and finetuning various multilingual and pre-trained BERT-based models on our downstream tasks and using a Majority Voting and Weighted ensemble model that outperforms individual baseline model scores. Our system scored 0.711 for the multiclass classification task and placed 10th among the participants on the leaderboard for the shared task. Our code is available at https://github.com/ptnv-s/RSM-NLP-BLP-Task2 .
Submitted 22 October, 2023;
originally announced October 2023.
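The two ensembling strategies named in the abstract can be illustrated generically. The sketch below shows hard majority voting over predicted labels and a weighted average of per-class probabilities. It is not the authors' implementation (see their repository for that), and the function names, label order, and weights are hypothetical.

```python
from collections import Counter

LABELS = ["Positive", "Negative", "Neutral"]  # assumed label order


def majority_vote(predicted_labels):
    """Return the label predicted by the most models. Ties go to the
    label that appears first in the input."""
    return Counter(predicted_labels).most_common(1)[0][0]


def weighted_ensemble(prob_rows, weights):
    """Average per-model class probabilities with the given weights and
    return the index of the highest-scoring class."""
    total = sum(weights)
    n_classes = len(prob_rows[0])
    averaged = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


if __name__ == "__main__":
    print(majority_vote(["Positive", "Negative", "Positive"]))
    probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]  # two toy models
    print(LABELS[weighted_ensemble(probs, weights=[1.0, 3.0])])
```

Weights would typically reflect each model's validation score, so stronger models count for more in the weighted variant.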
-
FusedRF: Fusing Multiple Radiance Fields
Authors:
Rahul Goel,
Dhawal Sirikonda,
Rajvi Shah,
PJ Narayanan
Abstract:
Radiance Fields (RFs) have shown great potential to represent scenes from casually captured discrete views. Compositing parts or whole of multiple captured scenes could greatly interest several XR applications. Prior works can generate new views of such scenes by tracing each scene in parallel. This increases the render times and memory requirements with the number of components. In this work, we provide a method to create a single, compact, fused RF representation for a scene composited using multiple RFs. The fused RF has the same render times and memory utilizations as a single RF. Our method distills information from multiple teacher RFs into a single student RF while also facilitating further manipulations like addition and deletion into the fused representation.
Submitted 7 June, 2023;
originally announced June 2023.
-
PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs
Authors:
Rahul Goel,
Waleed Ammar,
Aditya Gupta,
Siddharth Vashishtha,
Motoki Sano,
Faiz Surani,
Max Chang,
HyunJeong Choe,
David Greene,
Kyle He,
Rattima Nitisaroj,
Anna Trukhina,
Shachi Paul,
Pararth Shah,
Rushin Shah,
Zhou Yu
Abstract:
Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversations, we introduce PRESTO, a public dataset of over 550K contextual multilingual conversations between humans and virtual assistants. PRESTO contains a diverse array of challenges that occur in real-world NLU tasks such as disfluencies, code-switching, and revisions. It is the only large-scale human-generated conversational parsing dataset that provides structured context such as a user's contacts and lists for each example. Our mT5-based baselines demonstrate that the conversational phenomena present in PRESTO are challenging to model, and that this difficulty is more pronounced in a low-resource setup.
Submitted 16 March, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Predicting Socio-Economic Well-being Using Mobile Apps Data: A Case Study of France
Authors:
Rahul Goel,
Angelo Furno,
Rajesh Sharma
Abstract:
Socio-economic indicators provide context for assessing a country's overall condition. These indicators contain information about education, gender, poverty, employment, and other factors. Therefore, reliable and accurate information is critical for social research and government policing. Most data sources available today, such as censuses, have sparse population coverage or are updated infrequently. Nonetheless, alternative data sources, such as call data records (CDR) and mobile app usage, can serve as cost-effective and up-to-date sources for identifying socio-economic indicators.
This work investigates mobile app data to predict socio-economic features. We present a large-scale study using data that captures the traffic of thousands of mobile applications by approximately 30 million users distributed over 550,000 square kilometers and served by over 25,000 base stations. The dataset covers the whole territory of France and spans more than 2.5 months, from 16 March 2019 to 6 June 2019. Using the app usage patterns, our best model can estimate socio-economic indicators (attaining an R-squared score of up to 0.66). Furthermore, using models' explainability, we discover that mobile app usage patterns have the potential to reveal socio-economic disparities in IRIS. Insights from this study provide several avenues for future interventions, including user temporal network analysis to understand evolving network patterns and exploration of alternative data sources.
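For readers unfamiliar with the reported metric, the following is a minimal sketch of how an R-squared score is conventionally computed (one minus the ratio of residual to total sum of squares). The function name and the toy values are assumptions for illustration; they are not the authors' code or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical indicator values for four regions and a model's estimates.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, estimated))
```

A score of 1 means the estimates match the observed values exactly, and a score of 0 means the model does no better than predicting the mean everywhere.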
Submitted 4 February, 2023; v1 submitted 15 January, 2023;
originally announced January 2023.
-
Interactive Segmentation of Radiance Fields
Authors:
Rahul Goel,
Dhawal Sirikonda,
Saurabh Saini,
PJ Narayanan
Abstract:
Radiance Fields (RF) are popular to represent casually-captured scenes for new view synthesis and several applications beyond it. Mixed reality on personal spaces needs understanding and manipulating scenes represented as RFs, with semantic segmentation of objects as an important step. Prior segmentation efforts show promise but don't scale to complex objects with diverse appearance. We present the ISRF method to interactively segment objects with fine structure and appearance. Nearest neighbor feature matching using distilled semantic features identifies high-confidence seed regions. Bilateral search in a joint spatio-semantic space grows the region to recover accurate segmentation. We show state-of-the-art results of segmenting objects from RFs and compositing them to another scene, changing appearance, etc., and an interactive segmentation tool that others can use.
Project Page: https://rahul-goel.github.io/isrf/
Submitted 25 March, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
StyleTRF: Stylizing Tensorial Radiance Fields
Authors:
Rahul Goel,
Sirikonda Dhawal,
Saurabh Saini,
P. J. Narayanan
Abstract:
Stylized view generation of scenes captured casually using a camera has received much attention recently. The geometry and appearance of the scene are typically captured as neural point sets or neural radiance fields in the previous work. An image stylization method is used to stylize the captured appearance by training its network jointly or iteratively with the structure capture network. The state-of-the-art SNeRF method trains the NeRF and stylization network in an alternating manner. These methods have high training time and require joint optimization. In this work, we present StyleTRF, a compact, quick-to-optimize strategy for stylized view generation using TensoRF. The appearance part is fine-tuned using sparse stylized priors of a few views rendered using the TensoRF representation for a few iterations. Our method thus effectively decouples style-adaption from view capture and is much faster than the previous methods. We show state-of-the-art results on several scenes used for this purpose.
Submitted 19 December, 2022;
originally announced December 2022.
-
DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue
Authors:
William Held,
Christopher Hidey,
Fei Liu,
Eric Zhu,
Rahul Goel,
Diyi Yang,
Rushin Shah
Abstract:
Modern virtual assistants use internal semantic parsing engines to convert user utterances to actionable commands. However, prior work has demonstrated that semantic parsing is a difficult multilingual transfer task with low transfer efficiency compared to other tasks. In global markets such as India and Latin America, this is a critical issue as switching between languages is prevalent for bilingual users. In this work we dramatically improve the zero-shot performance of a multilingual and codeswitched semantic parsing system using two stages of multilingual alignment. First, we show that contrastive alignment pretraining improves both English performance and transfer efficiency. We then introduce a constrained optimization approach for hyperparameter-free adversarial alignment during finetuning. Our Doubly Aligned Multilingual Parser (DAMP) improves mBERT transfer performance by 3x, 6x, and 81x on the Spanglish, Hinglish and Multilingual Task Oriented Parsing benchmarks respectively and outperforms XLM-R and mT5-Large using 3.2x fewer parameters.
Submitted 26 May, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
CST5: Data Augmentation for Code-Switched Semantic Parsing
Authors:
Anmol Agarwal,
Jigar Gupta,
Rahul Goel,
Shyam Upadhyay,
Pankaj Joshi,
Rengarajan Aravamudhan
Abstract:
Extending semantic parsers to code-switched input has been a challenging problem, primarily due to a lack of supervised training data. In this work, we introduce CST5, a new data augmentation technique that finetunes a T5 model using a small seed set ($\approx$100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high quality code-switched data, both intrinsically (per human evaluation) and extrinsically by comparing baseline models which are trained without data augmentation to models which are trained with augmented data. Empirically we observe that using CST5, one can achieve the same semantic parsing performance by using up to 20x less labeled data. To aid further research in this area, we are also releasing (a) Hinglish-TOP, the largest human annotated code-switched semantic parsing dataset to date, containing 10k human annotated Hindi-English (Hinglish) code-switched utterances, and (b) Over 170K CST5 generated code-switched utterances from the TOPv2 dataset. Human evaluation shows that both the human annotated data as well as the CST5 generated data is of good quality.
Submitted 14 November, 2022;
originally announced November 2022.
-
GLINKX: A Scalable Unified Framework For Homophilous and Heterophilous Graphs
Authors:
Marios Papachristou,
Rishab Goel,
Frank Portman,
Matthew Miller,
Rong Jin
Abstract:
In graph learning, there have been two predominant inductive biases regarding graph-inspired architectures: On the one hand, higher-order interactions and message passing work well on homophilous graphs and are leveraged by GCNs and GATs. Such architectures, however, cannot easily scale to large real-world graphs. On the other hand, shallow (or node-level) models using ego features and adjacency embeddings work well in heterophilous graphs. In this work, we propose a novel scalable shallow method -- GLINKX -- that can work both on homophilous and heterophilous graphs. GLINKX leverages (i) novel monophilous label propagations, (ii) ego/node features, (iii) knowledge graph embeddings as positional embeddings, (iv) node-level training, and (v) low-dimensional message passing. Formally, we prove novel error bounds and justify the components of GLINKX. Experimentally, we show its effectiveness on several homophilous and heterophilous datasets.
Submitted 18 November, 2022; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Towards Trustworthy Automatic Diagnosis Systems by Emulating Doctors' Reasoning with Deep Reinforcement Learning
Authors:
Arsene Fansi Tchango,
Rishab Goel,
Julien Martel,
Zhi Wen,
Gaetan Marceau Caron,
Joumana Ghosn
Abstract:
The automation of the medical evidence acquisition and diagnosis process has recently attracted increasing attention in order to reduce the workload of doctors and democratize access to medical care. However, most works proposed in the machine learning literature focus solely on improving the prediction accuracy of a patient's pathology. We argue that this objective is insufficient to ensure doctors' acceptability of such systems. In their initial interaction with patients, doctors do not only focus on identifying the pathology a patient is suffering from; they instead generate a differential diagnosis (in the form of a short list of plausible diseases) because the medical evidence collected from patients is often insufficient to establish a final diagnosis. Moreover, doctors explicitly explore severe pathologies before potentially ruling them out from the differential, especially in acute care settings. Finally, for doctors to trust a system's recommendations, they need to understand how the gathered evidences led to the predicted diseases. In particular, interactions between a system and a patient need to emulate the reasoning of doctors. We therefore propose to model the evidence acquisition and automatic diagnosis tasks using a deep reinforcement learning framework that considers three essential aspects of a doctor's reasoning, namely generating a differential diagnosis using an exploration-confirmation approach while prioritizing severe pathologies. We propose metrics for evaluating interaction quality based on these three aspects. We show that our approach performs better than existing models while maintaining competitive pathology prediction accuracy.
Submitted 13 October, 2022;
originally announced October 2022.
-
Gender gap in mobility outside home in urban India
Authors:
Rahul Goel
Abstract:
India has one of the highest levels of gender inequality in the world. The work participation rate of women is among the lowest, with a wide gender gap. There are seclusion norms that restrict the mobility of women outside the home. However, transport literature in India has not explored the impact of this lack of autonomy on gender differences in travel demand. I use the 2019 population-representative nationwide time-use survey of India. The dataset reported both travel and non-travel activities in 30-minute episodes over a 24-hour period. For urban residents, I analysed gender differences in trip rates and mobility rates, where the latter is defined as the percentage going out of home at least once on the reporting day. I developed gender-stratified logistic regression models at the individual level with mobility as a binary outcome. It was found that 53% of females did not report going out of home, compared to only 14% of males. The mobility of females reduces steeply from adolescence to young adulthood and then remains largely stable at a low level before reducing further for older adults. No such variation is observed among males, except that their mobility is also reduced among older adults. There is a clear dichotomy, with women mostly participating in in-house activities while men are mostly involved in out-of-home activities. Adolescence or adulthood, marriage, living with one or more household members, having an infant in the house, lower income, and less education are associated with a lower likelihood of female mobility. I discuss many implications of these gender differences in mobility.
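The mobility rate defined in the abstract (the percentage of respondents going out of home at least once on the reporting day) can be sketched as follows. This is an illustrative sketch only: the function name, the per-person trip-count input format, and the toy numbers are assumptions, not the survey's actual schema or data.

```python
def mobility_rate(trips_per_person):
    """Percentage of respondents with at least one out-of-home trip."""
    if not trips_per_person:
        raise ValueError("need at least one respondent")
    # A respondent counts as mobile if they report one or more trips.
    mobile = sum(1 for trips in trips_per_person if trips >= 1)
    return 100.0 * mobile / len(trips_per_person)


# Hypothetical trip counts on the reporting day (not survey data).
female_trips = [0, 0, 2, 1, 0, 0, 3, 0]
male_trips = [2, 1, 0, 4, 2, 1, 2, 2]
print(mobility_rate(female_trips))  # 3 of 8 respondents mobile
print(mobility_rate(male_trips))    # 7 of 8 respondents mobile
```

Note that this differs from a trip rate, which would average the number of trips per person rather than count who made any trip at all.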
Submitted 19 August, 2022;
originally announced August 2022.
-
A Finite Analogue Of Fine's Function $F(a,b;t)$
Authors:
Ritika Goel
Abstract:
We initiate a systematic development of $F_N(a, b; t)$, a finite analogue of Fine's function $F(a, b; t)$. Our results are transformations between $F_N(a, b; t)$ and $F_N(aq^{\ell}, bq^{m}; tq^{n})$, where $\ell,m$ and $n$ take the values $0$ or $1$.
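For context, Fine's function is usually written as the following basic hypergeometric series. This is the standard textbook form, stated here as background under the assumption that the paper uses the same normalization; the abstract does not give the definition of the finite analogue $F_N(a, b; t)$, so it is not reproduced here.

```latex
F(a, b; t) = \sum_{n=0}^{\infty} \frac{(aq; q)_n}{(bq; q)_n}\, t^n,
\qquad
(x; q)_n = \prod_{k=0}^{n-1} \left(1 - x q^{k}\right), \quad (x; q)_0 = 1 .
```

Under this notation, replacing $a$, $b$, or $t$ by $aq$, $bq$, or $tq$ shifts the corresponding $q$-Pochhammer factors by one step, which is the kind of contiguous relation the abstract describes for $\ell, m, n \in \{0, 1\}$.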
Submitted 2 July, 2022;
originally announced July 2022.
-
Composite Adaptive Control for Time-varying Systems with Dual Adaptation
Authors:
Raghavv Goel,
Sayan Basu Roy
Abstract:
This paper proposes a composite adaptive control architecture using a dual adaptation scheme for dynamical systems comprising time-varying uncertain parameters. While the majority of adaptive control schemes in the literature address the case of constant parameters, recent research has conceptualized improved adaptive control techniques for time-varying systems with rigorous stability proofs. The proposed work is an effort in a similar direction, where a novel dual adaptation mechanism is introduced to efficiently tackle the time-varying nature of the parameters. Projection and $\sigma$-modification algorithms are strategically combined using congelation of variables to claim a global result for the tracking error space. While classical adaptive systems demand a restrictive condition of persistence of excitation (PE) for accurate parameter estimation, the proposed work relies on a milder condition, called initial excitation (IE), for the same. A rigorous Lyapunov stability analysis is carried out to establish uniformly ultimately bounded (UUB) stability of the closed-loop system. Further, it is analytically shown that the proposed work can recover the performance of a previously designed IE-based adaptive controller in the case of time-invariant systems.
Submitted 3 June, 2022;
originally announced June 2022.
-
DDXPlus: A New Dataset For Automatic Medical Diagnosis
Authors:
Arsene Fansi Tchango,
Rishab Goel,
Zhi Wen,
Julien Martel,
Joumana Ghosn
Abstract:
There has been a rapidly growing interest in Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the machine learning research literature, aiming to assist doctors in telemedicine services. These systems are designed to interact with patients, collect evidence about their symptoms and relevant antecedents, and possibly make predictions about the underlying diseases. Doctors would review the interactions, including the evidence and the predictions, collect if necessary additional information from patients, before deciding on next steps. Despite recent progress in this area, an important piece of doctors' interactions with patients is missing in the design of these systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis, along with the ground truth pathology, symptoms and antecedents for each patient. Unlike existing datasets which only contain binary symptoms and antecedents, this dataset also contains categorical and multi-choice symptoms and antecedents useful for efficient data collection. Moreover, some symptoms are organized in a hierarchy, making it possible to design systems able to interact with patients in a logical way. As a proof-of-concept, we extend two existing AD and ASD systems to incorporate the differential diagnosis, and provide empirical evidence that using differentials as training signals is essential for the efficiency of such systems or for helping doctors better understand the reasoning of those systems.
Submitted 13 October, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation
Authors:
Raman Goel,
Seba Susan,
Sachin Vashisht,
Armaan Dhanda
Abstract:
Modern-day conversational agents are trained to emulate the manner in which humans communicate. To emotionally bond with the user, these virtual agents need to be aware of the affective state of the user. Transformers are the recent state of the art in sequence-to-sequence learning that involves training an encoder-decoder model with word embeddings from utterance-response pairs. We propose an emotion-aware transformer encoder for capturing the emotional quotient in the user utterance in order to generate human-like empathetic responses. The contributions of our paper are as follows: 1) an emotion detector module trained on the input utterances determines the affective state of the user in the initial phase; 2) a novel transformer encoder is proposed that adds and normalizes the word embedding with the emotion embedding, thereby integrating the semantic and affective aspects of the input utterance; 3) the encoder and decoder stacks belong to the Transformer-XL architecture, which is the recent state of the art in language modeling. Experimentation on the benchmark Facebook AI empathetic dialogue dataset confirms the efficacy of our model through the higher BLEU-4 scores achieved for the generated responses as compared to existing methods. Emotionally intelligent virtual agents are now a reality, and inclusion of affect as a modality in all human-machine interfaces is foreseen in the immediate future.
Submitted 24 April, 2022;
originally announced April 2022.
-
Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning
Authors:
Geunseob Oh,
Rahul Goel,
Chris Hidey,
Shachi Paul,
Aditya Gupta,
Pararth Shah,
Rushin Shah
Abstract:
Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by the traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing NAR models, we maintain the O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
△ Less
Submitted 14 April, 2022;
originally announced April 2022.
-
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Authors:
Christopher Hidey,
Fei Liu,
Rahul Goel
Abstract:
Retraining modern deep learning systems can lead to variations in model performance even when trained using the same data and hyper-parameters by simply using different random seeds. We call this phenomenon model jitter. This issue is often exacerbated in production settings, where models are retrained on noisy data. In this work we tackle the problem of stable retraining with a focus on conversational semantic parsers. We first quantify the model jitter problem by introducing the model agreement metric and showing the variation with dataset noise and model sizes. We then demonstrate the effectiveness of various jitter reduction techniques such as ensembling and distillation. Lastly, we discuss practical trade-offs between such techniques and show that co-distillation provides a sweet spot in terms of jitter reduction for semantic parsing systems with only a modest increase in resource usage.
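One plausible reading of a model agreement metric is the fraction of examples on which two retrained models produce the same prediction, averaged over all pairs of runs. The sketch below illustrates that reading only; the abstract does not spell out the exact definition, so the function names and the averaging scheme are assumptions rather than the authors' formulation.

```python
from itertools import combinations


def pairwise_agreement(preds_a, preds_b):
    """Fraction of examples where two runs give identical predictions."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("runs must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)


def mean_agreement(runs):
    """Average pairwise agreement over every pair of retraining runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical parses from three runs that differ only in random seed.
runs = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["A", "X", "C", "D"],
]
print(mean_agreement(runs))  # (0.75 + 0.75 + 0.5) / 3
```

Unlike accuracy, this measure needs no gold labels: two runs can have identical accuracy and still disagree on which examples they get right, which is the variation the abstract calls model jitter.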
Submitted 23 September, 2022; v1 submitted 10 April, 2022;
originally announced April 2022.
-
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
Authors:
David Bieber,
Rishab Goel,
Daniel Zheng,
Hugo Larochelle,
Daniel Tarlow
Abstract:
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a "static" setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and "learns to execute" descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
Submitted 7 March, 2022;
originally announced March 2022.
-
TableFormer: Robust Transformer Modeling for Table-Text Encoding
Authors:
Jingfeng Yang,
Aditya Gupta,
Shyam Upadhyay,
Luheng He,
Rahul Goel,
Shachi Paul
Abstract:
Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled the table structures or table-text alignments, hindering the table-text understanding ability. In this work, we propose a robust and structurally aware table-text encoding architecture TableFormer, where tabular structural biases are incorporated completely through learnable attention biases. TableFormer is (1) strictly invariant to row and column orders, and, (2) could understand tables better due to its tabular inductive biases. Our evaluations showed that TableFormer outperforms strong baselines in all settings on SQA, WTQ and TabFact table reasoning datasets, and achieves state-of-the-art performance on SQA, especially when facing answer-invariant row and column order perturbations (6% improvement over the best baseline), because previous SOTA models' performance drops by 4% - 6% when facing such perturbations while TableFormer is not affected.
Submitted 3 May, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Pre-Trained Language Transformers are Universal Image Classifiers
Authors:
Rahul Goel,
Modar Sulaiman,
Kimia Noorbakhsh,
Mahdi Sharifi,
Rajesh Sharma,
Pooyan Jamshidi,
Kallol Roy
Abstract:
Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During the finetuning process with images, most of the layers of GPT-2 are frozen during backpropagation, and the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of FPT to facial images. We also use our FPT on encrypted images for classification. Our FPT shows high accuracy on both raw facial images and encrypted images. We hypothesize, with theory and experiments, that the FPT gains its meta-learning capacity from its large size and large-scale training. GPT-2, trained to generate a single word token at a time through the autoregressive process, is forced toward a heavy-tailed distribution. The FPT then uses the heavy-tail property as its meta-learning capacity for classifying images. Our work shows one way to avoid bias during the machine classification of images. The FPT encodes worldly knowledge because of its pretraining on text, which it uses during classification. The statistical error of classification is reduced because of the added context gained from the text. Our paper shows the ethical dimension of using encrypted data for classification. Criminal images are sensitive to share across boundaries, but encrypted images largely evade this ethical concern. The good classification accuracy of FPT on encrypted images shows promise for further research on privacy-preserving machine learning.
Submitted 25 January, 2022;
originally announced January 2022.
-
Clickbait in YouTube Prevention, Detection and Analysis of the Bait using Ensemble Learning
Authors:
Peya Mowar,
Mini Jain,
Ruchika Goel,
Dinesh Kumar Vishwakarma
Abstract:
Unscrupulous content creators on YouTube employ deceptive techniques such as spam and clickbait to reach a broad audience and trick users into clicking on their videos to increase their advertisement revenue. Clickbait detection on YouTube requires an in-depth examination and analysis of the intricate relationship between the video content and the video descriptors (title and thumbnail). However, current solutions are mostly centred around the study of video descriptors and other metadata such as likes, tags, and comments, and fail to utilize the video content, both video and audio. Therefore, we introduce a novel model to detect clickbait on YouTube that considers the relationship between video content and title or thumbnail. The proposed model consists of a stacking classifier framework composed of six base models (K Nearest Neighbours, Support Vector Machine, XGBoost, Naive Bayes, Logistic Regression, and Multilayer Perceptron) and a meta classifier. The developed clickbait detection model achieved a high accuracy of 92.89% on the novel BollyBAIT dataset and 95.38% on the Misleading Video Dataset. Additionally, the classifier does not use meta features or other statistics dependent on user interaction with the video (the number of likes, followers, or comments) for classification, and thus can be used to detect potential clickbait videos before they are uploaded, thereby preventing the nuisance of clickbait altogether and improving the users' streaming experience.
Submitted 15 December, 2021;
originally announced December 2021.
-
Products of reflections in smooth Bruhat intervals
Authors:
Christian Gaetz,
Ram K. Goel
Abstract:
A permutation is called smooth if the corresponding Schubert variety is smooth. Gilboa and Lapid prove that in the symmetric group, multiplying the reflections below a smooth element $w$ in Bruhat order in a compatible order yields back the element $w$. We strengthen this result by showing that such a product in fact determines a saturated chain $e \to w$ in Bruhat order, and that this property characterizes smooth elements.
Submitted 24 February, 2023; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Do Facial Trait Correlates with Roll Call Voting in Parliament? Using fWHR to Study Performance in Politics
Authors:
Rahul Goel,
Tymofii Brik,
Rajesh Sharma
Abstract:
Research has shown that people recognize and select leaders based on their facial appearance. However, empirical findings on the correlation between leaders' performance and their facial traits are mixed. This paper adds to the debate by addressing two previously understudied aspects of facial traits among political leaders: (i) previous studies have focused on the electoral success and achievement drive of politicians, omitting their actual daily performance after elections; (ii) previous research has analyzed individual politicians, omitting the social context that potentially influences their performance. We address these issues by analyzing Ukrainian members of parliament (MPs) who voted for bills in six consecutive convocations of the Verkhovna Rada, from Rada 4 (2002-06) to Rada 9 (2019-Present), to study politicians' performance, which is defined as co-voting or cooperation between MPs in voting on the same bill. In simple words, we analyze whether politicians tend to follow leaders when voting. This ability to summon the votes of others is interpreted as better performance. To measure performance, we propose a generic methodology named Feature Importance For Measuring Performance (FIMP) that can be used in various scenarios. Using FIMP, our data suggest that MPs' votes are not influenced by colleagues with a higher or lower facial width-to-height ratio (fWHR), a popular measure of facial traits.
Submitted 2 October, 2021;
originally announced October 2021.
-
Misinformation Detection on YouTube Using Video Captions
Authors:
Raj Jagtap,
Abhinav Kumar,
Rahul Goel,
Shakshi Sharma,
Rajesh Sharma,
Clint P. George
Abstract:
Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Due to the accessibility of these platforms, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not. The motivation behind exploring video captions stems from our analysis of video metadata. Attributes such as the number of views, likes, dislikes, and comments are ineffective, as videos are hard to differentiate using this information. Using the caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with 0.85 to 0.90 F1-score. To emphasize the relevance of the misinformation class, we re-formulate our classification problem as a two-class classification: Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos with 0.92 to 0.95 F1-score and 0.78 to 0.90 AUC ROC.
Submitted 2 July, 2021;
originally announced July 2021.
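The two-class reformulation described in the abstract above (Misinformation vs. others) can be illustrated with a minimal sketch. It assumes the F1-score is computed for the Misinformation class only; the abstract does not say which F1 variant the authors report. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def collapse_to_binary(labels, positive="Misinformation"):
    """Map each label to 1 if it is the positive class, else 0."""
    return [1 if label == positive else 0 for label in labels]


def f1_score(y_true, y_pred):
    """F1 for the positive class (1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only (not data from the paper).
truth = ["Misinformation", "Neutral", "Debunking Misinformation", "Misinformation"]
preds = ["Misinformation", "Misinformation", "Neutral", "Misinformation"]
score = f1_score(collapse_to_binary(truth), collapse_to_binary(preds))
```

Collapsing Debunking Misinformation and Neutral into one negative class means a confusion between those two is no longer counted as an error.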
-
Overall Behavioural Index (OBI) For Measuring Segregation
Authors:
Rahul Goel,
Rajesh Sharma,
Anto Aasa
Abstract:
Segregation, defined as the degree of separation between two or more population groups, helps to understand a complex social environment and subsequently provides a basis for public policy intervention. To measure segregation, past works often propose indexes that are criticized for being over-simplified and over-reduced. In other words, these indexes use highly aggregated information to measure segregation. In this paper, we propose three novel indexes to measure segregation, namely: (i) Individual Segregation Index (ISI), (ii) Individual Inclination Index (III), and (iii) Overall Behavioural Index (OBI). The ISI index measures individuals' segregation, and the III index reports individuals' inclination towards other population groups. The OBI index, calculated using both the III and ISI indexes, is non-simplified and recognizes not only individuals' connectivity behaviour but also the distribution of connectivity behaviour within each group. Taking the commonly used Freeman's segregation index and the homophily index as baselines, we evaluate the OBI index on a real call data records (CDR) dataset from Estonia to show the effectiveness of the proposed indexes.
Submitted 20 June, 2021;
originally announced June 2021.
-
Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems
Authors:
Anish Acharya,
Suranjit Adhikari,
Sanchit Agarwal,
Vincent Auvray,
Nehal Belgamwar,
Arijit Biswas,
Shubhra Chandra,
Tagyoung Chung,
Maryam Fazel-Zarandi,
Raefer Gabriel,
Shuyang Gao,
Rahul Goel,
Dilek Hakkani-Tur,
Jan Jezabek,
Abhay Jha,
Jiun-Yu Kao,
Prakash Krishnan,
Peter Ku,
Anuj Goyal,
Chien-Wei Lin,
Qing Liu,
Arindam Mandal,
Angeliki Metallinou,
Vishal Naik,
Yi Pan
, et al. (6 additional authors not shown)
Abstract:
Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over $50\%$ improvement in turn-level action signature prediction accuracy.
Submitted 19 April, 2021;
originally announced April 2021.
-
Identification and Development of Therapeutics for COVID-19
Authors:
Halie M. Rando,
Nils Wellhausen,
Soumita Ghosh,
Alexandra J. Lee,
Anna Ada Dattoli,
Fengling Hu,
James Brian Byrd,
Diane N. Rafizadeh,
Ronan Lordan,
Yanjun Qi,
Yuchen Sun,
Christian Brueffer,
Jeffrey M. Field,
Marouen Ben Guebila,
Nafisa M. Jadavji,
Ashwin N. Skelly,
Bharath Ramsundar,
Jinhui Wang,
Rishi Raj Goel,
YoSon Park,
the COVID-19 Review Consortium,
Simina M. Boca,
Anthony Gitter,
Casey S. Greene
Abstract:
After emerging in China in late 2019, the novel Severe acute respiratory syndrome-like coronavirus 2 (SARS-CoV-2) spread worldwide and as of early 2021, continues to significantly impact most countries. Only a small number of coronaviruses are known to infect humans, and only two are associated with the severe outcomes associated with SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a closely related species of SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Both of these previous epidemics were controlled fairly rapidly through public health measures, and no vaccines or robust therapeutic interventions were identified. However, previous insights into the immune response to coronaviruses gained during the outbreaks of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) have proved beneficial to identifying approaches to the treatment and prophylaxis of novel coronavirus disease 2019 (COVID-19). A number of potential therapeutics against SARS-CoV-2 and the resultant COVID-19 illness were rapidly identified, leading to a large number of clinical trials investigating a variety of possible therapeutic approaches being initiated early on in the pandemic. As a result, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA) in the United States, and many other therapeutics remain under investigation. Here, we describe a range of approaches for the treatment of COVID-19, along with their proposed mechanisms of action and the current status of clinical investigation into each candidate. The status of these investigations will continue to evolve, and this review will be updated as progress is made.
Submitted 10 September, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure
Authors:
Halie M. Rando,
Adam L. MacLean,
Alexandra J. Lee,
Ronan Lordan,
Sandipan Ray,
Vikas Bansal,
Ashwin N. Skelly,
Elizabeth Sell,
John J. Dziak,
Lamonica Shinholster,
Lucy D'Agostino McGowan,
Marouen Ben Guebila,
Nils Wellhausen,
Sergey Knyazev,
Simina M. Boca,
Stephen Capone,
Yanjun Qi,
YoSon Park,
Yuchen Sun,
David Mai,
Joel D. Boerckel,
Christian Brueffer,
James Brian Byrd,
Jeremy P. Kamil,
Jinhui Wang
, et al. (9 additional authors not shown)
Abstract:
The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.
Submitted 3 December, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Studying Leaders & Their Concerns Using Online Social Media During The Times Of Crisis -- A COVID Case Study
Authors:
Rahul Goel,
Rajesh Sharma
Abstract:
Online social media (OSM) has emerged as a prominent platform for debate on a wide range of issues. Even celebrities and public figures often share their opinions on a variety of topics through OSM platforms. One such subject that has gained a lot of coverage on Twitter is the Novel Coronavirus, officially known as COVID-19, which has become a pandemic and has sparked a crisis in human history. In this study, we examine 29 million tweets over three months to study highly influential users, whom we refer to as leaders. We recognize these leaders through social network techniques and analyze their tweets using text analysis. Using a community detection algorithm, we categorize these leaders into four clusters: research, news, health, and politics, with each cluster containing Twitter handles (accounts) of individual users or organizations. E.g., the health cluster includes the World Health Organization (@WHO), the Director-General of WHO (@DrTedros), and so on. The emotion analysis reveals that (i) all clusters show an equal amount of fear in their tweets, (ii) research and news clusters display more sadness than others, and (iii) health and politics clusters are attempting to win public trust. According to the text analysis, the (i) research cluster is more concerned with recognizing symptoms and the development of vaccination; (ii) news and politics clusters are mostly concerned with travel. We then show that we can use our findings to classify tweets into clusters with a score of 96% AUC ROC.
Submitted 27 May, 2021; v1 submitted 8 January, 2021;
originally announced January 2021.
-
COVID-19 and the stock market: evidence from Twitter
Authors:
Rahul Goel,
Lucas Javier Ford,
Maksym Obrizan,
Rajesh Sharma
Abstract:
COVID-19 has had a much larger impact on the financial markets than previous epidemics because news is transferred over social networks at the speed of light. Using Twitter's API, we compiled a unique dataset with more than 26 million COVID-19 related Tweets collected from February 2nd until May 1st, 2020. We find that more frequent use of the word "stock" in daily Tweets is associated with a substantial decline in log returns of three key US indices - Dow Jones Industrial Average, S&P500, and NASDAQ. The results remain virtually unchanged in multiple robustness checks.
Submitted 13 November, 2020;
originally announced November 2020.
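The abstract above uses log returns as its outcome variable. As a reminder of the standard definition, the log return between consecutive closing prices is ln(P_t / P_{t-1}). The sketch below assumes daily closing prices as input; the function name and the example prices are hypothetical, not data from the paper.

```python
import math


def log_returns(prices):
    """Return ln(P_t / P_{t-1}) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Made-up closing prices, for illustration only.
example = log_returns([100.0, 110.0, 99.0])
```

Log returns are additive over time: the returns in `example` sum to ln(99 / 100), the log return over the whole period.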
-
Linear Predictive Coding for Acute Stress Prediction from Computer Mouse Movements
Authors:
Lawrence H. Kim,
Rahul Goel,
Jia Liang,
Mert Pilanci,
Pablo E. Paredes
Abstract:
Prior work demonstrated the potential of using the Linear Predictive Coding (LPC) filter to approximate muscle stiffness and damping from computer mouse movements to predict acute stress levels of users. Theoretically, muscle stiffness and damping in the arm can be estimated using a mass-spring-damper (MSD) biomechanical model. However, the damping frequency (i.e., stiffness) and damping ratio values derived using LPC have not yet been compared with those from a theoretical MSD model. This work demonstrates that the damping frequency and damping ratio from LPC are significantly correlated with those from an MSD model, thus confirming the validity of using LPC to infer muscle stiffness and damping. We also compare the binary stress-level classification performance using the values from LPC and MSD with each other and with neural network-based baselines. We found comparable performance across all conditions, demonstrating the efficacy of LPC and MSD model-based stress prediction, especially for longer mouse trajectories. Clinical relevance: This work demonstrates the validity of the LPC filter to approximate muscle stiffness and damping and predict acute stress from computer mouse movements.
Submitted 15 December, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
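For readers unfamiliar with the mass-spring-damper model mentioned in the abstract above, the following sketch computes the textbook quantities of a second-order system m x'' + c x' + k x = 0: natural frequency sqrt(k/m), damping ratio c / (2 sqrt(k m)), and damped frequency. It is not the authors' estimation pipeline, and how the paper maps these quantities to its "damping frequency" is not stated in the abstract. The function name and parameter values are hypothetical.

```python
import math


def msd_parameters(mass, damping, stiffness):
    """Textbook second-order parameters of m*x'' + c*x' + k*x = 0.

    Returns (natural_freq, damping_ratio, damped_freq), frequencies in rad/s.
    """
    natural_freq = math.sqrt(stiffness / mass)
    damping_ratio = damping / (2.0 * math.sqrt(stiffness * mass))
    if damping_ratio < 1.0:
        # Underdamped: the system oscillates at the damped frequency.
        damped_freq = natural_freq * math.sqrt(1.0 - damping_ratio ** 2)
    else:
        # Critically damped or overdamped: no oscillation.
        damped_freq = 0.0
    return natural_freq, damping_ratio, damped_freq


# Arbitrary illustrative values, not measurements from the paper.
wn, zeta, wd = msd_parameters(mass=1.0, damping=2.0, stiffness=100.0)
```

With the mass held fixed, a stiffer spring raises the natural frequency, which is why frequency can serve as a proxy for stiffness.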
-
Update Frequently, Update Fast: Retraining Semantic Parsing Systems in a Fraction of Time
Authors:
Vladislav Lialin,
Rahul Goel,
Andrey Simanovsky,
Anna Rumshisky,
Rushin Shah
Abstract:
Semantic parsing systems currently deployed in voice assistants can require weeks to train. Datasets for these models often receive small and frequent updates, called data patches. Each patch requires training a new model. To reduce training time, one can fine-tune the previously trained model on each patch, but naive fine-tuning exhibits catastrophic forgetting: degradation of the model's performance on data not represented in the data patch. In this work, we propose a simple method that alleviates catastrophic forgetting and show that it is possible to match the performance of a model trained from scratch in less than 10% of the time via fine-tuning. The key to achieving this is supersampling and EWC regularization. We demonstrate the effectiveness of our method on multiple splits of the Facebook TOP and SNIPS datasets.
Submitted 22 March, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
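The EWC regularization named in the abstract above can be sketched in its commonly cited form: a quadratic penalty (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2 that pulls each parameter toward its previously trained value theta*_i, weighted by a diagonal Fisher information estimate F_i. The sketch below assumes flat lists of floats for the parameters; whether the paper uses exactly this variant is not stated in the abstract, and all names and values are hypothetical.

```python
def ewc_penalty(params, anchor_params, fisher, strength):
    """Quadratic EWC penalty: (strength / 2) * sum(F_i * (p_i - a_i)**2).

    params        -- current parameter values
    anchor_params -- parameter values of the previously trained model
    fisher        -- diagonal Fisher information estimates, one per parameter
    strength      -- regularization weight (lambda)
    """
    total = sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))
    return 0.5 * strength * total


# Toy values for illustration; in fine-tuning this term is added to the task loss.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 1.0], strength=0.5)
```

Parameters with a large Fisher value are penalized more for moving, so fine-tuning on a data patch mostly changes the parameters that mattered least to the earlier data.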
-
Data-Efficient Reinforcement Learning with Self-Predictive Representations
Authors:
Max Schwarzer,
Ankesh Anand,
Rishab Goel,
R Devon Hjelm,
Aaron Courville,
Philip Bachman
Abstract:
While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https://github.com/mila-iqia/spr
Submitted 20 May, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
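The abstract above describes a target encoder whose parameters are an exponential moving average (EMA) of the agent's parameters. A minimal sketch of such an update follows, assuming parameters are stored as flat lists of floats. The function name and the decay value are hypothetical; this is not code from the linked repository.

```python
def ema_update(target_params, online_params, decay):
    """Move each target parameter toward the online one.

    new_target = decay * target + (1 - decay) * online
    """
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]


# Toy parameter vectors for illustration only.
target = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

A decay close to 1 makes the target encoder change slowly. Setting it to 0 copies the online parameters outright.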
-
Out-of-Sample Representation Learning for Multi-Relational Graphs
Authors:
Marjan Albooyeh,
Rishab Goel,
Seyed Mehran Kazemi
Abstract:
Many important problems can be formulated as reasoning in knowledge graphs. Representation learning has proved extremely effective for transductive reasoning, in which one needs to make new predictions for already observed entities. This is true for both attributed graphs (where each entity has an initial feature vector) and non-attributed graphs (where the only initial information derives from known relations with other entities). For out-of-sample reasoning, where one needs to make predictions for entities that were unseen at training time, much prior work considers attributed graphs. However, this problem is surprisingly under-explored for non-attributed graphs. In this paper, we study the out-of-sample representation learning problem for non-attributed knowledge graphs, create benchmark datasets for this task, develop several models and baselines, and provide empirical analyses and comparisons of the proposed models and baselines.
Submitted 23 October, 2020; v1 submitted 27 April, 2020;
originally announced April 2020.
-
Mobility Based SIR Model For Pandemics -- With Case Study Of COVID-19
Authors:
Rahul Goel,
Rajesh Sharma
Abstract:
In the last decade, humanity has faced many different pandemics such as SARS, H1N1, and presently the novel coronavirus (COVID-19). On one side, scientists are focusing on vaccinations, and on the other side, there is a need for models that help us understand the spread of these pandemics, as they can help governmental and other concerned agencies to be well prepared, especially for pandemics that spread fast, like COVID-19. The main reason some epidemics turn into pandemics is the connectivity among different regions of the world, which makes it easier to affect a wider geographical area, often worldwide. In addition, population distribution and social coherence in the different regions of the world are non-uniform. Thus, once an epidemic enters a region, the local population distribution plays an important role. Inspired by these ideas, we propose a mobility-based SIR model for epidemics, which especially takes into account pandemic situations. To the best of our knowledge, this model is the first of its kind to take into account the population distribution and connectivity of different geographic locations across the globe. In addition to presenting the mathematical proof of our model, we have performed extensive simulations using synthetic data to demonstrate our model's generalizability. To demonstrate the wider scope of our model, we used it to forecast COVID-19 cases for Estonia.
Submitted 26 April, 2020;
originally announced April 2020.
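As background for the abstract above, the classical single-region SIR model, which the paper extends with mobility, can be sketched as follows. This is the textbook model integrated with a simple Euler step, not the authors' mobility-based variant. The function name and all parameter values are hypothetical.

```python
def simulate_sir(population, infected0, beta, gamma, days, dt=0.1):
    """Classical SIR model integrated with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Arbitrary illustrative parameters, not fitted to any real data.
curve = simulate_sir(population=1000, infected0=1, beta=0.3, gamma=0.1, days=60)
```

A mobility-based extension would keep one (S, I, R) triple per region and couple the regions through movement of people; the abstract does not give the exact form of that coupling, so it is not sketched here.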