-
A Unified Differentiable Boolean Operator with Fuzzy Logic
Authors:
Hsueh-Ti Derek Liu,
Maneesh Agrawala,
Cem Yuksel,
Tim Omernick,
Vinith Misra,
Stefano Corazza,
Morgan McGuire,
Victor Zordan
Abstract:
This paper presents a unified differentiable boolean operator for implicit solid shape modeling using Constructive Solid Geometry (CSG). Traditional CSG relies on min, max operators to perform boolean operations on implicit shapes. But because these boolean operators are discontinuous and discrete in the choice of operations, this makes optimization over the CSG representation challenging. Drawing inspiration from fuzzy logic, we present a unified boolean operator that outputs a continuous function and is differentiable with respect to operator types. This enables optimization of both the primitives and the boolean operations employed in CSG with continuous optimization techniques, such as gradient descent. We further demonstrate that such a continuous boolean operator allows modeling of both sharp mechanical objects and smooth organic shapes with the same framework. Our proposed boolean operator opens up new possibilities for future research toward fully continuous CSG optimization.
Submitted 15 July, 2024;
originally announced July 2024.
-
Personalized Predictions from Population Level Experiments: A Study on Alzheimer's Disease
Authors:
Dennis Shen,
Anish Agarwal,
Vishal Misra,
Bjoern Schelter,
Devavrat Shah,
Helen Shiells,
Claude Wischik
Abstract:
The purpose of this article is to infer patient level outcomes from population level randomized control trials (RCTs). In this pursuit, we utilize the recently proposed synthetic nearest neighbors (SNN) estimator. At its core, SNN leverages information across patients to impute missing data associated with each patient of interest. We focus on two types of missing data: (i) unrecorded outcomes from discontinuing the assigned treatments and (ii) unobserved outcomes associated with unassigned treatments. Data imputation in the former powers and de-biases RCTs, while data imputation in the latter simulates "synthetic RCTs" to predict the outcomes for each patient under every treatment. The SNN estimator is interpretable, transparent, and causally justified under a broad class of missing data scenarios. Relative to several standard methods, we empirically find that SNN performs well for the above two applications using Phase 3 clinical trial data on patients with Alzheimer's Disease. Our findings directly suggest that SNN can tackle a current pain point within the clinical trial workflow on patient dropouts and serve as a new tool towards the development of precision medicine. Building on our insights, we discuss how SNN can further generalize to real-world applications.
Submitted 30 May, 2024;
originally announced May 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference
Authors:
Siddhartha Dalal,
Vishal Misra
Abstract:
This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs), focusing on their core optimization metric of next token prediction. We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix. Key contributions include: (i) a continuity theorem relating embeddings to multinomial distributions, (ii) a demonstration that LLM text generation aligns with Bayesian learning principles, (iii) an explanation for the emergence of in-context learning in larger models, and (iv) empirical validation using visualizations of next token probabilities from an instrumented Llama model. Our findings provide new insights into LLM functioning, offering a statistical foundation for understanding their capabilities and limitations. This framework has implications for LLM design, training, and application, potentially guiding future developments in the field.
Submitted 24 September, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Estimating Ground Reaction Forces from Inertial Sensors
Authors:
Bowen Song,
Marco Paolieri,
Harper E. Stewart,
Leana Golubchik,
Jill L. McNitt-Gray,
Vishal Misra,
Devavrat Shah
Abstract:
Objective: Our aim is to determine if data collected with inertial measurement units (IMUs) during steady-state running could be used to estimate ground reaction forces (GRFs) and to derive biomechanical variables (e.g., contact time, impulse, change in velocity) using lightweight machine-learning approaches. In contrast, state-of-the-art estimation using LSTMs suffers from prohibitive inference times on edge devices, requires expensive training and hyperparameter optimization, and results in black box models. Methods: We proposed a novel lightweight solution, SVD Embedding Regression (SER), using linear regression between SVD embeddings of IMU data and GRF data. We also compared lightweight solutions including SER and k-Nearest-Neighbors (KNN) regression with state-of-the-art LSTMs. Results: We performed extensive experiments to evaluate these techniques under multiple scenarios and combinations of IMU signals and quantified estimation errors for predicting GRFs and biomechanical variables. We did this using training data from different athletes, from the same athlete, or both, and we explored the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Conclusion: Our results illustrated that lightweight solutions such as SER and KNN can be similarly accurate or more accurate than LSTMs. The use of personal data reduced estimation errors of all methods, particularly for most biomechanical variables (as compared to GRFs); moreover, this gain was more pronounced in the lightweight methods. Significance: The study of GRFs is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries.
Submitted 18 September, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Gap-free 16-year (2005-2020) sub-diurnal surface meteorological observations across Florida
Authors:
Julie Peeling,
Jasmeet Judge,
Vasubandhu Misra,
C. B. Jayasankar,
Rick Lusher
Abstract:
The rather unique sub-tropical, flat, peninsular region of Florida is subject to a unique climate with extreme weather events across the year that impacts agriculture, public health, and management of natural resources. Meteorological data at high temporal resolutions especially in the tropical latitudes are essential to understand diurnal and semi-diurnal variations of climate, which are considered to be the fundamental modes of climate variations of our Earth system. However, many meteorological datasets contain gaps that limit their use for validation of models and further detailed observational analysis. The objective of this paper is to apply a set of data gap filling strategies to develop a gap-free dataset with 15-minute observations for the sub-tropical region of Florida. Using data from the Florida Automated Weather Network (FAWN), methods of linear interpolation, trend continuation, reference to external sources, and nearest station substitution were applied to fill in the data gaps depending on the extent of the gap. The outcome of this study provides continuous, publicly accessible surface meteorological observations for 30 FAWN stations at 15-minute intervals for the years 2005-2020.
Submitted 25 October, 2023;
originally announced October 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego
, et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Differentially Private Synthetic Control
Authors:
Saeyoung Rho,
Rachel Cummings,
Vishal Misra
Abstract:
Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., the donor pool) to predict a counterfactual time series of interest (i.e., the target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.
Submitted 24 March, 2023;
originally announced March 2023.
-
Robot Synesthesia: A Sound and Emotion Guided AI Painter
Authors:
Vihaan Misra,
Peter Schaldenbrand,
Jean Oh
Abstract:
If a picture paints a thousand words, sound may voice a million. While recent robotic painting and image synthesis methods have achieved progress in generating visuals from text inputs, the translation of sound into images is vastly unexplored. Generally, sound-based interfaces and sonic interactions have the potential to expand accessibility and control for the user and provide a means to convey complex emotions and the dynamic aspects of the real world. In this paper, we propose an approach for using sound and speech to guide a robotic painting process, known here as robot synesthesia. For general sound, we encode the simulated paintings and input sounds into the same latent space. For speech, we decouple speech into its transcribed text and the tone of the speech. Whereas we use the text to control the content, we estimate the emotions from the tone to guide the mood of the painting. Our approach has been fully integrated with FRIDA, a robotic painting framework, adding sound and speech to FRIDA's existing input modalities, such as text and style. In two surveys, participants correctly guessed the emotion or natural sound used to generate a given painting at more than twice the rate of random chance. On our sound-guided image manipulation and music-guided paintings, we discuss the results qualitatively.
Submitted 23 May, 2024; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Exploring Length Generalization in Large Language Models
Authors:
Cem Anil,
Yuhuai Wu,
Anders Andreassen,
Aitor Lewkowycz,
Vedant Misra,
Vinay Ramasesh,
Ambrose Slone,
Guy Gur-Ari,
Ethan Dyer,
Behnam Neyshabur
Abstract:
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems.
Submitted 14 November, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Solving Quantitative Reasoning Problems with Language Models
Authors:
Aitor Lewkowycz,
Anders Andreassen,
David Dohan,
Ethan Dyer,
Henryk Michalewski,
Vinay Ramasesh,
Ambrose Slone,
Cem Anil,
Imanol Schlag,
Theo Gutman-Solo,
Yuhuai Wu,
Behnam Neyshabur,
Guy Gur-Ari,
Vedant Misra
Abstract:
Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.
Submitted 30 June, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Bandwidth Allocation Games
Authors:
Niloofar Bayat,
Vishal Misra,
Dan Rubenstein
Abstract:
Internet providers often offer data plans that, for each user's monthly billing cycle, guarantee a fixed amount of data at high rates until a byte threshold is reached, at which point the user's data rate is throttled to a lower rate for the remainder of the cycle. In practice, the thresholds and throttling rates can appear, and may in fact be, somewhat arbitrary. In this paper, we evaluate the choice of threshold and rate as an optimization problem (regret minimization) and demonstrate that intuitive formulations of client regret, which preserve desirable fairness properties, lead to optimization problems that have tractably computable solutions.
We begin by exploring the effectiveness of using thresholding mechanisms to modulate overall bandwidth consumption. Next, we separately consider the regret of heterogeneous users who are streamers, wishing to view content over a finite period of fixed rates, and users who are file downloaders, desiring a fixed amount of bandwidth per month at their highest obtainable rate. We extend our analysis to a game-theoretic setting where users can choose from a variety of plans that vary the cap on the unbounded-rate data, and demonstrate the convergence of the game. Our model provides a fresh perspective on a fair allocation of resources where the demand is higher than capacity, while focusing on the real-world phenomena of bandwidth throttling practiced by ISPs. We show how the solution to the optimization problem results in allocations that exhibit several desirable fairness properties among the users between whom the capacity must be partitioned.
Submitted 26 April, 2022;
originally announced April 2022.
-
PaLM: Scaling Language Modeling with Pathways
Authors:
Aakanksha Chowdhery,
Sharan Narang,
Jacob Devlin,
Maarten Bosma,
Gaurav Mishra,
Adam Roberts,
Paul Barham,
Hyung Won Chung,
Charles Sutton,
Sebastian Gehrmann,
Parker Schuh,
Kensen Shi,
Sasha Tsvyashchenko,
Joshua Maynez,
Abhishek Rao,
Parker Barnes,
Yi Tay,
Noam Shazeer,
Vinodkumar Prabhakaran,
Emily Reif,
Nan Du,
Ben Hutchinson,
Reiner Pope,
James Bradbury,
Jacob Austin
, et al. (42 additional authors not shown)
Abstract:
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Submitted 5 October, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Authors:
Alethea Power,
Yuri Burda,
Harri Edwards,
Igor Babuschkin,
Vedant Misra
Abstract:
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
Submitted 6 January, 2022;
originally announced January 2022.
-
On the Assumptions of Synthetic Control Methods
Authors:
Claudia Shi,
Dhanya Sridhar,
Vishal Misra,
David M. Blei
Abstract:
Synthetic control (SC) methods have been widely applied to estimate the causal effect of large-scale interventions, e.g., the state-wide effect of a change in policy. The idea of synthetic controls is to approximate one unit's counterfactual outcomes using a weighted combination of some other units' observed outcomes. The motivating question of this paper is: how does the SC strategy lead to valid causal inferences? We address this question by re-formulating the causal inference problem targeted by SC with a more fine-grained model, where we change the unit of the analysis from "large units" (e.g., states) to "small units" (e.g., individuals in states). Under this re-formulation, we derive sufficient conditions for the non-parametric causal identification of the causal effect. We highlight two implications of the reformulation: (1) it clarifies where "linearity" comes from, and how it falls naturally out of the more fine-grained and flexible model, and (2) it suggests new ways of using available data with SC methods for valid causal inference, in particular, new ways of selecting observations from which to estimate the counterfactual.
Submitted 14 December, 2021; v1 submitted 10 December, 2021;
originally announced December 2021.
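The "weighted combination of some other units' observed outcomes" described in the abstract above can be illustrated with a minimal sketch. It assumes only two donor units and fits the weights by unconstrained least squares on pre-intervention data; the function names are hypothetical and this is not the paper's estimator. Classical SC formulations typically also constrain the weights (for example, nonnegative and summing to one), which this sketch does not do.

```python
def fit_two_donor_weights(donor_a, donor_b, target):
    """Least-squares weights (w_a, w_b) with w_a*donor_a + w_b*donor_b ~ target.

    All three arguments are equal-length lists of pre-intervention outcomes.
    Solves the 2x2 normal equations with Cramer's rule (illustrative only).
    """
    saa = sum(a * a for a in donor_a)
    sbb = sum(b * b for b in donor_b)
    sab = sum(a * b for a, b in zip(donor_a, donor_b))
    say = sum(a * y for a, y in zip(donor_a, target))
    sby = sum(b * y for b, y in zip(donor_b, target))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("donor series are collinear; weights are not unique")
    w_a = (say * sbb - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b


def synthetic_outcome(weights, donor_a_post, donor_b_post):
    """Counterfactual estimate for the target from post-intervention donor data."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(donor_a_post, donor_b_post)]
```

The gap between the target's observed post-intervention outcomes and `synthetic_outcome(...)` is then read as the estimated effect, under whatever identification assumptions justify the weighting.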
-
Evaluating Large Language Models Trained on Code
Authors:
Mark Chen,
Jerry Tworek,
Heewoo Jun,
Qiming Yuan,
Henrique Ponde de Oliveira Pinto,
Jared Kaplan,
Harri Edwards,
Yuri Burda,
Nicholas Joseph,
Greg Brockman,
Alex Ray,
Raul Puri,
Gretchen Krueger,
Michael Petrov,
Heidy Khlaaf,
Girish Sastry,
Pamela Mishkin,
Brooke Chan,
Scott Gray,
Nick Ryder,
Mikhail Pavlov,
Alethea Power,
Lukasz Kaiser,
Mohammad Bavarian,
Clemens Winter
, et al. (33 additional authors not shown)
Abstract:
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
Submitted 14 July, 2021; v1 submitted 7 July, 2021;
originally announced July 2021.
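The repeated-sampling result in the abstract above (solving a problem if any of several samples is functionally correct) is commonly scored with a "pass@k" style estimate. The abstract itself does not give a formula, so the sketch below is an assumption: it uses the standard combinatorial estimator for the probability that at least one of `k` samples, drawn without replacement from `n` generated samples of which `c` are correct, passes. The function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate P(at least one of k samples is correct) for one problem.

    n: total samples generated, c: how many of them passed the tests,
    k: sampling budget being evaluated (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset has a correct one.
        return 1.0
    # 1 - (number of all-incorrect k-subsets) / (number of k-subsets)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k` over all problems in a benchmark gives a single score per value of `k`.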
-
Improving non-deterministic uncertainty modelling in Industry 4.0 scheduling
Authors:
Ashwin Misra,
Ankit Mittal,
Vihaan Misra,
Deepanshu Pandey
Abstract:
The latest Industrial revolution has helped industries in achieving very high rates of productivity and efficiency. It has introduced data aggregation and cyber-physical systems to optimize planning and scheduling. However, uncertainty in the environment and the imprecise nature of human operators are not accurately accounted for in the decision-making process. This leads to delays in consignments and imprecise budget estimations. This widespread practice in the industrial models is flawed and requires rectification. Various other articles have attempted to solve this problem through stochastic or fuzzy set model methods. This paper presents a comprehensive method to logically and realistically quantify the non-deterministic uncertainty through probabilistic uncertainty modelling. This method is applicable to virtually all Industrial data sets, as the model is self-adjusting and uses epsilon-contamination to cater to limited or incomplete data sets. The results are numerically validated through an Industrial data set in Flanders, Belgium. The data-driven results achieved through this robust scheduling method illustrate the improvement in performance.
Submitted 8 January, 2021;
originally announced January 2021.
-
Synthetic Control, Synthetic Interventions, and COVID-19 spread: Exploring the impact of lockdown measures and herd immunity
Authors:
Niloofar Bayat,
Cody Morrin,
Yuheng Wang,
Vishal Misra
Abstract:
The synthetic control method is an empirical methodology for causal inference using observational data. By observing the spread of COVID-19 throughout the world, we analyze the data on the number of deaths and cases in different regions using the power of prediction, counterfactual analysis, and synthetic interventions of the synthetic control and its extensions. We observe that the number of deaths and cases in different regions would have been much smaller had the lockdowns been imposed earlier and had the re-openings been done later, especially among indoor bars and restaurants. We also analyze the speculated impact of herd immunity on the spread given the population of each region and show that lockdown policies have a very strong impact on the spread regardless of the level of prior infections.
Our most up-to-date code, model, and data can be found on GitHub: https://github.com/niloofarbayat/COVID19-synthetic-control-analysis
Submitted 26 September, 2020; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation
Authors:
Sneha Mehta,
Bahareh Azarnoush,
Boris Chen,
Avneesh Saluja,
Vinith Misra,
Ballav Bihani,
Ritwik Kumar
Abstract:
Black-box machine translation systems have proven incredibly useful for a variety of applications yet by design are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system, which is used to train a paraphrase model that "simplifies" the original sentence to be more conducive for translation. The model is used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance as compared to non-preprocessed source sentences. We further perform side-by-side human evaluation to verify that translations of the simplified sentences are better than the original ones. Finally, we provide some guidance on recommended language pairs for generating the simplification model corpora by investigating the relationship between ease of translation of a language pair (as measured by BLEU) and quality of the resulting simplification model from back-translations of this language pair (as measured by SARI), and tie this into the downstream task of low-resource translation.
Submitted 27 May, 2020; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Zero-Rating and Net Neutrality: Who Wins, Who Loses?
Authors:
Niloofar Bayat,
Richard Ma,
Vishal Misra,
Dan Rubenstein
Abstract:
An objective of network neutrality is that the design of regulations for the Internet will ensure that it remains a public, open platform where innovations can thrive. While there is broad agreement that preserving the content quality of service falls under the purview of net neutrality, the role of differential pricing, especially the practice of \emph{zero-rating}, remains controversial. Even though some countries (India, Canada) have banned zero-rating, others have either taken no stance or explicitly allowed it (South Africa, Kenya, U.S.). In this paper, we model zero-rating options available between Internet service providers (ISPs) and content providers (CPs) and use these models to better understand the conditions under which offering zero-rated services is preferred, and who specifically gains in utility. We develop a formulation in which providers' incomes vary, from low-income startups to high-income incumbents, and where their decisions to zero-rate are a variation of the traditional prisoner's dilemma game. We find that if zero-rating is permitted, low-income CPs often lose utility, whereas high-income CPs often gain utility. We also study the competitiveness of the CP markets via the \emph{Herfindahl Index}. Our findings suggest that in most cases the introduction of zero-rating \emph{reduces} competitiveness.
Submitted 13 February, 2020;
originally announced March 2020.
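The Herfindahl Index mentioned in the abstract above is the sum of squared market shares, so a higher value indicates a more concentrated (less competitive) market. A minimal sketch follows, assuming shares are derived from nonnegative per-provider quantities such as revenue or traffic; the function name and the choice of input quantity are illustrative assumptions, not taken from the paper.

```python
def herfindahl_index(quantities):
    """Sum of squared market shares, in the range [1/N, 1] for N providers.

    quantities: nonnegative per-provider amounts (e.g. revenue) used to
    compute shares; at least one must be positive.
    """
    if any(q < 0 for q in quantities):
        raise ValueError("quantities must be nonnegative")
    total = sum(quantities)
    if total <= 0:
        raise ValueError("at least one quantity must be positive")
    return sum((q / total) ** 2 for q in quantities)
```

Four equal providers give 0.25, while a monopoly gives 1.0; comparing the index before and after a policy change is one way to read a claim that competitiveness is reduced.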
-
A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic
Authors:
Rohan Pandey,
Vaibhav Gautam,
Ridam Pal,
Harsh Bandhey,
Lovedeep Singh Dhingra,
Himanshu Sharma,
Chirag Jain,
Kanav Bhagat,
Arushi,
Lajjaben Patel,
Mudit Agarwal,
Samprati Agrawal,
Rishabh Jalan,
Akshat Wadhwa,
Ayush Garg,
Vihaan Misra,
Yashwin Agrawal,
Bhavika Rana,
Ponnurangam Kumaraguru,
Tavpritesh Sethi
Abstract:
Background: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this Infodemic requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation.
Objective: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages.
Methods: We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community.
Results: A total of 5026 people downloaded the app during the study window, of whom 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot Satya increased, thus proving the usefulness of an mHealth platform to mitigate health misinformation.
Conclusion: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation.
Submitted 30 October, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
Down for Failure: Active Power Status Monitoring
Authors:
Niloofar Bayat,
Kunal Mahajan,
Sam Denton,
Vishal Misra,
Dan Rubenstein
Abstract:
Despite society's strong dependence on electricity, power outages remain prevalent. Standard methods for directly measuring power availability are complex, often inaccurate, and are prone to attack. This paper explores an alternative approach to identifying power outages through intelligent monitoring of IP address availability. In finding these outages, we explore the trade-off between the accuracy of detection and false alarms.
We begin by experimentally demonstrating that static, residential Internet connections serve as good indicators of power, as they are mostly active unless power fails and rarely have battery backups. We construct metrics that dynamically score the reliability of each residential IP, where a higher score indicates a higher correlation between that IP's availability and its regional power. We monitor specifically selected subsets of residential IPs and evaluate the accuracy with which they can indicate current county power status.
Using data gathered during the power outages caused by Hurricane Florence, we demonstrate that we can track power outages at different granularities, state and county, in both sparse and dense regions. By comparing our detection with the reports gathered from power utility companies, we achieve an average detection accuracy of $90\%$, where we also show some of our false alarms and missed outage events could be due to imperfect ground truth data. Therefore, our method can be used as a complementary technique of power outage detection.
Submitted 22 November, 2019;
originally announced December 2019.
-
mRSC: Multi-dimensional Robust Synthetic Control
Authors:
Muhummad Amjad,
Vishal Misra,
Devavrat Shah,
Dennis Shen
Abstract:
When evaluating the impact of a policy on a metric of interest, it may not be possible to conduct a randomized control trial. In settings where only observational data is available, Synthetic Control (SC) methods provide a popular data-driven approach to estimate a "synthetic" control by combining measurements of "similar" units (donors). Recently, Robust SC (RSC) was proposed as a generalization of SC to overcome the challenges of missing data and high levels of noise, while removing the reliance on domain knowledge for selecting donors. However, SC, RSC, and their variants suffer from poor estimation when the pre-intervention period is too short. As the main contribution, we propose a generalization of unidimensional RSC to multi-dimensional RSC, mRSC. Our proposed mechanism incorporates multiple metrics to estimate a synthetic control, thus overcoming the challenge of poor inference from limited pre-intervention data. We show that the mRSC algorithm with $K$ metrics leads to a consistent estimator of the synthetic control for the target unit under any metric. Our finite-sample analysis suggests that the prediction error decays to zero at a rate faster than the RSC algorithm by a factor of $K$ and $\sqrt{K}$ for the training and testing periods (pre- and post-intervention), respectively. Additionally, we provide a diagnostic test that evaluates the utility of including additional metrics. Moreover, we introduce a mechanism to validate the performance of mRSC: time series prediction. That is, we propose a method to predict the future evolution of a time series based on limited data when the notion of time is relative and not absolute, i.e., we have access to a donor pool that has undergone the desired future evolution. Finally, we conduct experimentation to establish the efficacy of mRSC on synthetic data and two real-world case studies (retail and Cricket).
Submitted 23 September, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Bernoulli Embeddings for Graphs
Authors:
Vinith Misra,
Sumit Bhatia
Abstract:
Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By imagining the embeddings as independent coin flips of varying bias, continuous optimization techniques can be applied to the approximate expected loss. Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings, on both ranking and pre-ranking tasks for a variety of datasets.
Submitted 25 March, 2018;
originally announced March 2018.
-
Delay Bounds for Multiclass FIFO
Authors:
Yuming Jiang,
Vishal Misra
Abstract:
FIFO is perhaps the simplest scheduling discipline. For single-class FIFO, its delay guarantee performance has been extensively studied: The well-known results include a stochastic delay bound for $GI/GI/1$ by Kingman and a deterministic delay bound for $D/D/1$ by Cruz. However, for multiclass FIFO, few such results are available. To fill the gap, we prove delay bounds for multiclass FIFO in this work, considering both deterministic and stochastic cases. Specifically, delay bounds are presented for multiclass D/D/1, GI/GI/1 and G/G/1. In addition, examples are provided for several basic settings to demonstrate the obtained bounds in more explicit forms, which are also compared with simulation results.
Submitted 25 August, 2017; v1 submitted 18 May, 2016;
originally announced May 2016.
-
On the Evolution of the Internet Economic Ecosystem
Authors:
Richard T. B. Ma,
John C. S. Lui,
Vishal Misra
Abstract:
The evolution of the Internet has manifested itself in many ways: the traffic characteristics, the interconnection topologies and the business relationships among the autonomous components. It is important to understand why (and how) this evolution came about, and how the interplay of these dynamics may affect future evolution and services. We propose a network aware, macroscopic model that captures the characteristics and interactions of the application and network providers, and show how it leads to a market equilibrium of the ecosystem. By analyzing the driving forces and the dynamics of the market equilibrium, we obtain some fundamental understandings of the cause and effect of the Internet evolution, which explain why some historical and recent evolutions have happened. Furthermore, by projecting the likely future evolutions, our model can help application and network providers to make informed business decisions so as to succeed in this competitive ecosystem.
Submitted 25 November, 2012;
originally announced November 2012.
-
Distributed Functional Scalar Quantization Simplified
Authors:
John Z. Sun,
Vinith Misra,
Vivek K Goyal
Abstract:
Distributed functional scalar quantization (DFSQ) theory provides optimality conditions and predicts performance of data acquisition systems in which a computation on acquired data is desired. We address two limitations of previous works: prohibitively expensive decoder design and a restriction to sources with bounded distributions. We rigorously show that a much simpler decoder has asymptotic performance equivalent to that of the conditional expectation estimator previously explored, thus reducing decoder design complexity. The simpler decoder has the feature of decoupled communication and computation blocks. Moreover, we extend the DFSQ framework with the simpler decoder to acquire sources with infinite-support distributions such as Gaussian or exponential distributions. Finally, through simulation results we demonstrate that performance at moderate coding rates is well predicted by the asymptotic analysis, and we give new insight on the rate of convergence.
Submitted 6 June, 2012;
originally announced June 2012.
-
The Porosity of Additive Noise Sequences
Authors:
Vinith Misra,
Tsachy Weissman
Abstract:
Consider a binary additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $\mathbf{Z}$, the capacity is $1-\mathbb{H}(\mathbf{Z})$ ($\mathbb{H}(\cdot)$ denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence $z^\infty$, the capacity under finite-state encoding and decoding is $1-\bar{\rho}(z^\infty)$, where $\bar{\rho}(\cdot)$ is Lempel and Ziv's finite-state compressibility. This quantity is termed the \emph{porosity} $\underline{\sigma}(\cdot)$ of an individual noise sequence. A sequence of schemes is presented that universally achieves porosity for any noise sequence. These converse and achievability results may be interpreted both as a channel-coding counterpart to Ziv and Lempel's work in universal source coding, as well as an extension to the work by Lomnitz and Feder and Shayevitz and Feder on communication across modulo-additive channels. Additionally, a slightly more practical architecture is suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.
Submitted 31 May, 2012;
originally announced May 2012.
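The capacity expression $1-\mathbb{H}(\mathbf{Z})$ in the abstract above can be made concrete in the simplest special case. The sketch below assumes the noise is i.i.d. Bernoulli with flip probability `p`, so the entropy rate reduces to the binary entropy function; that restriction and the function names are assumptions for illustration and do not cover the individual-sequence (porosity) setting the paper studies.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def additive_noise_capacity(p):
    """Capacity 1 - h(p) in bits per use for i.i.d. Bernoulli(p) additive noise."""
    return 1.0 - binary_entropy(p)
```

With `p = 0.5` the noise is maximally unpredictable and the capacity is zero; with `p = 0` or `p = 1` the noise is deterministic and a full bit per channel use gets through.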
-
Evidence of market manipulation in the financial crisis
Authors:
Vedant Misra,
Marco Lagi,
Yaneer Bar-Yam
Abstract:
We provide direct evidence of market manipulation at the beginning of the financial crisis in November 2007. The type of manipulation, a "bear raid," would have been prevented by a regulation that was repealed by the Securities and Exchange Commission in July 2007. The regulation, the uptick rule, was designed to prevent manipulation and promote stability and was in force from 1938 as a key part of the government response to the 1929 market crash and its aftermath. On November 1, 2007, Citigroup experienced an unusual increase in trading volume and decrease in price. Our analysis of financial industry data shows that this decline coincided with an anomalous increase in borrowed shares, the selling of which would be a large fraction of the total trading volume. The selling of borrowed shares cannot be explained by news events as there is no corresponding increase in selling by share owners. A similar number of shares were returned on a single day six days later. The magnitude and coincidence of borrowing and returning of shares is evidence of a concerted effort to drive down Citigroup's stock price and achieve a profit, i.e., a bear raid. Interpretations and analyses of financial markets should consider the possibility that the intentional actions of individual actors or coordinated groups can impact market behavior. Markets are not sufficiently transparent to reveal even major market manipulation events. Our results point to the need for regulations that prevent intentional actions that cause markets to deviate from equilibrium and contribute to crashes. Enforcement actions cannot reverse severe damage to the economic system. The current "alternative" uptick rule which is only in effect for stocks dropping by over 10% in a single day is insufficient. Prevention may be achieved through improved availability of market data and the original uptick rule or other transaction limitations.
Submitted 3 January, 2012; v1 submitted 13 December, 2011;
originally announced December 2011.
-
The Public Option: a Non-regulatory Alternative to Network Neutrality
Authors:
Richard T. B. Ma,
Vishal Misra
Abstract:
Network neutrality and the role of regulation on the Internet have been heavily debated in recent times. Amongst the various definitions of network neutrality, we focus on the one which prohibits paid prioritization of content and we present an analytical treatment of the topic. We develop a model of the Internet ecosystem in terms of three primary players: consumers, ISPs and content providers. Our analysis looks at this issue from the point of view of the consumer, and we describe the desired state of the system as one which maximizes consumer surplus. By analyzing different scenarios of monopoly and competition, we obtain different conclusions on the desirability of regulation. We also introduce the notion of a Public Option ISP, an ISP that carries traffic in a network neutral manner. Our major findings are (i) in a monopolistic scenario, network neutral regulations benefit consumers; however, the introduction of a Public Option ISP is even better for consumers, as it aligns the interests of the monopolistic ISP with the consumer surplus and (ii) in an oligopolistic situation, the presence of a Public Option ISP is again preferable to network neutral regulations, although the presence of competing price-discriminating ISPs provides the most desirable situation for the consumers.
Submitted 1 July, 2011; v1 submitted 16 June, 2011;
originally announced June 2011.
-
Rational Orbits around Charged Black Holes
Authors:
Vedant Misra,
Janna Levin
Abstract:
We show that all eccentric timelike orbits in Reissner-Nordström spacetime can be classified using a taxonomy that draws upon an isomorphism between periodic orbits and the set of rational numbers. By virtue of the fact that the rationals are dense, the taxonomy can be used to approximate aperiodic orbits with periodic orbits. This may help reduce computational overhead for calculations in gravitational wave astronomy. Our dynamical systems approach enables us to study orbits for both charged and uncharged particles in spite of the fact that charged particle orbits around a charged black hole do not admit a simple one-dimensional effective potential description. Finally, we show that comparing periodic orbits in the RN and Schwarzschild geometries enables us to distinguish charged and uncharged spacetimes by looking only at the orbital dynamics.
Submitted 16 July, 2010;
originally announced July 2010.
-
Distributed Scalar Quantization for Computing: High-Resolution Analysis and Extensions
Authors:
Vinith Misra,
Vivek K Goyal,
Lav R. Varshney
Abstract:
Communication of quantized information is frequently followed by a computation. We consider situations of \emph{distributed functional scalar quantization}: distributed scalar quantization of (possibly correlated) sources followed by centralized computation of a function. Under smoothness conditions on the sources and function, companding scalar quantizer designs are developed to minimize mean-squared error (MSE) of the computed function as the quantizer resolution is allowed to grow. Striking improvements over quantizers designed without consideration of the function are possible and are larger in the entropy-constrained setting than in the fixed-rate setting. As extensions to the basic analysis, we characterize a large class of functions for which regular quantization suffices, consider certain functions for which asymptotic optimality is achieved without arbitrarily fine quantization, and allow limited collaboration between source encoders. In the entropy-constrained setting, a single bit per sample communicated between encoders can have an arbitrarily-large effect on functional distortion. In contrast, such communication has very little effect in the fixed-rate setting.
Submitted 12 May, 2011; v1 submitted 21 November, 2008;
originally announced November 2008.
-
High Temperature Ferromagnetism in Zn1-xMnxO semiconductor thin films
Authors:
Nikoleta Theodoropoulou,
Vinith Misra,
John Philip,
Patrick LeClair,
Geetha P. Berera,
Jagadeesh S. Moodera,
Biswarup Satpati,
Tapobrata Som
Abstract:
Clear evidence of ferromagnetic behavior at temperatures >400 K as well as spin polarization of the charge carriers have been observed in ZnMnO thin films grown on Al2O3 and MgO substrates. The magnetic properties depended on the exact Mn concentration and the growth parameters. In well-characterized single-phase films, the magnetic moment is 4.8 μB/Mn at 350 K, the highest moment yet reported for any Mn doped magnetic semiconductor. Anomalous Hall effect shows that the charge carriers (electrons) are spin polarized and participate in the observed ferromagnetic behavior.
Submitted 12 August, 2004;
originally announced August 2004.