Showing 1–50 of 83 results for author: Chaudhary, V

  1. arXiv:2410.12999  [pdf, other]

    cs.CL

    POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization

    Authors: Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song

    Abstract: Balancing safety and usefulness in large language models has become a critical challenge in recent years. Models often exhibit unsafe behavior or adopt an overly cautious approach, leading to frequent overrefusal of benign prompts, which reduces their usefulness. Addressing these issues requires methods that maintain safety while avoiding overrefusal. In this work, we examine how the overgeneratio…

    Submitted 16 October, 2024; originally announced October 2024.

  2. arXiv:2410.12883  [pdf, other]

    cs.CL cs.LG

    Scaling Laws for Multilingual Language Models

    Authors: Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

    Abstract: We propose a novel scaling law for general-purpose decoder-only language models (LMs) trained on multilingual data, addressing the problem of balancing languages during multilingual pretraining. A primary challenge in studying multilingual scaling is the difficulty of analyzing individual language performance due to cross-lingual transfer. To address this, we shift the focus from individual langua…

    Submitted 15 October, 2024; originally announced October 2024.

  3. arXiv:2410.05331  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion

    Authors: Guanchu Wang, Yu-Neng Chuang, Ruixiang Tang, Shaochen Zhong, Jiayi Yuan, Hongye Jin, Zirui Liu, Vipin Chaudhary, Shuai Xu, James Caverlee, Xia Hu

    Abstract: Ensuring the security of released large language models (LLMs) poses a significant dilemma, as existing mechanisms either compromise ownership rights or raise data privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. Specifically, TaylorMLP preserves the ownership of LLMs by transforming the weights of LLMs into parame…

    Submitted 5 October, 2024; originally announced October 2024.

  4. arXiv:2410.01322  [pdf, other]

    cs.LG cs.AI cs.CV cs.IT

    Forte: Finding Outliers with Representation Typicality Estimation

    Authors: Debargha Ganguly, Warren Morningstar, Andrew Yu, Vipin Chaudhary

    Abstract: Generative models can now produce photorealistic synthetic data which is virtually indistinguishable from the real data used to train it. This is a significant evolution over previous models which could produce reasonable facsimiles of the training data, but ones which could be visually distinguished from the training data by human evaluation. Recent work on OOD detection has raised doubts that ge…

    Submitted 2 October, 2024; originally announced October 2024.

  5. arXiv:2409.19913  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Optimal LR Across Token Horizons

    Authors: Johan Bjorck, Alon Benhaim, Vishrav Chaudhary, Furu Wei, Xia Song

    Abstract: State-of-the-art LLMs are powered by scaling -- scaling model size, dataset size and cluster size. It is economically infeasible to extensively tune hyperparameters for the largest runs. Instead, approximately optimal hyperparameters must be inferred or transferred from smaller experiments. Hyperparameter transfer across model sizes has been studied in Yang et al. However, hyperparameter t…

    Submitted 2 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

  6. arXiv:2409.18235  [pdf, other]

    cs.CV cs.LG

    Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks

    Authors: Debargha Ganguly, Debayan Gupta, Vipin Chaudhary

    Abstract: Deep neural networks (DNNs), while increasingly deployed in many applications, struggle with robustness against anomalous and out-of-distribution (OOD) data. Current OOD benchmarks often oversimplify, focusing on single-object tasks and not fully representing complex real-world anomalies. This paper introduces a new, straightforward method employing graph structures and topological features to eff…

    Submitted 26 September, 2024; originally announced September 2024.

  7. arXiv:2409.17270  [pdf, other]

    cs.AI cs.CL cs.LG cs.LO cs.NE

    Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning

    Authors: Debargha Ganguly, Srinivasan Iyengar, Vipin Chaudhary, Shivkumar Kalyanaraman

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning, particularly in novel domains and complex logical sequences. This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs. Our approach bridges LLM-generated ideas with formal logic verification, employing a custom inte…

    Submitted 25 September, 2024; originally announced September 2024.

  8. arXiv:2409.12136  [pdf, other]

    cs.CL cs.AI cs.LG

    GRIN: GRadient-INformed MoE

    Authors: Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen

    Abstract: Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training practices, as discrete expert routing hinders standard backpropagation and thus gradient-based optimization, which are the cornerstone of deep learning. To…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 58 pages

  9. arXiv:2408.04762  [pdf, other]

    cs.CV

    Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2

    Authors: Andrew Seohwan Yu, Mohsen Hariri, Xuecen Zhang, Mingrui Yang, Vipin Chaudhary, Xiaojuan Li

    Abstract: Intelligent medical image segmentation methods are rapidly evolving and being increasingly applied, yet they face the challenge of domain transfer, where algorithm performance degrades due to different data distributions between source and target domains. To address this, we introduce a method for zero-shot, single-prompt segmentation of 3D knee MRI by adapting Segment Anything Model 2 (SAM2), a g…

    Submitted 8 August, 2024; originally announced August 2024.

  10. arXiv:2407.17678  [pdf, other]

    cs.CL

    S2-Attention: Hardware-Aware Context Sharding Among Attention Heads

    Authors: Xihui Lin, Yunan Zhang, Suyu Ge, Liliang Ren, Barun Patra, Vishrav Chaudhary, Hao Peng, Xia Song

    Abstract: Sparse attention, which selectively attends to a subset of tokens in the context, was supposed to be efficient. However, its theoretical reduction in FLOPs has rarely translated into wall-clock speed-up over its dense attention counterparts due to the lack of hardware-aware optimizations like FlashAttention. Meanwhile, it remains unclear whether sparse attention can maintain the model's quality at…

    Submitted 6 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages

  11. arXiv:2407.15229  [pdf, other]

    cs.CL cs.AI

    The Hitchhiker's Guide to Human Alignment with *PO

    Authors: Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song

    Abstract: With the growing utilization of large language models (LLMs) across domains, alignment towards human preferences has become one of the most critical aspects of training models. At the forefront of state-of-the-art human alignment methods are preference optimization methods (*PO). However, prior research has often concentrated on identifying the best-performing method, typically involving a grid se…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages

  12. arXiv:2407.09879  [pdf, other]

    cs.CL

    sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

    Authors: Sanchit Ahuja, Kumar Tanmay, Hardik Hansrajbhai Chauhan, Barun Patra, Kriti Aggarwal, Luciano Del Corro, Arindam Mitra, Tejas Indulal Dhamecha, Ahmed Awadallah, Monojit Choudhary, Vishrav Chaudhary, Sunayana Sitaram

    Abstract: Despite the remarkable success of LLMs in English, there is a significant gap in performance in non-English languages. In order to address this, we introduce a novel recipe for creating a multilingual synthetic instruction tuning dataset, sPhinX, which is created by selectively translating instruction response pairs from English into 50 languages. We test the effectiveness of sPhinx by using it to…

    Submitted 16 October, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: 20 pages, 12 tables, 5 figures

  13. arXiv:2407.09004  [pdf, other]

    cs.CR

    Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision

    Authors: Zahra Rahmani, Nahal Shahini, Nadav Gat, Zebin Yun, Yuzhou Jiang, Ofir Farchy, Yaniv Harel, Vipin Chaudhary, Mahmood Sharif, Erman Ayday

    Abstract: The data revolution holds significant promise for the health sector. Vast amounts of data collected from individuals will be transformed into knowledge, AI models, predictive systems, and best practices. One area of health that stands to benefit greatly is the genomic domain. Progress in AI, machine learning, and data science has opened new opportunities for genomic research, promising breakthroug…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally to this work. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  14. arXiv:2407.01527  [pdf, other]

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the…

    Submitted 8 October, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2406.15029  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Harvesting magneto-acoustic waves using magnetic two-dimensional chromium telluride (CrTe3)

    Authors: Chinmayee Chowde Gowda, Alexey Kartsev, Nishant Tiwari, Suman Sarkar, Safronov A. A, Varun Chaudhary, Chandra Sekhar Tiwary

    Abstract: A vast majority of electrical devices have integrated magnetic units, which generate constant magnetic fields with noticeable vibrations. The majority of existing nanogenerators acquire energy through friction/mechanical forces and most of these instances overlook acoustic vibrations and magnetic fields. Magnetic two-dimensional (2D) tellurides present a wide range of possibilities for devising a…

    Submitted 21 June, 2024; originally announced June 2024.

  16. arXiv:2406.00343  [pdf, other]

    cs.CL

    Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios

    Authors: Millicent Ochieng, Varun Gumma, Sunayana Sitaram, Jindong Wang, Vishrav Chaudhary, Keshet Ronen, Kalika Bali, Jacki O'Neill

    Abstract: The deployment of Large Language Models (LLMs) in real-world applications presents both opportunities and challenges, particularly in multilingual and code-mixed communication settings. This research evaluates the performance of seven leading LLMs in sentiment analysis on a dataset derived from multilingual and code-mixed WhatsApp chats, including Swahili, English and Sheng. Our evaluation include…

    Submitted 13 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  17. arXiv:2404.14457  [pdf]

    cs.LG

    Graph Coloring Using Heat Diffusion

    Authors: Vivek Chaudhary

    Abstract: Graph coloring is a problem with varied applications in industry and science such as scheduling, resource allocation, and circuit design. The purpose of this paper is to establish if a new gradient based iterative solver framework known as heat diffusion can solve the graph coloring problem. We propose a solution to the graph coloring problem using the heat diffusion framework. We compare the solu…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 5 Pages, 3 Figures

    MSC Class: 05

  18. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…

    Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 24 pages

  19. arXiv:2404.05985  [pdf]

    cs.CR cs.LG

    Boosting Digital Safeguards: Blending Cryptography and Steganography

    Authors: Anamitra Maiti, Subham Laha, Rishav Upadhaya, Soumyajit Biswas, Vikas Chaudhary, Biplab Kar, Nikhil Kumar, Jaydip Sen

    Abstract: In today's digital age, the internet is essential for communication and the sharing of information, creating a critical need for sophisticated data security measures to prevent unauthorized access and exploitation. Cryptography encrypts messages into a cipher text that is incomprehensible to unauthorized readers, thus safeguarding data during its transmission. Steganography, on the other hand, ori…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: This report pertains to the Capstone Project done by Group 3 of the Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The report consists of 36 pages and includes 11 figures and 5 tables

  20. arXiv:2402.01441  [pdf, ps, other]

    q-fin.TR cs.LG

    Learning the Market: Sentiment-Based Ensemble Trading Agents

    Authors: Andrew Ye, James Xu, Yi Wang, Yifan Yu, Daniel Yan, Ryan Chen, Bosheng Dong, Vipin Chaudhary, Shuai Xu

    Abstract: We propose the integration of sentiment analysis and deep-reinforcement learning ensemble algorithms for stock trading, and design a strategy capable of dynamically altering its employed agent given concurrent market sentiment. In particular, we create a simple-yet-effective method for extracting news sentiment and combine this with general improvements upon existing works, resulting in automated…

    Submitted 2 February, 2024; originally announced February 2024.

  21. arXiv:2401.02416  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    ODIN: A Single Model for 2D and 3D Segmentation

    Authors: Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

    Abstract: State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that…

    Submitted 25 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Camera Ready (CVPR 2024, Highlight)

  22. arXiv:2312.14199  [pdf, other]

    cs.CR

    Report on 2023 CyberTraining PI Meeting, 26-27 September 2023

    Authors: Geoffrey Fox, Mary P Thomas, Sajal Bhatia, Marisa Brazil, Nicole M Gasparini, Venkatesh Mohan Merwade, Henry J. Neeman, Jeff Carver, Henri Casanova, Vipin Chaudhary, Dirk Colbry, Lonnie Crosby, Prasun Dewan, Jessica Eisma, Nicole M Gasparini, Ahmed Irfan, Kate Kaehey, Qianqian Liu, Zhen Ni, Sushil Prasad, Apan Qasem, Erik Saule, Prabha Sundaravadivel, Karen Tomko

    Abstract: This document describes a two-day meeting held for the Principal Investigators (PIs) of NSF CyberTraining grants. The report covers invited talks, panels, and six breakout sessions. The meeting involved over 80 PIs and NSF program managers (PMs). The lessons recorded in detail in the report are a wealth of information that could help current and future PIs, as well as NSF PMs, understand the futur…

    Submitted 28 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 38 pages, 3 main sections and 2 Appendix sections, 2 figures, 19 tables; updated version: author corrections

  23. arXiv:2312.06877  [pdf]

    cs.LG

    A Novel Differentiable Loss Function for Unsupervised Graph Neural Networks in Graph Partitioning

    Authors: Vivek Chaudhary

    Abstract: In this paper, we explore the graph partitioning problem, a pivotal combinatorial optimization challenge with extensive applications in various fields such as science, technology, and business. Recognized as an NP-hard problem, graph partitioning lacks polynomial-time algorithms for its resolution. Recently, there has been a burgeoning interest in leveraging machine learning, particularly appro…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 2 Tables, 2 Figures

    ACM Class: I.2.8

  24. arXiv:2312.02073  [pdf, other]

    cs.CL

    A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

    Authors: Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kıcıman, Hamid Palangi, Barun Patra, Robert West

    Abstract: Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmente…

    Submitted 10 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at ACL 2024 (main conference)

  25. arXiv:2311.01460  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Implicit Chain of Thought Reasoning via Knowledge Distillation

    Authors: Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber

    Abstract: To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternat…

    Submitted 2 November, 2023; originally announced November 2023.

  26. arXiv:2310.07782  [pdf, other]

    cs.CV

    An automated approach for improving the inference latency and energy efficiency of pretrained CNNs by removing irrelevant pixels with focused convolutions

    Authors: Caleb Tung, Nicholas Eliopoulos, Purvish Jajal, Gowri Ramshankar, Chen-Yun Yang, Nicholas Synovic, Xuecen Zhang, Vipin Chaudhary, George K. Thiruvathukal, Yung-Hsiang Lu

    Abstract: Computer vision often uses highly accurate Convolutional Neural Networks (CNNs), but these deep learning models are associated with ever-increasing energy and computation requirements. Producing more energy-efficient CNNs often requires model training which can be cost-prohibitive. We propose a novel, automated method to make a pretrained CNN more energy-efficient without re-training. Given a pret…

    Submitted 11 October, 2023; originally announced October 2023.

  27. arXiv:2308.10153  [pdf, other]

    quant-ph

    Online Detection of Golden Circuit Cutting Points

    Authors: Daniel T. Chen, Ethan H. Hansen, Xinpeng Li, Aaron Orenstein, Vinooth Kulkarni, Vipin Chaudhary, Qiang Guan, Ji Liu, Yang Zhang, Shuai Xu

    Abstract: Quantum circuit cutting has emerged as a promising method for simulating large quantum circuits using a collection of small quantum machines. Running low-qubit "circuit fragments" not only overcomes the size limitation of near-term hardware, but it also increases the fidelity of the simulation. However, reconstructing measurement statistics requires computational resources - both classical and qua…

    Submitted 19 August, 2023; originally announced August 2023.

  28. arXiv:2305.15265  [pdf, other]

    cs.LG cs.CL

    Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

    Authors: Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, Ruixiang Tang, Zhimeng Jiang, Kaixiong Zhou, Vipin Chaudhary, Shuai Xu, Xia Hu

    Abstract: With the rapid growth in model size, fine-tuning the large pre-trained language model has become increasingly difficult due to its extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as a…

    Submitted 9 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  29. arXiv:2305.14218  [pdf, other]

    cs.CV cs.AI

    DUBLIN -- Document Understanding By Language-Image Network

    Authors: Kriti Aggarwal, Aditi Khandelwal, Kumar Tanmay, Owais Mohammed Khan, Qiang Liu, Monojit Choudhury, Hardik Hansrajbhai Chauhan, Subhojit Som, Vishrav Chaudhary, Saurabh Tiwary

    Abstract: Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    ACM Class: F.2.2; I.2.7

  30. arXiv:2304.04093  [pdf, other]

    quant-ph

    Efficient Quantum Circuit Cutting by Neglecting Basis Elements

    Authors: Daniel T. Chen, Ethan H. Hansen, Xinpeng Li, Vinooth Kulkarni, Vipin Chaudhary, Bin Ren, Qiang Guan, Sanmukh Kuppannagari, Ji Liu, Shuai Xu

    Abstract: Quantum circuit cutting has been proposed to help execute large quantum circuits using only small and noisy machines. Intuitively, cutting a qubit wire can be thought of as classically passing information of a quantum state along each element in a basis set. As the number of cuts increases, the number of quantum degrees of freedom needed to be passed through scales exponentially. We propose a simpl…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 7 pages, 5 figures, submitted to 37th IEEE International Parallel & Distributed Processing Symposium

  31. arXiv:2302.14045  [pdf, other]

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  32. arXiv:2302.03335  [pdf, ps, other]

    cs.IT

    Low-Latency Communication using Delay-Aware Relays Against Reactive Adversaries

    Authors: Vivek Chaudhary, J. Harshan

    Abstract: This work addresses a reactive jamming attack on the low-latency messages of a victim, wherein the jammer deploys countermeasure detection mechanisms to change its strategy. We highlight that the existing schemes against reactive jammers use relays with instantaneous full-duplex (FD) radios to evade the attack. However, due to the limitation of the radio architecture of the FD helper, instantaneou…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 30 pages

  33. arXiv:2301.12004  [pdf, other]

    cs.CL

    Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation

    Authors: Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj, Vishrav Chaudhary, Maxine Eskenazi

    Abstract: Language models have steadily increased in size over the past few years. They achieve a high level of performance on various natural language processing (NLP) tasks such as question answering and summarization. Large language models (LLMs) have been used for generation and can now output human-like text. Due to this, there are other downstream tasks in the realm of dialog that can now harness the…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted for publication at IWSDS 2023

  34. arXiv:2301.05493  [pdf, other]

    cond-mat.mtrl-sci

    Spin and current transport in the robust half-metallic magnet $c$-CoFeGe

    Authors: Vikrant Chaudhary, Sapna Singh, Deepak Gujjar, Tashi Nautiyal, Tulika Maitra, Jeroen van den Brink, Hem C. Kandpal

    Abstract: Spintronics is an emerging form of electronics based on the electrons' spin degree of freedom for which materials with robust half-metallic ferromagnet (HMF) character are very attractive. Here we determine the structural stability, electronic, magnetic, and mechanical properties of the half-Heusler (hH) compound CoFeGe, in particular also in its cubic form. The first-principles calculations sugge…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 8 pages, 6 figures, and 2 tables

    Journal ref: Journal of Physics: Condensed Matter, 35 (2023) 285502

  35. Effect of hydrostatic pressure and alloying on thermoelectric properties of van der Waals solid KMgSb: An \textit{ab-initio} study

    Authors: Vikrant Chaudhary, Tulika Maitra, Tashi Nautiyal, Jeroen van den Brink, Hem C. Kandpal

    Abstract: Through a combined first-principles and Boltzmann transport theory, we systematically investigate the thermal and electrical transport properties of the unexplored ternary quasi two-dimensional KMgSb system of KMgX (X = P, As, Sb, and Bi) family. Herein, the transport properties of KMgSb under the application of hydrostatic pressure and alloy engineering are reported. At a carrier concentration of…

    Submitted 9 June, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: 10 pages, 8 figures, and Supplementary Information

    Journal ref: Physical Review Materials, 2023

  36. arXiv:2212.10554  [pdf, other]

    cs.CL

    A Length-Extrapolatable Transformer

    Authors: Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention r…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 9 pages

  37. arXiv:2212.01270  [pdf, other]

    quant-ph

    Approximate Quantum Circuit Cutting

    Authors: Daniel Chen, Betis Baheri, Vipin Chaudhary, Qiang Guan, Ning Xie, Shuai Xu

    Abstract: Current and imminent quantum hardware lacks reliability and applicability due to noise and limited qubit counts. Quantum circuit cutting -- a technique dividing large quantum circuits into smaller subcircuits with sizes appropriate for the limited quantum resource at hand -- is used to mitigate these problems. However, classical postprocessing involved in circuit cutting generally grows exponentia…

    Submitted 2 December, 2022; originally announced December 2022.

  38. arXiv:2211.13184  [pdf, other]

    cs.LG cs.CL

    TorchScale: Transformers at Scale

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Work in progress

  39. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  40. arXiv:2210.14867  [pdf, other]

    cs.CL cs.LG

    Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

    Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

    Abstract: In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Work in progress

  41. arXiv:2210.07228  [pdf, other]

    cs.CL cs.LG

    Language Model Decoding as Likelihood-Utility Alignment

    Authors: Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kıcıman, Boi Faltings, Robert West

    Abstract: A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific no…

    Submitted 16 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL (Findings) 2023

  42. arXiv:2210.06423  [pdf, other]

    cs.LG cs.CL cs.CV

    Foundation Transformers

    Authors: Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves…

    Submitted 19 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Work in progress

  43. arXiv:2207.10741  [pdf, other]

    cs.CV

    Irrelevant Pixels are Everywhere: Find and Exclude Them for More Efficient Computer Vision

    Authors: Caleb Tung, Abhinav Goel, Xiao Hu, Nicholas Eliopoulos, Emmanuel Amobi, George K. Thiruvathukal, Vipin Chaudhary, Yung-Hsiang Lu

    Abstract: Computer vision is often performed using Convolutional Neural Networks (CNNs). CNNs are compute-intensive and challenging to deploy on power-constrained systems such as mobile and Internet-of-Things (IoT) devices. CNNs are compute-intensive because they indiscriminately compute many features on all pixels of the input image. We observe that, given a computer vision task, images often contain pixels…

    Submitted 21 July, 2022; originally announced July 2022.

  44. arXiv:2205.13198  [pdf, ps, other]

    cs.IT

    Constellation Design for Non-Coherent Fast-Forward Relays to Mitigate Full-Duplex Jamming Attacks

    Authors: Vivek Chaudhary, J. Harshan

    Abstract: With potential applications to short-packet communication, we address communication of low-latency messages in fast-fading channels under the presence of a reactive jammer. Unlike a traditional jammer, we assume a full-duplex (FD) jammer capable of detecting pre-existing countermeasures and subsequently changing the target frequency band. To facilitate reliable communication amidst a strong advers…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted for publication in IEEE Transactions on Communications

  45. arXiv:2204.14268  [pdf, other]

    cs.CL cs.AI

    How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

    Authors: Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman

    Abstract: A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In…

    Submitted 10 September, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: AMTA 2022

  46. arXiv:2203.13867  [pdf, other]

    cs.CL cs.LG

    Data Selection Curriculum for Neural Machine Translation

    Authors: Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty

    Abstract: Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. However, not all of the training data are equally useful to the model. Curriculum training aims to present the data to the NMT models in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model o…

    Submitted 25 March, 2022; originally announced March 2022.

  47. arXiv:2202.13274  [pdf, other]

    cs.CL

    OCR Improves Machine Translation for Low-Resource Languages

    Authors: Oana Ignat, Jean Maillard, Vishrav Chaudhary, Francisco Guzmán

    Abstract: We aim to investigate the performance of current OCR systems on low resource languages and low resource scripts. We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise, for 60 low-resource languages in low resource scripts. We evaluate state-of-the-art OCR systems on our benchmark and analyse most common errors. We show that O…

    Submitted 13 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted at ACL Findings 2022

  48. arXiv:2202.05382  [pdf, other]

    eess.IV cs.AI cs.CV

    Give me a knee radiograph, I will tell you where the knee joint area is: a deep convolutional neural network adventure

    Authors: Shi Yan, Taghi Ramazanian, Elham Sagheb, Walter K. Kremers, Vipin Chaudhary, Michael Taunton, Hilal Maradit Kremers, Ahmad P. Tafti

    Abstract: Knee pain is undoubtedly the most common musculoskeletal symptom that impairs quality of life, confines mobility and functionality across all ages. Knee pain is clinically evaluated by routine radiographs, where the widespread adoption of radiographic images and their availability at low cost, make them the principal component in the assessment of knee pain and knee pathologies, such as arthritis,…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 13 Pages, 4 Figures

  49. arXiv:2112.10668  [pdf, other]

    cs.CL cs.AI

    Few-shot Learning with Multilingual Language Models

    Authors: Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

    Abstract: Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study t…

    Submitted 10 November, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted to EMNLP 2022; 34 pages

  50. arXiv:2110.07804  [pdf, other]

    cs.CL

    Alternative Input Signals Ease Transfer in Multilingual Machine Translation

    Authors: Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman

    Abstract: Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer i…

    Submitted 14 October, 2021; originally announced October 2021.