
Showing 1–26 of 26 results for author: Guu, K

  1. arXiv:2406.13121

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  2. arXiv:2403.18286

    cs.CL cs.AI cs.LG

    Few-Shot Recalibration of Language Models

    Authors: Xiang Lisa Li, Urvashi Khandelwal, Kelvin Guu

    Abstract: Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systemic over-confidence in math can balance out systemic und…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: preprint

  3. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2305.14908

    cs.CL

    PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions

    Authors: Anthony Chen, Panupong Pasupat, Sameer Singh, Hongrae Lee, Kelvin Guu

    Abstract: The remarkable capabilities of large language models have been accompanied by a persistent drawback: the generation of false and unsubstantiated claims commonly known as "hallucinations". To combat this issue, recent research has introduced approaches that involve editing and attributing the outputs of language models, particularly through prompt-based editing. However, the inference cost and spee…

    Submitted 24 May, 2023; originally announced May 2023.

  5. arXiv:2303.08114

    cs.LG cs.CL

    Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

    Authors: Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi

    Abstract: Training data attribution (TDA) methods offer to trace a model's prediction on any given example back to specific influential training examples. Existing approaches do so by assigning a scalar influence score to each training example, under a simplifying assumption that influence is additive. But in reality, we observe that training examples interact in highly non-additive ways due to factors such…

    Submitted 14 March, 2023; originally announced March 2023.

  6. arXiv:2212.02475

    cs.CL

    Meta-Learning Fast Weight Language Models

    Authors: Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi

    Abstract: Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: EMNLP 2022 short paper

  7. arXiv:2210.08726

    cs.CL cs.AI cs.IR cs.LG

    RARR: Researching and Revising What Language Models Say, Using Language Models

    Authors: Luyu Gao, Zhuyun Dai, Panupong Pasupat, Anthony Chen, Arun Tejasvi Chaganty, Yicheng Fan, Vincent Y. Zhao, Ni Lao, Hongrae Lee, Da-Cheng Juan, Kelvin Guu

    Abstract: Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all…

    Submitted 31 May, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: ACL 2023

  8. arXiv:2209.11755

    cs.CL cs.IR

    Promptagator: Few-shot Dense Retrieval From 8 Examples

    Authors: Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, Ming-Wei Chang

    Abstract: Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search…

    Submitted 23 September, 2022; originally announced September 2022.

  9. arXiv:2205.11482

    cs.CL cs.IR

    Towards Tracing Factual Knowledge in Language Models Back to the Training Data

    Authors: Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu

    Abstract: Language models (LMs) have been shown to memorize a great deal of factual knowledge contained in their training data. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion. Pr…

    Submitted 25 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Findings of EMNLP, 2022

  10. arXiv:2205.09073

    cs.CL cs.AI

    Dialog Inpainting: Turning Documents into Dialogs

    Authors: Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu

    Abstract: Many important questions (e.g. "How to eat healthier?") require conversation to establish context and explore in depth. However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. Our a…

    Submitted 31 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

  11. Controllable Semantic Parsing via Retrieval Augmentation

    Authors: Panupong Pasupat, Yuan Zhang, Kelvin Guu

    Abstract: In practical applications of semantic parsing, we often want to rapidly change the behavior of the parser, such as enabling it to handle queries in a new domain, or changing its predictions on certain targeted queries. While we can introduce new training examples exhibiting the target behavior, a mechanism for enacting such behavior changes without expensive model re-training would be preferable.…

    Submitted 23 February, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021

  12. arXiv:2109.01652

    cs.CL

    Finetuned Language Models Are Zero-Shot Learners

    Authors: Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

    Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natur…

    Submitted 8 February, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Version 5. Find list of changes in Appendix F (page 35)

  13. arXiv:2104.07478

    cs.CL

    Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

    Authors: Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

    Abstract: Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact…

    Submitted 15 April, 2021; originally announced April 2021.

  14. arXiv:2102.01335

    cs.CL cs.AI

    Neural Data Augmentation via Example Extrapolation

    Authors: Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung

    Abstract: In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and compl…

    Submitted 2 February, 2021; originally announced February 2021.

  15. arXiv:2101.00133

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  16. arXiv:2007.05896

    cs.LG cs.AI stat.ML

    Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

    Authors: Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

    Abstract: Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which…

    Submitted 11 July, 2020; originally announced July 2020.

  17. arXiv:2005.10389

    cs.CL

    Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

    Authors: Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

    Abstract: Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences. Given an anchor sentence, our model is trained…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  18. arXiv:2002.08909

    cs.CL cs.LG

    REALM: Retrieval-Augmented Language Model Pre-Training

    Authors: Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

    Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent kno…

    Submitted 10 February, 2020; originally announced February 2020.

  19. arXiv:1906.01604

    cs.CL cs.LG stat.ML

    KERMIT: Generative Insertion-Based Modeling for Sequences

    Authors: William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit

    Abstract: We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to lea…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: William Chan, Nikita Kitaev, Kelvin Guu, and Mitchell Stern contributed equally

  20. arXiv:1812.01194

    stat.ML cs.LG

    A Retrieve-and-Edit Framework for Predicting Structured Outputs

    Authors: Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

    Abstract: For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method f…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: To appear, NeurIPS 2018

  21. arXiv:1809.02922

    cs.CL

    Transforming Question Answering Datasets Into Natural Language Inference Datasets

    Authors: Dorottya Demszky, Kelvin Guu, Percy Liang

    Abstract: Existing datasets for natural language inference (NLI) have propelled research on language understanding. We propose a new method for automatically deriving NLI datasets from the growing abundance of large-scale question answering datasets. Our approach hinges on learning a sentence transformation model which converts question-answer pairs into their declarative forms. Despite being primarily trai…

    Submitted 10 September, 2018; v1 submitted 9 September, 2018; originally announced September 2018.

    Comments: 11 pages, 6 figures

  22. arXiv:1808.09132

    cs.CL

    Mapping Natural Language Commands to Web Elements

    Authors: Panupong Pasupat, Tian-Shun Jiang, Evan Zheran Liu, Kelvin Guu, Percy Liang

    Abstract: The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., "click on the second article"), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as f…

    Submitted 30 September, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  23. arXiv:1802.08802

    cs.AI

    Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

    Authors: Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang

    Abstract: Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to "warm-s…

    Submitted 24 February, 2018; originally announced February 2018.

    Comments: International Conference on Learning Representations (ICLR), 2018

  24. arXiv:1709.08878

    cs.CL cs.AI cs.LG cs.NE stat.ML

    Generating Sentences by Editing Prototypes

    Authors: Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

    Abstract: We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to hu…

    Submitted 7 September, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

    Comments: 14 pages, Transactions of the Association for Computational Linguistics (TACL), 2018

  25. arXiv:1704.07926

    cs.AI cs.LG stat.ML

    From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

    Authors: Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang

    Abstract: Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincid…

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017)

  26. arXiv:1506.01094

    cs.CL cs.AI cs.DB stat.ML

    Traversing Knowledge Graphs in Vector Space

    Authors: Kelvin Guu, John Miller, Percy Liang

    Abstract: Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?". However, knowledge graphs often have missing facts (edges), which disrupt path queries. Recent models for knowledge base completion impute missing facts by embedding knowledge graphs in vector spaces. We show that these models can be recursively applied t…

    Submitted 19 August, 2015; v1 submitted 2 June, 2015; originally announced June 2015.

    Comments: 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)