Showing 1–25 of 25 results for author: Hearst, M A

  1. arXiv:2407.15959  [pdf, other]

    cs.HC

    "It's a Good Idea to Put It Into Words": Writing 'Rudders' in the Initial Stages of Visualization Design

    Authors: Chase Stokes, Clara Hu, Marti A. Hearst

    Abstract: Written language is a useful tool for non-visual creative activities like writing essays and planning searches. This paper investigates the integration of written language into the visualization design process. We create the idea of a 'writing rudder,' which acts as a guiding force or strategy for the design. Via an interview study of 24 working visualization designers, we first established that…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, accepted for IEEE VIS conference 2024

    ACM Class: H.5.0

  2. arXiv:2404.00131  [pdf, ps, other]

    cs.HC

    Give Text A Chance: Advocating for Equal Consideration for Language and Visualization

    Authors: Chase Stokes, Marti A. Hearst

    Abstract: Visualization research tends to de-emphasize consideration of the textual context in which its images are placed. We argue that visualization research should consider textual representations as a primary alternative to visual options when assessing designs, and that equal attention should be given to the construction of the language as to the visualizations. We also call for a c…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 2 pages

    Journal ref: NL VIZ: 2021 Workshop on Exploring Opportunities and Challenges for Natural Language Techniques to Support Visual Analysis

  3. arXiv:2402.12255  [pdf, other]

    cs.CL

    Shallow Synthesis of Knowledge in GPT-Generated Texts: A Case Study in Automatic Related Work Composition

    Authors: Anna Martin-Boyle, Aahan Tyagi, Marti A. Hearst, Dongyeop Kang

    Abstract: Numerous AI-assisted scholarly applications have been developed to aid different stages of the research process. We present an analysis of AI-assisted scholarly writing generated with ScholaCite, a tool we built that is designed for organizing literature and composing Related Work sections for academic papers. Our evaluation method focuses on the analysis of citation graphs to assess the structura…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 5 figures, submitted to ACL 2024

  4. The Role of Text in Visualizations: How Annotations Shape Perceptions of Bias and Influence Predictions

    Authors: Chase Stokes, Cindy Xiong Bearfield, Marti A. Hearst

    Abstract: This paper investigates the role of text in visualizations, specifically the impact of text position, semantic content, and biased wording. Two empirical studies were conducted based on two tasks (predicting data trends and appraising bias) using two visualization types (bar and line charts). While the addition of text had a minimal effect on how people perceive data trends, there was a significan…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 12 pages, 7 figures, for supplemental materials: https://github.com/chasejstokes/role-text

    ACM Class: H.5.0

  5. arXiv:2309.15337  [pdf, other]

    cs.CL cs.HC

    Beyond the Chat: Executable and Verifiable Text-Editing with LLMs

    Authors: Philippe Laban, Jesse Vig, Marti A. Hearst, Caiming Xiong, Chien-Sheng Wu

    Abstract: Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. However, standard chat-based conversational interfaces do not support transparency and verifiability of the editing changes that they suggest. To give the author more agency when editing with an LLM, we present InkSync, an editing interface that suggests…

    Submitted 26 September, 2023; originally announced September 2023.

  6. arXiv:2305.14660  [pdf, other]

    cs.CL

    Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction

    Authors: Anna Martin-Boyle, Andrew Head, Kyle Lo, Risham Sidhu, Marti A. Hearst, Dongyeop Kang

    Abstract: Mathematical symbol definition extraction is important for improving scholarly reading interfaces and scholarly information extraction (IE). However, the task poses several challenges: math symbols are difficult to process as they are not composed of natural language morphemes; and scholarly papers often contain sentences that require resolving complex coordinate structures. We present SymDef, an…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 9 pages, 4 figures

    ACM Class: I.2.7

  7. arXiv:2304.02812  [pdf, other]

    cs.CL

    Pragmatically Appropriate Diversity for Dialogue Evaluation

    Authors: Katherine Stasaski, Marti A. Hearst

    Abstract: Linguistic pragmatics state that a conversation's underlying speech acts can constrain the type of response which is appropriate at each turn in the conversation. When generating dialogue responses, neural dialogue agents struggle to produce diverse responses. Currently, dialogue diversity is assessed using automatic metrics, but the underlying speech acts do not inform these metrics. To remedy…

    Submitted 5 April, 2023; originally announced April 2023.

  8. arXiv:2303.14334  [pdf, other]

    cs.HC cs.AI cs.CL

    The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

    Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, et al. (30 additional authors not shown)

    Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan…

    Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  9. Scim: Intelligent Skimming Support for Scientific Papers

    Authors: Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, Daniel S. Weld

    Abstract: Researchers need to keep up with immense literatures, though it is time-consuming and difficult to do so. In this paper, we investigate the role that intelligent interfaces can play in helping researchers skim papers, that is, rapidly reviewing a paper to attain a cursory understanding of its contents. After conducting formative interviews and a design probe, we suggest that skimming aids should a…

    Submitted 25 September, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Updated to reflect version published in proceedings of IUI 2023

  10. arXiv:2205.01497  [pdf, other]

    cs.CL

    Semantic Diversity in Dialogue with Natural Language Inference

    Authors: Katherine Stasaski, Marti A. Hearst

    Abstract: Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation. We evaluate this metric using an…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: To appear at NAACL 2022

  11. arXiv:2203.00130  [pdf, other]

    cs.HC cs.CL

    Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing

    Authors: Tal August, Lucy Lu Wang, Jonathan Bragg, Marti A. Hearst, Andrew Head, Kyle Lo

    Abstract: When seeking information not covered in patient-friendly documents, like medical pamphlets, healthcare consumers may turn to the research literature. Reading medical papers, however, can be a challenging experience. To improve access to medical papers, we introduce a novel interactive interface, Paper Plain, with four features powered by natural language processing: definitions of unfamiliar terms,…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: 39 pages, 10 figures

    ACM Class: H.5.2

  12. NewsPod: Automatic and Interactive News Podcasts

    Authors: Philippe Laban, Elicia Ye, Srujay Korlakunta, John Canny, Marti A. Hearst

    Abstract: News podcasts are a popular medium to stay informed and dive deep into news topics. Today, most podcasts are handcrafted by professionals. In this work, we advance the state-of-the-art in automatically generated podcasts, making use of recent advances in natural language processing and text-to-speech technology. We present NewsPod, an automatically generated, interactive news podcast. The podcast…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted at IUI 2022, 16 pages, 10 figures

  13. arXiv:2111.09525  [pdf, other]

    cs.CL

    SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

    Authors: Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst

    Abstract: In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: TACL pre-MIT Press publication version; 11 pages, 2 figures, 5 tables

  14. arXiv:2107.03448  [pdf, other]

    cs.CL

    Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

    Authors: Philippe Laban, Luke Dai, Lucas Bandarkar, Marti A. Hearst

    Abstract: The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near perfect accuracy of 97.8%, a state-of-the-art. We argue that this outstanding performance is unlikely to lead to a good model of text coherence, and suggest that the Shuf…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021 (short paper), 7 pages, 4 figures

    Journal ref: Association for Computational Linguistics (2021)

  15. arXiv:2107.03444  [pdf, other]

    cs.CL

    Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text

    Authors: Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

    Abstract: This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity. We train the model with a novel algorithm to optimize the reward (k-SCST), in which the model proposes several candidate simplifications, computes each candidate's reward, and encourages candidates that outperform th…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021, 14 pages, 7 figures

    Journal ref: Association for Computational Linguistics (2021)

  16. What's The Latest? A Question-driven News Chatbot

    Authors: Philippe Laban, John Canny, Marti A. Hearst

    Abstract: This work describes an automatic news chatbot that draws content from a diverse set of news articles and creates conversations with a user about the news. Key components of the system include the automatic organization of news articles into topical chatrooms, integration of automatically generated questions into the conversation, and a novel method for choosing which questions to present which avo…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: ACL2020 Demo Track, 8 pages, 5 figures

    Journal ref: ACL Demos (2020) 380-387

  17. arXiv:2105.05391  [pdf, other]

    cs.CL

    News Headline Grouping as a Challenging NLU Task

    Authors: Philippe Laban, Lucas Bandarkar, Marti A. Hearst

    Abstract: Recent progress in Natural Language Understanding (NLU) has seen the latest models outperform human performance on many standard tasks. These impressive results have led the community to introspect on dataset limitations, and iterate on more nuanced challenges. In this paper, we introduce the task of HeadLine Grouping (HLG) and a corresponding dataset (HLGD) consisting of 20,056 pairs of news head…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: NAACL2021, 13 pages, 8 figures

  18. The Summary Loop: Learning to Write Abstractive Summaries Without Examples

    Authors: Philippe Laban, Andrew Hsi, John Canny, Marti A. Hearst

    Abstract: This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generate…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: ACL2020, 16 pages, 9 figures

    Journal ref: Association for Computational Linguistics (2020) 5135-5150

  19. arXiv:2105.00121  [pdf, other]

    cs.DB cs.HC

    Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows

    Authors: Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal, Thyne Boonmark, Caitlyn Chen, Jake Kang, Ujjaini Mukhopadhyay, Jerry Song, Micah Yong, Marti A. Hearst, Aditya G. Parameswaran

    Abstract: Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for acce…

    Submitted 22 December, 2021; v1 submitted 30 April, 2021; originally announced May 2021.

  20. arXiv:2010.05129  [pdf, other]

    cs.CL

    Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions

    Authors: Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, Marti A. Hearst

    Abstract: The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from being accurate enough to use in real-world applications. In this paper, we first perform in-depth error analysis of the current best performing definition detection s…

    Submitted 10 October, 2020; originally announced October 2020.

    Comments: Workshop on Scholarly Document Processing (SDP), EMNLP 2020

  21. arXiv:2009.14237  [pdf, other]

    cs.HC cs.AI cs.CL

    Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols

    Authors: Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, Marti A. Hearst

    Abstract: Despite the central importance of research papers to scientific progress, they can be difficult to read. Comprehension is often stymied when the information needed to understand a passage resides somewhere else: in another section, or in another paper. In this work, we envision how interfaces can bring definitions of technical terms and symbols to readers when and where they need them most. We int…

    Submitted 27 April, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 18 pages, 17 figures, 2 tables. To appear at the 2021 ACM CHI Conference on Human Factors in Computing Systems. For associated video, see https://youtu.be/yYcQf-Yq8B0. v2 changes: expanded discussion of design process and implementation; improved figure design. v3 changes: fixed typo in cell of Table 2; updated HEDDEx and Schwarz-Hearst accuracy in Section 5.3

    ACM Class: H.5.2

  22. arXiv:2005.12668  [pdf, other]

    cs.IR cs.DL cs.HC cs.LG

    SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

    Authors: Tom Hope, Jason Portenoy, Kishore Vasan, Jonathan Borchardt, Eric Horvitz, Daniel S. Weld, Marti A. Hearst, Jevin West

    Abstract: The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabili…

    Submitted 20 September, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Accepted to EMNLP 2020

  23. Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing Algorithms

    Authors: Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, Karim R. Lakhani

    Abstract: We report the findings of a month-long online competition in which participants developed algorithms for augmenting the digital version of patent documents published by the United States Patent and Trademark Office (USPTO). The goal was to detect figures and part labels in U.S. patent drawing pages. The challenge drew 232 teams of two, of which 70 teams (30%) submitted solutions. Collectively, tea…

    Submitted 11 November, 2014; v1 submitted 24 October, 2014; originally announced October 2014.

  24. Adaptive Sentence Boundary Disambiguation

    Authors: David D. Palmer, Marti A. Hearst

    Abstract: Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that…

    Submitted 21 November, 1994; v1 submitted 16 November, 1994; originally announced November 1994.

    Comments: This is a Latex version of the previously submitted ps file (formatted as a uuencoded gz-compressed .tar file created by csh script). The software from the work described in this paper is available by contacting dpalmer@cs.berkeley.edu

    Journal ref: Proceedings of ANLP 94

  25. Multi-Paragraph Segmentation of Expository Text

    Authors: Marti A. Hearst

    Abstract: This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown t…

    Submitted 23 June, 1994; originally announced June 1994.

    Comments: To Appear in ACL '94 Proceedings; 8 pages POSTSCRIPT format