Elnaz Nouri


2024

Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)
Elnaz Nouri | Abhinav Rastogi | Georgios Spithourakis | Bing Liu | Yun-Nung Chen | Yu Li | Alon Albalak | Hiromi Wakaki | Alexandros Papangelis
Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)

Solving Data-centric Tasks using Large Language Models
Shraddha Barke | Christian Poelitz | Carina Negreanu | Benjamin Zorn | José Cambronero | Andrew Gordon | Vu Le | Elnaz Nouri | Nadia Polikarpova | Advait Sarkar | Brian Slininger | Neil Toronto | Jack Williams
Findings of the Association for Computational Linguistics: NAACL 2024

Large language models are rapidly replacing help forums like StackOverflow, and are especially helpful to non-professional programmers and end users. These users are often interested in data-centric tasks, like spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a novel cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.

2023

HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models
Swaroop Mishra | Elnaz Nouri
Findings of the Association for Computational Linguistics: ACL 2023

Controlling the text generated by language models and customizing the content has been a long-standing challenge. Existing prompting techniques proposed in pursuit of providing control are task-specific and lack generality; this provides overwhelming choices for non-expert users to find a suitable method for their task. The effort associated with those techniques, such as in writing examples, explanations, instructions, etc., further limits their adoption among non-expert users. In this paper, we propose a simple prompting strategy, Help Me Think, where we encourage large language models (such as GPT3 and ChatGPT) to help non-expert users by asking a set of relevant questions and leveraging user answers to execute the task. We demonstrate the efficacy of our technique Help Me Think on a variety of tasks. Specifically, we focus on tasks that are hard for average humans and require significant thinking to perform. We hope our work will encourage the development of unconventional ways to harness the power of large language models.

InstructExcel: A Benchmark for Natural Language Instruction in Excel
Justin Payan | Swaroop Mishra | Mukul Singh | Carina Negreanu | Christian Poelitz | Chitta Baral | Subhro Roy | Rasika Chakravarthy | Benjamin Van Durme | Elnaz Nouri
Findings of the Association for Computational Linguistics: EMNLP 2023

With the evolution of Large Language Models (LLMs) we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so we introduce a new large-scale benchmark, InstructExcel, created by leveraging the ‘Automate’ feature in Excel to automatically generate OfficeScripts from users’ actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that InstructExcel is a hard benchmark for state-of-the-art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.

2022

Handling Comments in Documents through Interactions
Elnaz Nouri | Carlos Toxtli
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Comments are widely used by users in collaborative documents every day. The documents’ comments enable collaborative editing and review dynamics, transforming each document into a context-sensitive communication channel. Understanding the role of comments in communication dynamics within documents is the first step towards automating their management. In this paper we propose the first-ever taxonomy for different types of in-document comments based on analysis of a large-scale dataset of public documents from the web. We envision that the next generation of intelligent collaborative document experiences will allow interactive creation and consumption of content. We also introduce the components necessary for developing novel tools that automate the handling of comments through natural language interaction with the documents. We identify the commands that users would use to respond to various types of comments. We train machine learning algorithms to recognize the different types of comments and assess their feasibility. We conclude by discussing some of the implications for the design of automatic document management tools.

Reinforcement Guided Multi-Task Learning Framework for Low-Resource Stereotype Detection
Rajkumar Pujari | Erik Oveson | Priyanka Kulkarni | Elnaz Nouri
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As large Pre-trained Language Models (PLMs) trained on large amounts of data in an unsupervised manner become more ubiquitous, identifying various types of bias in the text has come into sharp focus. Existing ‘Stereotype Detection’ datasets mainly adopt a diagnostic approach toward large PLMs. Blodgett et al. (2021) show that there are significant reliability issues with the existing benchmark datasets. Annotating a reliable dataset requires a precise understanding of the subtle nuances of how stereotypes manifest in text. In this paper, we annotate a focused evaluation set for ‘Stereotype Detection’ that addresses those pitfalls by de-constructing various ways in which stereotypes manifest in text. Further, we present a multi-task model that leverages the abundance of data-rich neighboring tasks such as hate speech detection, offensive language detection, misogyny detection, etc., to improve the empirical performance on ‘Stereotype Detection’. We then propose a reinforcement-learning agent that guides the multi-task learning model by learning to identify the training examples from the neighboring tasks that help the target task the most. We show that the proposed models achieve significant empirical gains over existing baselines on all the tasks.

Proceedings of the 4th Workshop on NLP for Conversational AI
Bing Liu | Alexandros Papangelis | Stefan Ultes | Abhinav Rastogi | Yun-Nung Chen | Georgios Spithourakis | Elnaz Nouri | Weiyan Shi
Proceedings of the 4th Workshop on NLP for Conversational AI

2021

Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI
Alexandros Papangelis | Paweł Budzianowski | Bing Liu | Elnaz Nouri | Abhinav Rastogi | Yun-Nung Chen
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

2020

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela Lin | Sudha Rao | Asli Celikyilmaz | Elnaz Nouri | Chris Brockett | Debadeepta Dey | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many high-level procedural tasks can be decomposed into sequences of instructions that vary in their order and choice of tools. In the cooking domain, the web offers many, partially-overlapping, text and video recipes (i.e. procedures) that describe how to make the same dish (i.e. high-level task). Aligning instructions for the same dish across different sources can yield descriptive visual explanations that are far richer semantically than conventional textual instructions, providing commonsense insight into how real-world procedures are structured. Learning to align these different instruction sets is challenging because: a) different recipes vary in their order of instructions and use of ingredients; and b) video instructions can be noisy and tend to contain far more information than text instructions. To address these challenges, we use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish. We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish. We release the Microsoft Research Multimodal Aligned Recipe Corpus containing ~150K pairwise alignments between recipes across 4262 dishes with rich commonsense information.

2015

Reinforcement Learning in Multi-Party Trading Dialog
Takuya Hiraoka | Kallirroi Georgila | Elnaz Nouri | David Traum | Satoshi Nakamura
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Initiative Taking in Negotiation
Elnaz Nouri | David Traum
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)