PolicyCraft: Supporting Collaborative and Participatory Policy Design through Case-Grounded Deliberation

Authors: Tzu-Sheng Kuo, Quan Ze Chen, Amy X. Zhang, Jane Hsieh, Haiyi Zhu, Kenneth Holstein

Abstract: Community and organizational policies are typically designed in a top-down, centralized fashion, with limited input from impacted stakeholders. This can result in policies that are misaligned with community needs or perceived as illegitimate. How can we support more collaborative, participatory approaches to policy design? In this paper, we present PolicyCraft, a system that structures collaborative policy design through case-grounded deliberation. Building on past research that highlights the value of concrete cases in establishing common ground, PolicyCraft supports users in collaboratively proposing, critiquing, and revising policies through discussion and voting on cases. A field study across two university courses showed that students using PolicyCraft reached greater consensus and developed better-supported course policies, compared with those using a baseline system that did not scaffold their use of concrete cases. Reflecting on our findings, we discuss opportunities for future HCI systems to help groups more effectively bridge between abstract policies and concrete cases.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.08622 [pdf, other]

Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking

Authors: K. J. Kevin Feng, Inyoung Cheong, Quan Ze Chen, Amy X. Zhang

Abstract: Emerging efforts in AI alignment seek to broaden participation in shaping model behavior by eliciting and integrating collective input into a policy for model finetuning. While pluralistic, these processes are often linear and do not allow participating stakeholders to confirm whether potential outcomes of their contributions are indeed consistent with their intentions. Design prototyping has long advocated for rapid iteration using tight feedback loops of ideation, experimentation, and evaluation to mitigate these issues. We thus propose policy prototyping for LLMs, a new process that draws inspiration from prototyping practices to enable stakeholders to collaboratively and interactively draft LLM policies. Through learnings from a real-world LLM policymaking initiative at an industrial AI lab, we motivate our approach and characterize policy prototyping with four guiding principles. Because policy prototyping emphasizes a contrasting set of priorities compared to previous approaches, we envision our approach to be a valuable addition to the methodological repertoire for pluralistic alignment.

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.03247 [pdf, other]

End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting

Authors: Leijie Wang, Kathryn Yurechko, Pranati Dani, Quan Ze Chen, Amy X. Zhang

Abstract: Existing tools for laypeople to create personal classifiers often assume a motivated user working uninterrupted in a single, lengthy session. However, users tend to engage with social media casually, with many short sessions on an ongoing, daily basis. To make creating personal classifiers for content curation easier for such users, tools should support rapid initialization and iterative refinement. In this work, we compare three strategies -- (1) example labeling, (2) rule writing, and (3) large language model (LLM) prompting -- for end users to build personal content classifiers. From an experiment with 37 non-programmers tasked with creating personalized comment moderation filters, we found that with LLM prompting, participants reached 95% of peak performance in 5 minutes, beating other strategies due to higher recall, but all strategies struggled with iterative refinement. Despite LLM prompting's better performance, participants preferred different strategies in different contexts and, even when prompting, provided examples or wrote rule-like prompts, suggesting hybrid approaches.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2407.17579 [pdf, ps, other]

doi 10.1145/3678884.3681833

Envisioning New Futures of Positive Social Technology: Beyond Paradigms of Fixing, Protecting, and Preventing

Authors: JaeWon Kim, Lindsay Popowski, Anna Fang, Cassidy Pyle, Guo Freeman, Ryan M. Kelly, Angela Y. Lee, Fannie Liu, Angela D. R. Smith, Alexandra To, Amy X. Zhang

Abstract: Social technology research today largely focuses on mitigating the negative impacts of technology and, therefore, often misses the potential of technology to enhance human connections and well-being. However, we see a potential to shift towards a holistic view of social technology's impact on human flourishing. We introduce Positive Social Technology (Positech), a framework that shifts emphasis toward leveraging social technologies to support and augment human flourishing. This workshop is organized around three themes relevant to Positech: 1) "Exploring Relevant and Adjacent Research" to define and widen the Positech scope with insights from related fields, 2) "Projecting the Landscape of Positech" for participants to outline the domain's key aspects and 3) "Envisioning the Future of Positech," anchored around strategic planning towards a sustainable research community. Ultimately, this workshop will serve as a platform to shift the narrative of social technology research towards a more positive, human-centric approach. It will foster research that goes beyond fixing technologies to protect humans from harm, to also pursue enriching human experiences and connections through technology.

Submitted 14 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2404.04516 [pdf, other]

Language Models as Critical Thinking Tools: A Case Study of Philosophers

Authors: Andre Ye, Jared Moore, Rose Novick, Amy X. Zhang

Abstract: Current work in language models (LMs) helps us speed up or even skip thinking by accelerating and automating cognitive work. But can LMs help us with critical thinking -- thinking in deeper, more reflective ways which challenge assumptions, clarify ideas, and engineer new concepts? We treat philosophy as a case study in critical thinking, and interview 21 professional philosophers about how they engage in critical thinking and on their experiences with LMs. We find that philosophers do not find LMs to be useful because they lack a sense of selfhood (memory, beliefs, consistency) and initiative (curiosity, proactivity). We propose the selfhood-initiative model for critical thinking tools to characterize this gap. Using the model, we formulate three roles LMs could play as critical thinking tools: the Interlocutor, the Monitor, and the Respondent. We hope that our work inspires LM researchers to further develop LMs as critical thinking tools and philosophers and other 'critical thinkers' to imagine intellectually substantive uses of LMs.

Submitted 7 August, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

arXiv:2403.11169 [pdf, other]

Correcting misinformation on social media with a large language model

Authors: Xinyi Zhou, Ashish Sharma, Amy X. Zhang, Tim Althoff

Abstract: Real-world misinformation, often multimodal, can be partially or fully factual but misleading using diverse tactics like conflating correlation with causation. Such misinformation is severely understudied, challenging to address, and harms various social domains, particularly on social media, where it can spread rapidly. High-quality and timely correction of misinformation that identifies and explains its (in)accuracies effectively reduces false beliefs. Despite the wide acceptance of manual correction, it is difficult to be timely and scalable. While LLMs have versatile capabilities that could accelerate misinformation correction, they struggle due to a lack of recent information, a tendency to produce false content, and limitations in addressing multimodal information. We propose MUSE, an LLM augmented with access to and credibility evaluation of up-to-date information. By retrieving evidence as refutations or supporting context, MUSE identifies and explains content (in)accuracies with references. It conducts multimodal retrieval and interprets visual content to verify and correct multimodal content. Given the absence of a comprehensive evaluation approach, we propose 13 dimensions of misinformation correction quality. Then, fact-checking experts evaluate responses to social media content that are not presupposed to be misinformation but broadly include (partially) incorrect and correct posts that may (not) be misleading. Results demonstrate MUSE's ability to write high-quality responses to potential misinformation--across modalities, tactics, domains, political leanings, and for information that has not previously been fact-checked online--within minutes of its appearance on social media. Overall, MUSE outperforms GPT-4 by 37% and even high-quality responses from laypeople by 29%. Our work provides a general methodological and evaluative framework to correct misinformation at scale.

Submitted 3 September, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: 50 pages

arXiv:2403.04873 [pdf, other]

The SIDO Performance Model for League of Legends

Authors: Amy X. Zhang, Parth Naidu

Abstract: League of Legends (LoL) has been a dominant esport for a decade, yet the inherent complexity of the game has stymied the creation of analytical measures of player skill and performance. Current industry standards are limited to easy-to-procure individual player statistics that are incomplete and lacking context as they do not take into account teamplay or game state. We present a unified performance model for League of Legends which blends together measures of a player's contribution within the context of their team, insights from traditional sports metrics such as the Plus-Minus model, and the intricacies of LoL as a complex team invasion sport. Using hierarchical Bayesian models, we outline the use of gold and damage dealt as a measure of skill, detailing players' impact on their own-, their allies'- and their enemies' statistics throughout the course of the game. Our results showcase the model's increased efficacy in separating professional players when compared to a Plus-Minus model and to current esports industry standards, while metric quality is rigorously assessed for discrimination, independence, and stability. Readers might also find additional qualitative analytics which explore champion proficiency and the impact of collaborative team-play. Future work is proposed to refine and expand the SIDO performance model, offering a comprehensive framework for esports analytics in team performance management, scouting and research realms.

Submitted 6 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.17847 [pdf]

doi 10.1145/3613904.3642241

Mitigating Barriers to Public Social Interaction with Meronymous Communication

Authors: Nouran Soliman, Hyeonsu B Kang, Matthew Latzke, Jonathan Bragg, Joseph Chee Chang, Amy X. Zhang, David R Karger

Abstract: In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the social context that motivates people to participate and helps them tailor their communication. We explore a design space of meronymous communication, where people can reveal carefully chosen aspects of their identity and also leverage trusted endorsers to gain credibility. We implemented these ideas in a system for scholars to meronymously seek and receive paper recommendations on Twitter and Mastodon. A formative study with 20 scholars confirmed that scholars see benefits to participating but are deterred due to social anxiety. From a month-long public deployment, we found that with meronymity, junior scholars could comfortably ask "newbie" questions and get responses from senior scholars who they normally found intimidating. Responses were also tailored to the aspects about themselves that junior scholars chose to reveal.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

arXiv:2402.05388 [pdf, other]

doi 10.1145/3641006

Form-From: A Design Space of Social Media Systems

Authors: Amy X. Zhang, Michael S. Bernstein, David R. Karger, Mark S. Ackerman

Abstract: Social media systems are as varied as they are pervasive. They have been almost universally adopted for a broad range of purposes including work, entertainment, activism, and decision making. As a result, they have also diversified, with many distinct designs differing in content type, organization, delivery mechanism, access control, and many other dimensions. In this work, we aim to characterize and then distill a concise design space of social media systems that can help us understand similarities and differences, recognize potential consequences of design choices, and identify spaces for innovation. Our model, which we call Form-From, characterizes social media based on (1) the form of the content, either threaded or flat, and (2) from where or from whom one might receive content, ranging from spaces to networks to the commons. We derive Form-From inductively from a larger set of 62 dimensions organized into 10 categories. To demonstrate the utility of our model, we trace the history of social media systems as they traverse the Form-From space over time, and we identify common design patterns within cells of the model.

Submitted 23 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Journal ref: Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 167 (April 2024), 47 pages

arXiv:2402.03259 [pdf, other]

Meeting Bridges: Designing Information Artifacts that Bridge from Synchronous Meetings to Asynchronous Collaboration

Authors: Ruotong Wang, Lin Qiu, Justin Cranshaw, Amy X. Zhang

Abstract: A recent surge in remote meetings has led to complaints of "Zoom fatigue" and "collaboration overload," negatively impacting worker productivity and well-being. One way to alleviate the burden of meetings is to de-emphasize their synchronous participation by shifting work to and enabling sensemaking during post-meeting asynchronous activities. Towards this goal, we propose the design concept of meeting bridges, or information artifacts that can encapsulate meeting information towards bridging to and facilitating post-meeting activities. Through 13 interviews and a survey of 198 information workers, we learn how people use online meeting information after meetings are over, finding five main uses: as an archive, as task reminders, to onboard or support inclusion, for group sensemaking, and as a launching point for follow-on collaboration. However, we also find that current common meeting artifacts, such as notes and recordings, present challenges in serving as meeting bridges. After conducting co-design sessions with 16 participants, we distill key principles for the design of meeting bridges to optimally support asynchronous collaboration goals. Overall, our findings point to the opportunity of designing information artifacts that not only support users to access but also continue to transform and engage in meeting information post-meeting.

Submitted 5 February, 2024; originally announced February 2024.

Comments: accepted to CSCW 2024

arXiv:2402.01864 [pdf, other]

(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice

Authors: Inyoung Cheong, King Xia, K. J. Kevin Feng, Quan Ze Chen, Amy X. Zhang

Abstract: Large language models (LLMs) are increasingly capable of providing users with advice in a wide range of professional domains, including legal advice. However, relying on LLMs for legal queries raises concerns due to the significant expertise required and the potential real-world consequences of the advice. To explore when and why LLMs should or should not provide advice to users, we conducted workshops with 20 legal experts using methods inspired by case-based reasoning. The provided realistic queries ("cases") allowed experts to examine granular, situation-specific concerns and overarching technical and legal constraints, producing a concrete set of contextual considerations for LLM developers. By synthesizing the factors that impacted LLM response appropriateness, we present a 4-dimension framework: (1) User attributes and behaviors, (2) Nature of queries, (3) AI capabilities, and (4) Social impacts. We share experts' recommendations for LLM response strategies, which center around helping users identify 'right questions to ask' and relevant information rather than providing definitive legal judgments. Our findings reveal novel legal considerations, such as unauthorized practice of law, confidentiality, and liability for inaccurate advice, that have been overlooked in the literature. The case-based deliberation method enabled us to elicit fine-grained, practice-informed insights that surpass those from de-contextualized surveys or speculative principles. These findings underscore the applicability of our method for translating domain-specific professional knowledge and practices into policies that can guide LLM behavior in a more responsible direction.

Submitted 3 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 14 pages

arXiv:2401.16610 [pdf, other]

Perceptions of Moderators as a Large-Scale Measure of Online Community Governance

Authors: Galen Weld, Leon Leibmann, Amy X. Zhang, Tim Althoff

Abstract: Millions of online communities are governed by volunteer moderators, who shape their communities by setting and enforcing rules, recruiting additional moderators, and participating in the community themselves. These moderators must regularly make decisions about how to govern, yet measuring the 'success' of governance is complex and nuanced, making it challenging to determine what governance strategies are most successful. Furthermore, prior work has shown that communities have differing values, suggesting that 'one-size-fits-all' approaches to governance are unlikely to serve all communities well. In this work, we assess governance practices on reddit by classifying the sentiment of community members' public discussion of their own moderators. We label 1.89 million posts and comments made on reddit over an 18 month period. We relate these perceptions to characteristics of community governance, and to different actions that community moderators take. We identify types of communities where moderators are perceived particularly positively and negatively, and highlight promising strategies for moderator teams. Amongst other findings, we show that strict rule enforcement is linked to more favorable perceptions of moderators of communities dedicated to certain topics, such as news communities, than others. We investigate what kinds of moderators are associated with improved community perceptions upon their addition to a mod team, and find that moderators who are active community members before and during their mod tenures are seen more favorably. We make all our models, datasets, and code public.

Submitted 2 October, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.14000 [pdf, other]

doi 10.1145/3613904.3642120

Mapping the Design Space of Teachable Social Media Feed Experiences

Authors: K. J. Kevin Feng, Xander Koo, Lawrence Tan, Amy Bruckman, David W. McDonald, Amy X. Zhang

Abstract: Social media feeds are deeply personal spaces that reflect individual values and preferences. However, top-down, platform-wide content algorithms can reduce users' sense of agency and fail to account for nuanced experiences and values. Drawing on the paradigm of interactive machine teaching (IMT), an interaction framework for non-expert algorithmic adaptation, we map out a design space for teachable social media feed experiences to empower agential, personalized feed curation. To do so, we conducted a think-aloud study (N=24) featuring four social media platforms -- Instagram, Mastodon, TikTok, and Twitter -- to understand key signals users leveraged to determine the value of a post in their feed. We synthesized users' signals into taxonomies that, when combined with user interviews, inform five design principles that extend IMT into the social media setting. We finally embodied our principles into three feed designs that we present as sensitizing concepts for teachable feed experiences moving forward.

Submitted 29 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: CHI 2024

arXiv:2401.09051 [pdf, other]

Canvil: Designerly Adaptation for LLM-Powered User Experiences

Authors: K. J. Kevin Feng, Q. Vera Liao, Ziang Xiao, Jennifer Wortman Vaughan, Amy X. Zhang, David W. McDonald

Abstract: Advancements in large language models (LLMs) are sparking a proliferation of LLM-powered user experiences (UX). In product teams, designers often craft UX to meet user needs, but it is unclear how they engage with LLMs as a novel design material. Through a formative study with 12 designers, we find that designers seek a translational mechanism that enables design requirements to shape and be shaped by LLM behavior, motivating a need for designerly adaptation to facilitate this translation. We then built Canvil, a Figma widget that operationalizes designerly adaptation. We used Canvil as a technology probe in a group-based design study (6 groups, N=17), finding that designers constructively iterated on both adaptation approaches and interface designs to enhance end-user interaction with LLMs. Furthermore, designers identified promising collaborative workflows for designerly adaptation. Our work opens new avenues for processes and tools that foreground designers' user-centered expertise in LLM-powered applications. Canvil is available for public use at https://www.figma.com/community/widget/1277396720888327660.

Submitted 13 September, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.11678 [pdf, other]

Misinformation as a harm: structured approaches for fact-checking prioritization

Authors: Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, Amy X. Zhang

Abstract: In this work, we examine how fact-checkers prioritize which claims to fact-check and what tools may assist them in their efforts. Through a series of interviews with 23 professional fact-checkers from around the world, we validate that harm assessment is a central component of how fact-checkers triage their work. We also clarify the processes behind fact-checking prioritization, finding that they are typically ad hoc, and gather suggestions for tools that could help with these processes. To address the needs articulated by fact-checkers, we present a structured framework of questions to help fact-checkers negotiate the priority of claims through assessing potential harms. Our FABLE Framework of Misinformation Harms incorporates five dimensions of magnitude -- (social) Fragmentation, Actionability, Believability, Likelihood of spread, and Exploitativeness -- that can help determine the potential urgency of a specific message or claim when considering misinformation as harm. The result is a practical and conceptual tool to support fact-checkers and others as they make strategic decisions to prioritize their efforts. We conclude with a discussion of computational approaches to support structured prioritization, as well as applications beyond fact-checking to content moderation and curation.

Submitted 18 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to CSCW 2024, with clean up for typos and figures

arXiv:2311.10934 [pdf, other]

Case Repositories: Towards Case-Based Reasoning for AI Alignment

Authors: K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang

Abstract: Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of "seed" cases -- questions one may ask an AI system -- in a particular domain, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for individuals and communities to engage in moral reasoning around AI.

Submitted 26 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: MP2 workshop @ NeurIPS 2023

arXiv:2310.14356 [pdf, other]

Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception

Authors: Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna

Abstract: Computer vision often treats human perception as homogeneous: an implicit assumption that visual stimuli are perceived similarly by everyone. This assumption is reflected in the way researchers collect datasets and train vision models. By contrast, literature in cross-cultural psychology and linguistics has provided evidence that people from different cultural backgrounds observe vastly different concepts even when viewing the same visual stimuli. In this paper, we study how these differences manifest themselves in vision-language datasets and models, using language as a proxy for culture. By comparing textual descriptions generated across 7 languages for the same images, we find significant differences in the semantic content and linguistic expression. When datasets are multilingual as opposed to monolingual, descriptions have higher semantic coverage on average, where coverage is measured using scene graphs, model embeddings, and linguistic taxonomies. For example, multilingual descriptions have on average 29.9% more objects, 24.5% more relations, and 46.0% more attributes than a set of monolingual captions. When prompted to describe images in different languages, popular models (e.g. LLaVA) inherit this bias and describe different parts of the image. Moreover, finetuning models on captions from one language performs best on corresponding test data from that language, while finetuning on multilingual data performs consistently well across all test data compositions. Our work points towards the need to account for and embrace the diversity of human perception in the computer vision community.

Submitted 9 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.07581 [pdf, other]

Qlarify: Recursively Expandable Abstracts for Directed Information Retrieval over Scientific Papers

Authors: Raymond Fok, Joseph Chee Chang, Tal August, Amy X. Zhang, Daniel S. Weld

Abstract: Navigating the vast scientific literature often starts with browsing a paper's abstract. However, when a reader seeks additional information, not present in the abstract, they face a costly cognitive chasm during their dive into the full text. To bridge this gap, we introduce recursively expandable abstracts, a novel interaction paradigm that dynamically expands abstracts by progressively incorporating additional information from the papers' full text. This lightweight interaction allows scholars to specify their information needs by quickly brushing over the abstract or selecting AI-suggested expandable entities. Relevant information is synthesized using a retrieval-augmented generation approach, presented as a fluid, threaded expansion of the abstract, and made efficiently verifiable via attribution to relevant source-passages in the paper. Through a series of user studies, we demonstrate the utility of recursively expandable abstracts and identify future opportunities to support low-effort and just-in-time exploration of long-form information contexts through LLM-powered interactions.

Submitted 15 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 21 pages, 10 figures, 4 tables. arXiv admin note: text overlap with arXiv:2305.14314 by other authors

arXiv:2310.07019 [pdf, other]

Case Law Grounding: Using Precedents to Align Decision-Making for Humans and AI

Authors: Quan Ze Chen, Amy X. Zhang

Abstract: Communities and groups often need to make decisions based on social norms and preferences, such as when moderating content or building AI systems that reflect human values. The prevailing approach has been to first create high-level guidelines -- "constitutions" -- and then decide on new cases using the outlined criteria. However, social norms and preferences vary between groups, decision-makers can interpret guidelines inconsistently, and exceptional situations may be under-specified. In this work, we take inspiration from legal systems and introduce "case law grounding" (CLG), a novel workflow that uses past cases and decisions (precedents) to help ground future decisions, for both human and LLM-based decision-makers. We evaluate CLG against a constitution-only approach on two tasks for both types of decision-makers, and find that decisions produced with CLG were more accurately aligned to observed ground truth in all cases, producing a 3.3--23.3 %-points improvement (across different tasks and groups) for humans and 9.2--30.0 %-points (across different tasks and groups) for LLM agents. We also discuss other aspects where a case-based approach could augment existing "constitutional" approaches when it comes to aligning human and AI decisions.

Submitted 6 September, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.04329 [pdf, other]

Pika: Empowering Non-Programmers to Author Executable Governance Policies in Online Communities

Authors: Leijie Wang, Nicolas Vincent, Julija Rukanskaitė, Amy X. Zhang

Abstract: Internet users have formed a wide array of online communities with nuanced and diverse community goals and norms. However, most online platforms only offer a limited set of governance models in their software infrastructure and leave little room for customization. Consequently, technical proficiency becomes a prerequisite for online communities to build governance policies in code, excluding non-programmers from participation in designing community governance. In this paper, we present Pika, a system that empowers non-programmers to author a wide range of executable governance policies. At its core, Pika incorporates a declarative language that decomposes governance policies into modular components, thereby facilitating expressive policy authoring through a user-friendly, form-based web interface. Our user studies with 17 participants show that Pika can empower non-programmers to author governance policies approximately 2.5 times faster than programmers who author in code. We also provide insights about Pika's expressivity in supporting diverse policies that online communities want.

Submitted 27 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Conditionally accepted by CHI'2024

arXiv:2308.15224 [pdf, other]

doi 10.1145/3586183.3606770

Papeos: Augmenting Research Papers with Talk Videos

Authors: Tae Soo Kim, Matt Latzke, Jonathan Bragg, Amy X. Zhang, Joseph Chee Chang

Abstract: Research consumption has been traditionally limited to the reading of academic papers -- a static, dense, and formally written format. Alternatively, pre-recorded conference presentation videos, which are more dynamic, concise, and colloquial, have recently become more widely available but potentially under-utilized. In this work, we explore the design space and benefits for combining academic papers and talk videos to leverage their complementary nature to provide a rich and fluid research consumption experience. Based on formative and co-design studies, we present Papeos, a novel reading and authoring interface that allows authors to augment their papers by segmenting and localizing talk videos alongside relevant paper passages with automatically generated suggestions. With Papeos, readers can visually skim a paper through clip thumbnails, and fluidly switch between consuming dense text in the paper or visual summaries in the video. In a comparative lab study (n=16), Papeos reduced mental load, scaffolded navigation, and facilitated more comprehensive reading of papers.

Submitted 29 August, 2023; originally announced August 2023.

Comments: Accepted to UIST 2023

arXiv:2306.10478 [pdf, other]

"Is Reporting Worth the Sacrifice of Revealing What I Have Sent?": Privacy Considerations When Reporting on End-to-End Encrypted Platforms

Authors: Leijie Wang, Ruotong Wang, Sterling Williams-Ceci, Sanketh Menda, Amy X. Zhang

Abstract: User reporting is an essential component of content moderation on many online platforms -- in particular, on end-to-end encrypted (E2EE) messaging platforms where platform operators cannot proactively inspect message contents. However, users' privacy concerns when considering reporting may impede the effectiveness of this strategy in regulating online harassment. In this paper, we conduct interviews with 16 users of E2EE platforms to understand users' mental models of how reporting works and their resultant privacy concerns and considerations surrounding reporting. We find that users expect platforms to store rich longitudinal reporting datasets, recognizing both their promise for better abuse mitigation and the privacy risk that platforms may exploit or fail to protect them. We also find that users have preconceptions about the respective capabilities and risks of moderators at the platform versus community level -- for instance, users trust platform moderators more to not abuse their power but think community moderators have more time to attend to reports. These considerations, along with perceived effectiveness of reporting and how to provide sufficient evidence while maintaining privacy, shape how users decide whether, to whom, and how much to report. We conclude with design implications for a more privacy-preserving reporting system on E2EE messaging platforms.

Submitted 18 June, 2023; originally announced June 2023.

Comments: accepted to SOUPS 2023

arXiv:2305.09072 [pdf, other]

doi 10.1145/3593013.3594114

Skin Deep: Investigating Subjectivity in Skin Tone Annotations for Computer Vision Benchmark Datasets

Authors: Teanna Barrett, Quan Ze Chen, Amy X. Zhang

Abstract: To investigate the well-observed racial disparities in computer vision systems that analyze images of humans, researchers have turned to skin tone as a more objective annotation than race metadata for fairness performance evaluations. However, the current state of skin tone annotation procedures is highly varied. For instance, researchers use a range of untested scales and skin tone categories, have unclear annotation procedures, and provide inadequate analyses of uncertainty. In addition, little attention is paid to the positionality of the humans involved in the annotation process -- both designers and annotators alike -- and the historical and sociological context of skin tone in the United States. Our work is the first to investigate the skin tone annotation process as a sociotechnical project. We surveyed recent skin tone annotation procedures and conducted annotation experiments to examine how subjective understandings of skin tone are embedded in skin tone annotation procedures. Our systematic literature review revealed the uninterrogated association between skin tone and race and the limited effort to analyze annotator uncertainty in current procedures for skin tone annotation in computer vision evaluation. Our experiments demonstrated that design decisions in the annotation procedure such as the order in which the skin tone scale is presented or additional context in the image (i.e., presence of a face) significantly affected the resulting inter-annotator agreement and individual uncertainty of skin tone annotations. We call for greater reflexivity in the design, analysis, and documentation of procedures for evaluation using skin tone.

Submitted 15 May, 2023; originally announced May 2023.

Comments: To appear in FAccT '23

arXiv:2305.01615 [pdf, other]

doi 10.1145/3610074

Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement

Authors: Quan Ze Chen, Amy X. Zhang

Abstract: When groups of people are tasked with making a judgment, the issue of uncertainty often arises. Existing methods to reduce uncertainty typically focus on iteratively improving specificity in the overall task instruction. However, uncertainty can arise from multiple sources, such as ambiguity of the item being judged due to limited context, or disagreements among the participants due to different perspectives and an under-specified task. A one-size-fits-all intervention may be ineffective if it is not targeted to the right source of uncertainty. In this paper we introduce a new workflow, Judgment Sieve, to reduce uncertainty in tasks involving group judgment in a targeted manner. By utilizing measurements that separate different sources of uncertainty during an initial round of judgment elicitation, we can then select a targeted intervention adding context or deliberation to most effectively reduce uncertainty on each item being judged. We test our approach on two tasks: rating word pair similarity and toxicity of online comments, showing that targeted interventions reduced uncertainty for the most uncertain cases. In the top 10% of cases, we saw an ambiguity reduction of 21.4% and 25.7%, and a disagreement reduction of 22.2% and 11.2% for the two tasks respectively. We also found through a simulation that our targeted approach reduced the average uncertainty scores for both sources of uncertainty as opposed to uniform approaches where reductions in average uncertainty from one source came with an increase for the other.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2303.14334 [pdf, other]

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, et al. (30 additional authors not shown)

Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.

Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.12118 [pdf, other]

Examining the Impact of Provenance-Enabled Media on Trust and Accuracy Perceptions

Authors: K. J. Kevin Feng, Nick Ritchie, Pia Blumenthal, Andy Parsons, Amy X. Zhang

Abstract: In recent years, industry leaders and researchers have proposed to use technical provenance standards to address visual misinformation spread through digitally altered media. By adding immutable and secure provenance information such as authorship and edit date to media metadata, social media users could potentially better assess the validity of the media they encounter. However, it is unclear how end users would respond to provenance information, or how to best design provenance indicators to be understandable to laypeople. We conducted an online experiment with 595 participants from the US and UK to investigate how provenance information altered users' accuracy perceptions and trust in visual content shared on social media. We found that provenance information often lowered trust and caused users to doubt deceptive media, particularly when it revealed that the media was composited. We additionally tested conditions where the provenance information itself was shown to be incomplete or invalid, and found that these states have a significant impact on participants' accuracy perceptions and trust in media, leading them, in some cases, to disbelieve honest media. Our findings show that provenance, although enlightening, is still not a concept well-understood by users, who confuse media credibility with the orthogonal (albeit related) concept of provenance credibility. We discuss how design choices may contribute to provenance (mis)understanding, and conclude with implications for usable provenance systems, including clearer interfaces and user education.

Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted to CSCW 2023

arXiv:2302.11845 [pdf, other]

doi 10.1145/3544548.3581273

Understanding Collaborative Practices and Tools of Professional UX Practitioners in Software Organizations

Authors: K. J. Kevin Feng, Tony W. Li, Amy X. Zhang

Abstract: User experience (UX) has undergone a revolution in collaborative practices, due to tools that enable quick feedback and continuous collaboration with a varied team across a design's lifecycle. However, it is unclear how this shift in collaboration has been received in professional UX practice, and whether new pain points have arisen. To this end, we conducted a survey (N=114) with UX practitioners at software organizations based in the U.S. to better understand their collaborative practices and tools used throughout the design process. We found that while an increase in collaborative activity enhanced many aspects of UX work, some long-standing challenges -- such as handing off designs to developers -- still persist. Moreover, we observed new challenges emerging from activities enabled by collaborative tools such as design system management. Based on our findings, we discuss how UX practices can improve collaboration moving forward and provide concrete design implications for collaborative UX tools.

Submitted 26 February, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.07302 [pdf, other]

doi 10.1145/3544548.3580847

CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context

Authors: Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. Weld

Abstract: When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user's publishing, reading, and saving activities to provide personalized visual augmentations and context around citations. First, CiteSee connects the current paper to familiar contexts by surfacing known citations a user had cited or opened. Second, CiteSee helps users prioritize their exploration by highlighting relevant but unknown citations based on saving and reading history. We conducted a lab study that suggests CiteSee is significantly more effective for paper discovery than three baselines. A field deployment study shows CiteSee helps participants keep track of their explorations and leads to better situational awareness and increased paper discovery via inline citation when conducting real-world literature reviews.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.06754 [pdf, other]

doi 10.1145/3544548.3580841

Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections

Authors: Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang

Abstract: Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize prior research in papers' related work sections, though this is scoped to support a single paper. A formative study found that while reading multiple related work paragraphs helps overview a topic, it is hard to navigate overlapping and diverging references and research foci. In this work, we design a system, Relatedly, that scaffolds exploring and reading multiple related work paragraphs on a topic, with features including dynamic re-ranking and highlighting to spotlight unexplored dissimilar information, auto-generated descriptive paragraph headings, and low-lighting of redundant information. From a within-subjects user study (n=15), we found that scholars generate more coherent, insightful, and comprehensive topic outlines using Relatedly compared to a baseline paper list.

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2202.06393 [pdf, other]

doi 10.1145/3512929

Comparing the Perceived Legitimacy of Content Moderation Processes: Contractors, Algorithms, Expert Panels, and Digital Juries

Authors: Christina A. Pan, Sahil Yakhmi, Tara P. Iyer, Evan Strasnick, Amy X. Zhang, Michael S. Bernstein

Abstract: While research continues to investigate and improve the accuracy, fairness, and normative appropriateness of content moderation processes on large social media platforms, even the best process cannot be effective if users reject its authority as illegitimate. We present a survey experiment comparing the perceived institutional legitimacy of four popular content moderation processes. We conducted a within-subjects experiment in which we showed US Facebook users moderation decisions and randomized the description of whether those decisions were made by paid contractors, algorithms, expert panels, or juries of users. Prior work suggests that juries will have the highest perceived legitimacy due to the benefits of judicial independence and democratic representation. However, expert panels had greater perceived legitimacy than algorithms or juries. Moreover, outcome alignment - agreement with the decision - played a larger role than process in determining perceived legitimacy. These results suggest benefits to incorporating expert oversight in content moderation and underscore that any process will face legitimacy challenges derived from disagreement about outcomes.

Submitted 6 October, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

Comments: This paper will appear at CSCW 2022

arXiv:2111.05835 [pdf, other]

What Makes Online Communities 'Better'? Measuring Values, Consensus, and Conflict across Thousands of Subreddits

Authors: Galen Weld, Amy X. Zhang, Tim Althoff

Abstract: Making online social communities 'better' is a challenging undertaking, as online communities are extraordinarily varied in their size, topical focus, and governance. As such, what is valued by one community may not be valued by another. However, community values are challenging to measure as they are rarely explicitly stated. In this work, we measure community values through the first large-scale survey of community values, including 2,769 reddit users in 2,151 unique subreddits. Through a combination of survey responses and a quantitative analysis of public reddit data, we characterize how these values vary within and across communities. Amongst other findings, we show that community members disagree about how safe their communities are, that longstanding communities place 30.1% more importance on trustworthiness than newer communities, and that community moderators want their communities to be 56.7% less democratic than non-moderator community members. These findings have important implications, including suggesting that care must be taken to protect vulnerable community members, and that participatory governance strategies may be difficult to implement. Accurate and scalable modeling of community values enables research and governance which is tuned to each community's different values. To this end, we demonstrate that a small number of automatically quantifiable features capture a significant yet limited amount of the variation in values between communities with a ROC AUC of 0.667 on a binary classification task. However, substantial variation remains, and modeling community values remains an important topic for future work. We make our models and data public to inform community design and governance.

Submitted 9 May, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 12 pages, 8 figures, 4 tables; to appear at ICWSM 2022

arXiv:2109.05152 [pdf, other]

Making Online Communities 'Better': A Taxonomy of Community Values on Reddit

Authors: Galen Weld, Amy X. Zhang, Tim Althoff

Abstract: Many researchers studying online communities seek to make them better. However, beyond a small set of widely-held values, such as combating misinformation and abuse, determining what 'better' means can be challenging, as community members may disagree, values may be in conflict, and different communities may have differing preferences as a whole. In this work, we present the first study that elicits values directly from members across a diverse set of communities. We survey 212 members of 627 unique subreddits and ask them to describe their values for their communities in their own words. Through iterative categorization of 1,481 responses, we develop and validate a comprehensive taxonomy of community values, consisting of 29 subcategories within nine top-level categories, enabling principled, quantitative study of community values by researchers. Using our taxonomy, we reframe existing research problems, such as managing influxes of new members, as tensions between different values, and we identify understudied values, such as those regarding content quality and community size. We call for greater attention to vulnerable community members' values, and we make our codebook public for use in future research.

Submitted 20 September, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: to appear at ICWSM 2024

arXiv:2108.01799 [pdf, other]

doi 10.1145/3476076

Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty

Authors: Quanze Chen, Daniel S. Weld, Amy X. Zhang

Abstract: Human ratings have become a crucial resource for training and evaluating machine learning systems. However, traditional elicitation methods for absolute and comparative rating suffer from issues with consistency and often do not distinguish between uncertainty due to disagreement between annotators and ambiguity inherent to the item being rated. In this work, we present Goldilocks, a novel crowd rating elicitation technique for collecting calibrated scalar annotations that also distinguishes inherent ambiguity from inter-annotator disagreement. We introduce two main ideas: grounding absolute rating scales with examples and using a two-step bounding process to establish a range for an item's placement. We test our designs in three domains: judging toxicity of online comments, estimating satiety of food depicted in images, and estimating age based on portraits. We show that (1) Goldilocks can improve consistency in domains where interpretation of the scale is not universal, and that (2) representing items with ranges lets us simultaneously capture different sources of uncertainty leading to better estimates of pairwise relationship distributions.

Submitted 3 August, 2021; originally announced August 2021.

Comments: CSCW '21

arXiv:2101.11824 [pdf, other]

doi 10.1145/3449092

Exploring Lightweight Interventions at Posting Time to Reduce the Sharing of Misinformation on Social Media

Authors: Farnaz Jahanbakhsh, Amy X. Zhang, Adam J. Berinsky, Gordon Pennycook, David G. Rand, David R. Karger

Abstract: When users on social media share content without considering its veracity, they may unwittingly be spreading misinformation. In this work, we investigate the design of lightweight interventions that nudge users to assess the accuracy of information as they share it. Such assessment may deter users from posting misinformation in the first place, and their assessments may also provide useful guidance to friends aiming to assess those posts themselves. In support of lightweight assessment, we first develop a taxonomy of the reasons why people believe a news claim is or is not true; this taxonomy yields a checklist that can be used at posting time. We conduct evaluations to demonstrate that the checklist is an accurate and comprehensive encapsulation of people's free-response rationales. In a second experiment, we study the effects of three behavioral nudges -- 1) checkboxes indicating whether headings are accurate, 2) tagging reasons (from our taxonomy) that a post is accurate via a checklist and 3) providing free-text rationales for why a headline is or is not accurate -- on people's intention of sharing the headline on social media. From an experiment with 1668 participants, we find that both providing accuracy assessment and rationale reduce the sharing of false content. They also reduce the sharing of true content, but to a lesser degree that yields an overall decrease in the fraction of shared content that is false. Our findings have implications for designing social media and news sharing platforms that draw from richer signals of content credibility contributed by users. In addition, our validated taxonomy can be used by platforms and researchers as a way to gather rationales in an easier fashion than free-response.

Submitted 23 May, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: In CSCW'21

ACM Class: H.5.3; J.4

Journal ref: Proc. ACM Hum.-Comput. Interact., Vol. 5, No. CSCW1, Article 18. Publication date: April 2021

arXiv:2011.14238 [pdf, other]

doi 10.1080/10618600.2024.2404711

Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models

Authors: Amy X. Zhang, Le Bao, Changcheng Li, Michael J. Daniels

Abstract: We introduce a novel procedure for obtaining cross-validated predictive estimates for Bayesian hierarchical regression models (BHRMs). Bayesian hierarchical models are popular for their ability to model complex dependence structures and provide probabilistic uncertainty estimates, but can be computationally expensive to run. Cross-validation (CV) is therefore not a common practice to evaluate the predictive performance of BHRMs. Our method circumvents the need to re-run computationally costly estimation methods for each cross-validation fold and makes CV more feasible for large BHRMs. By conditioning on the variance-covariance parameters, we shift the CV problem from probability-based sampling to a simple and familiar optimization problem. In many cases, this produces estimates which are equivalent to full CV. We provide theoretical results and demonstrate its efficacy on publicly available data and in simulations.

Submitted 27 September, 2024; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: 25 pages, 2 figures

Journal ref: Journal of Computational and Graphical Statistics (2024) 1-17

arXiv:2009.09053 [pdf, other]

doi 10.1145/3432912

CommunityClick: Capturing and Reporting Community Feedback from Town Halls to Improve Inclusivity

Authors: Mahmood Jasim, Pooya Khaloo, Somin Wadhwa, Amy X. Zhang, Ali Sarvghad, Narges Mahyar

Abstract: Local governments still depend on traditional town halls for community consultation, despite problems such as a lack of inclusive participation for attendees and difficulty for civic organizers to capture attendees' feedback in reports. Building on a formative study with 66 town hall attendees and 20 organizers, we designed and developed CommunityClick, a communitysourcing system that captures attendees' feedback in an inclusive manner and enables organizers to author more comprehensive reports. During the meeting, in addition to recording meeting audio to capture vocal attendees' feedback, we modify iClickers to give voice to reticent attendees by allowing them to provide real-time feedback beyond a binary signal. This information then automatically feeds into a meeting transcript augmented with attendees' feedback and organizers' tags. The augmented transcript along with a feedback-weighted summary of the transcript generated from text analysis methods is incorporated into an interactive authoring tool for organizers to write reports. From a field experiment at a town hall meeting, we demonstrate how CommunityClick can improve inclusivity by providing multiple avenues for attendees to share opinions. Additionally, interviews with eight expert organizers demonstrate CommunityClick's utility in creating more comprehensive and accurate reports to inform critical civic decision-making. We discuss the possibility of integrating CommunityClick with town hall meetings in the future as well as expanding to other domains.

Submitted 18 September, 2020; originally announced September 2020.

Comments: 32 pages, 5 figures, 4 tables, to be published in ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW)

Report number: 213

Journal ref: Proceedings of the ACM on Human-Computer Interaction, January 2021

arXiv:2009.07446 [pdf, other]

doi 10.1145/3432940

A System for Interleaving Discussion and Summarization in Online Collaboration

Authors: Sunny Tian, Amy X. Zhang, David Karger

Abstract: In many instances of online collaboration, ideation and deliberation about what to write happen separately from the synthesis of the deliberation into a cohesive document. However, this may result in a final document that has little connection to the discussion that came before. In this work, we present interleaved discussion and summarization, a process where discussion and summarization are woven together in a single space, and collaborators can switch back and forth between discussing ideas and summarizing discussion until it results in a final document that incorporates and references all discussion points. We implement this process into a tool called Wikum+ that allows groups working together on a project to create living summaries: artifacts that can grow as new collaborators, ideas, and feedback arise and shrink as collaborators come to consensus. We conducted studies where groups of six people each collaboratively wrote a proposal using Wikum+ and a proposal using a messaging platform along with Google Docs. We found that Wikum+'s integration of discussion and summarization helped users be more organized, allowing for light-weight coordination and iterative improvements throughout the collaboration process. A second study demonstrated that in larger groups, Wikum+ is more inclusive of all participants and more comprehensive in the final document compared to traditional tools.

Submitted 31 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:2008.09533 [pdf, other]

doi 10.1145/3415164

Investigating Differences in Crowdsourced News Credibility Assessment: Raters, Tasks, and Expert Criteria

Authors: Md Momen Bhuiyan, Amy X. Zhang, Connie Moon Sehat, Tanushree Mitra

Abstract: Misinformation about critical issues such as climate change and vaccine safety is oftentimes amplified on online social and search platforms. The crowdsourcing of content credibility assessment by laypeople has been proposed as one strategy to combat misinformation by attempting to replicate the assessments of experts at scale. In this work, we investigate news credibility assessments by crowds versus experts to understand when and how ratings between them differ. We gather a dataset of over 4,000 credibility assessments taken from 2 crowd groups---journalism students and Upwork workers---as well as 2 expert groups---journalists and scientists---on a varied set of 50 news articles related to climate science, a topic with widespread disconnect between public opinion and expert consensus. Examining the ratings, we find differences in performance due to the makeup of the crowd, such as rater demographics and political leaning, as well as the scope of the tasks that the crowd is assigned to rate, such as the genre of the article and partisanship of the publication. Finally, we find differences between expert assessments due to differing expert criteria that journalism versus science experts use---differences that may contribute to crowd discrepancies, but that also suggest a way to reduce the gap by designing crowd tasks tailored to specific expert criteria. From these findings, we outline future research directions to better design crowd processes that are tailored to specific crowds and types of content.

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2008.04236 [pdf, other]

doi 10.1145/3379337.3415858

PolicyKit: Building Governance in Online Communities

Authors: Amy X. Zhang, Grant Hugh, Michael S. Bernstein

Abstract: The software behind online community platforms encodes a governance model that represents a strikingly narrow set of governance possibilities focused on moderators and administrators. When online communities desire other forms of government, such as ones that take many members' opinions into account or that distribute power in non-trivial ways, communities must resort to laborious manual effort. In this paper, we present PolicyKit, a software infrastructure that empowers online community members to concisely author a wide range of governance procedures and automatically carry out those procedures on their home platforms. We draw on political science theory to encode community governance into policies, or short imperative functions that specify a procedure for determining whether a user-initiated action can execute. Actions that can be governed by policies encompass everyday activities such as posting or moderating a message, but actions can also encompass changes to the policies themselves, enabling the evolution of governance over time. We demonstrate the expressivity of PolicyKit through implementations of governance models such as a random jury deliberation, a multi-stage caucus, a reputation system, and a promotion procedure inspired by Wikipedia's Request for Adminship (RfA) process.

Submitted 17 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: to be published in ACM UIST 2020

ACM Class: H.5.3

arXiv:2005.13701 [pdf, other]

doi 10.1145/3449090

Modular Politics: Toward a Governance Layer for Online Communities

Authors: Nathan Schneider, Primavera De Filippi, Seth Frey, Joshua Z. Tan, Amy X. Zhang

Abstract: Governance in online communities is an increasingly high-stakes challenge, and yet many basic features of offline governance legacies--juries, political parties, term limits, and formal debates, to name a few--are not in the feature-sets of the software most community platforms use. Drawing on the paradigm of Institutional Analysis and Development, this paper proposes a strategy for addressing this lapse by specifying basic features of a generalizable paradigm for online governance called Modular Politics. Whereas classical governance typologies tend to present a choice among wholesale ideologies, such as democracy or oligarchy, Modular Politics would enable platform operators and their users to build bottom-up governance processes from computational components that are modular and composable, highly versatile in their expressiveness, portable from one context to another, and interoperable across platforms. This kind of approach could implement pre-digital governance systems as well as accelerate innovation in uniquely digital techniques. As diverse communities share and connect their components and data, governance could occur through a ubiquitous network layer. To that end, this paper proposes the development of an open standard for networked governance.

Submitted 12 March, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: In CSCW '21

Journal ref: Proc. ACM Hum.-Comput. Interact., Vol. 5, No. CSCW1, Article 16. Publication date: April 2021

arXiv:2001.06684 [pdf, other]

How do Data Science Workers Collaborate? Roles, Workflows, and Tools

Authors: Amy X. Zhang, Michael Muller, Dakuo Wang

Abstract: Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.

Submitted 16 April, 2020; v1 submitted 18 January, 2020; originally announced January 2020.

Comments: CSCW'2020

arXiv:1409.8152 [pdf, other]

Controversy and Sentiment in Online News

Authors: Yelena Mejova, Amy X. Zhang, Nicholas Diakopoulos, Carlos Castillo

Abstract: How do news sources tackle controversial issues? In this work, we take a data-driven approach to understand how controversy interplays with emotional expression and biased language in the news. We begin by introducing a new dataset of controversial and non-controversial terms collected using crowdsourcing. Then, focusing on 15 major U.S. news outlets, we compare millions of articles discussing controversial and non-controversial issues over a span of 7 months. We find that in general, when it comes to controversial issues, the use of negative affect and biased language is prevalent, while the use of strong emotion is tempered. We also observe many differences across news sources. Using these findings, we show that we can indicate to what extent an issue is controversial, by comparing it with other issues in terms of how they are portrayed across different media.

Submitted 29 September, 2014; originally announced September 2014.

Comments: Computation+Journalism Symposium 2014

arXiv:1308.3657 [pdf, ps, other]

doi 10.1109/SocialCom.2013.17

Hoodsquare: Modeling and Recommending Neighborhoods in Location-based Social Networks

Authors: Amy X. Zhang, Anastasios Noulas, Salvatore Scellato, Cecilia Mascolo

Abstract: Information garnered from activity on location-based social networks can be harnessed to characterize urban spaces and organize them into neighborhoods. In this work, we adopt a data-driven approach to the identification and modeling of urban neighborhoods using location-based social networks. We represent geographic points in the city using spatio-temporal information about Foursquare user check-ins and semantic information about places, with the goal of developing features to input into a novel neighborhood detection algorithm. The algorithm first employs a similarity metric that assesses the homogeneity of a geographic area, and then with a simple mechanism of geographic navigation, it detects the boundaries of a city's neighborhoods. The models and algorithms devised are subsequently integrated into a publicly available, map-based tool named Hoodsquare that allows users to explore activities and neighborhoods in cities around the world. Finally, we evaluate Hoodsquare in the context of a recommendation application where user profiles are matched to urban neighborhoods. By comparing with a number of baselines, we demonstrate how Hoodsquare can be used to accurately predict the home neighborhood of Twitter users. We also show that we are able to suggest neighborhoods geographically constrained in size, a desirable property in mobile recommendation scenarios for which geographical precision is key.

Submitted 16 August, 2013; originally announced August 2013.

Comments: ASE/IEEE SocialCom 2013
