Towards understanding sycophancy in language models

M Sharma, M Tong, T Korbak, D Duvenaud… - arXiv preprint arXiv…, 2023 - arxiv.org
… in preference data is responsible for sycophancy in AI assistants, we then analyze whether
sycophancy increases when optimizing language model responses using preference models

Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models

Y Zhao, R Zhang, J Xiao, C Ke, R Hou, Y Hao… - arXiv preprint arXiv…, 2024 - arxiv.org
sycophantic behavior varies significantly among models, our analysis reveals the severe
deficiency of all LVLMs in resilience of sycophancy … (LQCD), a model-agnostic method focusing …

Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models

M Malik - 2024 - lup.lub.lu.se
… This thesis posits that large language models (LLMs) exhibiting sycophantic behaviour will
demonstrate consistent results across two testing conditions: explicit testing, where political …

Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Models

MV Carro - openreview.net
… Given that sycophancy is often linked to human feedback training mechanisms, this study
… whether sycophantic tendencies negatively impact user trust in large language models or, …

From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

W Chen, Z Huang, L Xie, B Lin, H Li, L Lu… - arXiv preprint arXiv…, 2024 - arxiv.org
… This work leverages path patching to find circuits on models with more than 7B parameters,
which shows the scalability of the method. We conceptualize the language model as a …

Accounting for Sycophancy in Language Model Uncertainty Estimation

A Sicilia, M Inan, M Alikhani - arXiv preprint arXiv:2410.14746, 2024 - arxiv.org
… uncertainty may be a promising avenue for annotators to identify sycophancy. Likewise, …
sycophancy bias (§ 4.2) because language models effectively condition on hedging language. …

Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

A RRV, N Tyagi, MN Uddin, N Varshney… - arXiv preprint arXiv…, 2024 - arxiv.org
… This study explores the sycophantic tendencies of Large Language Models (LLMs), where
these models tend to provide answers that match what users want to hear, even if they are not …

GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy

J Batzner, V Stocker, S Schmid, G Kasneci - arXiv preprint arXiv…, 2024 - arxiv.org
… Our results contribute to a more nuanced understanding of sycophancy, steerability, and
political bias in LLM output evaluations. Our study also emphasizes the context dependency of …

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

S Li, T Ji, X Fan, L Lu, L Yang, Y Yang, Z Xi… - arXiv preprint arXiv…, 2024 - arxiv.org
… As LLMs expand into other modalities like vision-language models (VLMs), the saying “…
exhibit sycophancy when given images as evidence? This paper presents the first sycophancy

Trustllm: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
… 5) Additionally, we find a positive correlation between sycophancy and adversarial factuality.
Models with lower sycophancy levels are more effective in identifying and highlighting …