Google Scholar

Towards understanding sycophancy in language models

M Sharma, M Tong, T Korbak, D Duvenaud… - arXiv preprint arXiv…, 2023 - arxiv.org

… in preference data is responsible for sycophancy in AI assistants, we then analyze whether
sycophancy increases when optimizing language model responses using preference models …

Cited by 104

Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models

Y Zhao, R Zhang, J Xiao, C Ke, R Hou, Y Hao… - arXiv preprint arXiv…, 2024 - arxiv.org

… sycophantic behavior varies significantly among models, our analysis reveals the severe
deficiency of all LVLMs in resilience of sycophancy … (LQCD), a model-agnostic method focusing …

Cited by 1

Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models

M Malik - 2024 - lup.lub.lu.se

… This thesis posits that large language models (LLMs) exhibiting sycophantic behaviour will
demonstrate consistent results across two testing conditions: explicit testing, where political …

Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Models

MV Carro - openreview.net

… Given that sycophancy is often linked to human feedback training mechanisms, this study
… whether sycophantic tendencies negatively impact user trust in large language models or, …

From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

W Chen, Z Huang, L Xie, B Lin, H Li, L Lu… - arXiv preprint arXiv…, 2024 - arxiv.org

… This work leverages path patching to find circuits on models with more than 7B parameters,
which shows the scalability of the method. We conceptualize the language model as a …

Accounting for Sycophancy in Language Model Uncertainty Estimation

A Sicilia, M Inan, M Alikhani - arXiv preprint arXiv:2410.14746, 2024 - arxiv.org

… uncertainty may be a promising avenue for annotators to identify sycophancy. Likewise, …
sycophancy bias (§ 4.2) because language models effectively condition on hedging language. …

Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

A RRV, N Tyagi, MN Uddin, N Varshney… - arXiv preprint arXiv…, 2024 - arxiv.org

… This study explores the sycophantic tendencies of Large Language Models (LLMs), where
these models tend to provide answers that match what users want to hear, even if they are not …

GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy

J Batzner, V Stocker, S Schmid, G Kasneci - arXiv preprint arXiv…, 2024 - arxiv.org

… Our results contribute to a more nuanced understanding of sycophancy, steerability, and
political bias in LLM output evaluations. Our study also emphasizes the context dependency of …

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

S Li, T Ji, X Fan, L Lu, L Yang, Y Yang, Z Xi… - arXiv preprint arXiv…, 2024 - arxiv.org

… As LLMs expand into other modalities like vision-language models (VLMs), the saying “…
exhibit sycophancy when given images as evidence? This paper presents the first sycophancy …

Trustllm: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org

… 5) Additionally, we find a positive correlation between sycophancy and adversarial factuality.
Models with lower sycophancy levels are more effective in identifying and highlighting …

Cited by 156
