Showing 1–20 of 20 results for author: Berendt, B

  1. arXiv:2407.16496

    cs.CY cs.AI cs.LG

    Articulation Work and Tinkering for Fairness in Machine Learning

    Authors: Miriam Fahimi, Mayra Russo, Kristen M. Scott, Maria-Esther Vidal, Bettina Berendt, Katharina Kinder-Kurlanda

    Abstract: The field of fair AI aims to counter biased algorithms through computational modelling. However, it faces increasing criticism for perpetuating the use of overly technical and reductionist methods. As a result, novel approaches appear in the field to address more socially-oriented and interdisciplinary (SOI) perspectives on fair AI. In this paper, we take this dynamic as the starting point to stud…

    Submitted 28 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    ACM Class: K.4.3; I.2.0

  2. arXiv:2407.06631

    cs.SI cs.CY cs.HC cs.NI

    A Systematic Review of Echo Chamber Research: Comparative Analysis of Conceptualizations, Operationalizations, and Varying Outcomes

    Authors: David Hartmann, Lena Pohlmann, Sonja Mei Wang, Bettina Berendt

    Abstract: This systematic review synthesizes current research on echo chambers and filter bubbles to highlight the reasons for the dissent in echo chamber research on the existence, antecedents, and effects of the phenomenon. The review of 112 studies reveals that the lack of consensus in echo chamber research is based on different conceptualizations and operationalizations of echo chambers. While studies t…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2405.01097

    cs.CY cs.CL cs.HC cs.IR cs.SE

    Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

    Authors: Dimitri Staufer, Frank Pallas, Bettina Berendt

    Abstract: Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU WBD, are limited in their scope and effectivenes…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at the ACM Conference on Fairness, Accountability, and Transparency 2024 (ACM FAccT'24). This is a preprint manuscript (authors' own version before final copy-editing)

    ACM Class: H.3; K.4; H.5; K.5; D.2; J.4

  4. arXiv:2403.07904

    cs.CY cs.AI cs.LG

    Addressing the Regulatory Gap: Moving Towards an EU AI Audit Ecosystem Beyond the AIA by Including Civil Society

    Authors: David Hartmann, José Renato Laranjeira de Pereira, Chiara Streitbörger, Bettina Berendt

    Abstract: The European legislature has proposed the Digital Services Act (DSA) and Artificial Intelligence Act (AIA) to regulate platforms and Artificial Intelligence (AI) products. We review to what extent third-party audits are part of both laws and to what extent access to models and data is provided. By considering the value of third-party audits and third-party data access in an audit ecosystem, we ide…

    Submitted 17 May, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

  5. arXiv:2310.03477

    cs.CL cs.AI

    Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation

    Authors: François Remy, Pieter Delobelle, Bettina Berendt, Kris Demuynck, Thomas Demeester

    Abstract: Training monolingual language models for low and mid-resource languages is made challenging by limited and often inadequate pretraining data. In this study, we propose a novel model conversion strategy to address this issue, adapting high-resource monolingual language models to a new target language. By generalizing over a word translation dictionary encompassing both the source and target langua…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: As first reviewed at TACL

  6. arXiv:2303.07207

    cs.CY cs.AI

    Bias, diversity, and challenges to fairness in classification and automated text analysis. From libraries to AI and back

    Authors: Bettina Berendt, Özgür Karadeniz, Sercan Kıyak, Stefan Mertens, Leen d'Haenens

    Abstract: Libraries are increasingly relying on computational methods, including methods from Artificial Intelligence (AI). This increasing usage raises concerns about the risks of AI that are currently broadly discussed in scientific literature, the media and law-making. In this article we investigate the risks surrounding bias and unfairness in AI usage in classification and automated text analysis within…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: 14 pages

  7. Domain Adaptive Decision Trees: Implications for Accuracy and Fairness

    Authors: Jose M. Alvarez, Kristen M. Scott, Salvatore Ruggieri, Bettina Berendt

    Abstract: In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population changes, certain demographic groups will be under-s…

    Submitted 31 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: *Both authors contributed equally to this work. Accepted at FAccT '23

    Journal ref: FAccT '23: the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, June 12–15, 2023

  8. arXiv:2301.12855

    cs.CL

    How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

    Authors: Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders

    Abstract: To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of…

    Submitted 30 January, 2023; originally announced January 2023.

  9. arXiv:2301.00671

    cs.CL cs.AI cs.CY cs.DB

    Political representation bias in DBpedia and Wikidata as a challenge for downstream processing

    Authors: Ozgur Karadeniz, Bettina Berendt, Sercan Kiyak, Stefan Mertens, Leen d'Haenens

    Abstract: Diversity Searcher is a tool originally developed to help analyse diversity in news media texts. It relies on a form of automated content analysis and thus rests on prior assumptions and depends on certain design choices related to diversity and fairness. One such design choice is the external knowledge source(s) used. In this article, we discuss implications that these sources can have on the res…

    Submitted 29 December, 2022; originally announced January 2023.

  10. arXiv:2211.08192

    cs.CL cs.LG

    RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use

    Authors: Pieter Delobelle, Thomas Winters, Bettina Berendt

    Abstract: Large transformer-based language models, e.g. BERT and GPT-3, outperform previous architectures on most natural language processing tasks. Such language models are first pre-trained on gigantic corpora of text and later used as base-model for finetuning on a particular task. Since the pre-training step is usually not repeated, base models are not up-to-date with the latest information. In this pap…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 9 pages, 1 figure, 3 tables

  11. arXiv:2207.04546

    cs.CL cs.CY cs.LG

    FairDistillation: Mitigating Stereotyping in Language Models

    Authors: Pieter Delobelle, Bettina Berendt

    Abstract: Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expensive when tackling multiple languages or when con…

    Submitted 16 September, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

    Comments: Accepted at ECML-PKDD 2022

  12. arXiv:2204.13511

    cs.CL

    RobBERTje: a Distilled Dutch BERT Model

    Authors: Pieter Delobelle, Thomas Winters, Bettina Berendt

    Abstract: Pre-trained large-scale language models such as BERT have gained a lot of attention thanks to their outstanding performance on a wide range of natural language tasks. However, due to their large number of parameters, they are resource-intensive both to deploy and to fine-tune. Researchers have created several methods for distilling language models into smaller ones to increase efficiency, with a s…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: Published in CLIN journal

    Journal ref: Computational Linguistics in the Netherlands Journal 2021

  13. arXiv:2112.07447

    cs.CL cs.CY cs.LG

    Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models

    Authors: Pieter Delobelle, Ewoenam Kwaku Tokpo, Toon Calders, Bettina Berendt

    Abstract: An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify `bias' and `fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentall…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: 15 pages, 4 figures, 3 tables

  14. arXiv:2111.02825

    cs.AI cs.CY

    Whistleblower protection in the digital age -- why 'anonymous' is not enough. From technology to a wider view of governance

    Authors: Bettina Berendt, Stefan Schiffner

    Abstract: When technology enters applications and processes with a long tradition of controversial societal debate, multi-faceted new ethical and legal questions arise. This paper focusses on the process of whistleblowing, an activity with large impacts on democracy and business. Computer science can, for the first time in history, provide for truly anonymous communication. We investigate this in relation t…

    Submitted 27 February, 2023; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: 16 pages, 1 figure

    Journal ref: International Review of Information Ethics, 31(1), 2022, https://informationethics.ca/index.php/irie/article/view/479/440

  15. arXiv:2104.09947

    cs.CL cs.SI

    Measuring Shifts in Attitudes Towards COVID-19 Measures in Belgium Using Multilingual BERT

    Authors: Kristen Scott, Pieter Delobelle, Bettina Berendt

    Abstract: We classify seven months' worth of Belgian COVID-related Tweets using multilingual BERT and relate them to their governments' COVID measures. We classify Tweets by their stated opinion on Belgian government curfew measures (too strict, ok, too loose). We examine the change in topics discussed and views expressed over time and in reference to dates of related events such as implementation of new me…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures

  16. arXiv:2005.06852

    cs.LG cs.AI stat.ML

    Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning

    Authors: Pieter Delobelle, Paul Temple, Gilles Perrouin, Benoît Frénay, Patrick Heymans, Bettina Berendt

    Abstract: Machine learning is being integrated into a growing number of critical systems with far-reaching impacts on society. Unexpected behaviour and unfair decision processes are coming under increasing scrutiny due to this widespread use and its theoretical considerations. Individuals, as well as organisations, notice, test, and criticize unfair results to hold model designers and deployers accountable.…

    Submitted 1 September, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: 15 pages, 3 figures, 1 table

  17. arXiv:2001.09762

    cs.CY

    Bias in Data-driven AI Systems -- An Introductory Survey

    Authors: Eirini Ntoutsi, Pavlos Fafalios, Ujwal Gadiraju, Vasileios Iosifidis, Wolfgang Nejdl, Maria-Esther Vidal, Salvatore Ruggieri, Franco Turini, Symeon Papadopoulos, Emmanouil Krasanakis, Ioannis Kompatsiaris, Katharina Kinder-Kurlanda, Claudia Wagner, Fariba Karimi, Miriam Fernandez, Harith Alani, Bettina Berendt, Tina Kruegel, Christian Heinze, Klaus Broelemann, Gjergji Kasneci, Thanassis Tiropanis, Steffen Staab

    Abstract: AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their desig…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 19 pages, 1 figure

  18. arXiv:2001.06286

    cs.CL cs.LG

    RobBERT: a Dutch RoBERTa-based Language Model

    Authors: Pieter Delobelle, Thomas Winters, Bettina Berendt

    Abstract: Pre-trained language models have been dominating the field of natural language processing in recent years, and have led to significant performance gains for various complex natural language tasks. One of the most prominent pre-trained language models is BERT, which was released as an English as well as a multilingual version. Although multilingual BERT performs well on many tasks, recent studies s…

    Submitted 16 September, 2020; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: 11 pages, 4 tables, 3 figures. Accepted in EMNLP Findings

  19. arXiv:1910.13793

    cs.CL

    Time to Take Emoji Seriously: They Vastly Improve Casual Conversational Models

    Authors: Pieter Delobelle, Bettina Berendt

    Abstract: Graphical emoji are ubiquitous in modern-day online conversations. So is a single thumbs-up emoji able to signify an agreement, without any words. We argue that the current state-of-the-art systems are ill-equipped to correctly interpret these emoji, especially in a conversational context. However, in a casual context, the benefits might be high: a better understanding of users' utterances and mor…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted at Benelearn 2019

  20. arXiv:1810.12847

    cs.AI cs.CY

    AI for the Common Good?! Pitfalls, challenges, and Ethics Pen-Testing

    Authors: Bettina Berendt

    Abstract: Recently, many AI researchers and practitioners have embarked on research visions that involve doing AI for "Good". This is part of a general drive towards infusing AI research and practice with ethical thinking. One frequent theme in current ethical guidelines is the requirement that AI be good for all, or: contribute to the Common Good. But what is the Common Good, and is it enough to want to be…

    Submitted 1 November, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: to appear in Paladyn. Journal of Behavioral Robotics; accepted on 27-10-2018