-
Computational analysis of US Congressional speeches reveals a shift from evidence to intuition
Authors:
Segun Taofeek Aroyehun,
Almog Simchon,
Fabio Carrella,
Jana Lasser,
Stephan Lewandowsky,
David Garcia
Abstract:
Pursuit of honest and truthful decision-making is crucial for governance and accountability in democracies. However, people sometimes take different perspectives of what it means to be honest and how to pursue truthfulness. Here we explore a continuum of perspectives from evidence-based reasoning, rooted in ascertainable facts and data, at one end, to intuitive decisions that are driven by feelings and subjective interpretations, at the other. We analyze the linguistic traces of those contrasting perspectives in Congressional speeches from 1879 to 2022. We find that evidence-based language has continued to decline since the mid-1970s, together with a decline in legislative productivity. The decline was accompanied by increasing partisan polarization in Congress and rising income inequality in society. Results highlight the importance of evidence-based language in political decision-making.
Submitted 12 May, 2024;
originally announced May 2024.
-
Collective moderation of hate, toxicity, and extremity in online discussions
Authors:
Jana Lasser,
Alina Herderich,
Joshua Garland,
Segun Taofeek Aroyehun,
David Garcia,
Mirta Galesic
Abstract:
How can citizens address hate in online discourse? We analyze a large corpus of more than 130,000 discussions on Twitter over four years. With the help of human annotators, language models and machine learning classifiers, we identify different dimensions of discourse that might be related to the probability of hate speech in subsequent tweets. We use a matching approach and longitudinal statistical analyses to discern the effectiveness of different counter speech strategies on the micro-level (individual tweet pairs), meso-level (discussion trees) and macro-level (days) of discourse. We find that expressing simple opinions, not necessarily supported by facts, but without insults, relates to the least hate in subsequent discussions. Sarcasm can be helpful as well, in particular in the presence of organized extreme groups. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse. A pronounced emotional tone, either negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse discourse quality. We obtain similar results for other measures of quality of discourse beyond hate speech, including toxicity, extremity of speech, and the presence of extreme speakers. Going beyond one-shot analyses on smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.
Submitted 11 December, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
The right to audit and power asymmetries in algorithm auditing
Authors:
Aleksandra Urman,
Ivan Smirnov,
Jana Lasser
Abstract:
In this paper, we engage with and expand on the keynote talk about the Right to Audit given by Prof. Christian Sandvig at IC2S2 2021 through a critical reflection on power asymmetries in the algorithm auditing field. We elaborate on the challenges and asymmetries mentioned by Sandvig, such as those related to legal issues and the disparity between early-career and senior researchers. We also contribute a discussion of the asymmetries that were not covered by Sandvig but that we find critically important: those related to other disparities between researchers, incentive structures related to the access to data from companies, targets of auditing, and users and their rights. We also discuss the implications these asymmetries have for algorithm auditing research, such as Western-centrism and the lack of diversity of perspectives. While we focus on the field of algorithm auditing specifically, we suggest that some of the discussed asymmetries affect Computational Social Science more generally and need to be reflected on and addressed.
Submitted 16 February, 2023;
originally announced February 2023.
-
Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
Authors:
Juergen Pfeffer,
Daniel Matter,
Kokil Jaidka,
Onur Varol,
Afra Mashhadi,
Jana Lasser,
Dennis Assenmacher,
Siqi Wu,
Diyi Yang,
Cornelia Brantner,
Daniel M. Romero,
Jahna Otterbacher,
Carsten Schwemmer,
Kenneth Joseph,
David Garcia,
Fred Morstatter
Abstract:
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
Submitted 11 April, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
From alternative conceptions of honesty to alternative facts in communications by U.S. politicians
Authors:
Jana Lasser,
Segun Taofeek Aroyehun,
Fabio Carrella,
Almog Simchon,
David Garcia,
Stephan Lewandowsky
Abstract:
The spread of online misinformation on social media is increasingly perceived as a problem for societal cohesion and democracy. The role of political leaders in this process has attracted less research attention, even though politicians who "speak their mind" are perceived by segments of the public as authentic and honest even if their statements are unsupported by evidence. Analyzing communications by members of the U.S. Congress on Twitter between 2011 and 2022, we show that politicians' conception of honesty has undergone a distinct shift, with authentic belief-speaking that may be decoupled from evidence becoming more prominent and more differentiated from explicitly evidence-based truth seeking. We show that for Republicans - but not Democrats - an increase of belief-speaking of 10% is associated with a decrease of 12.8 points of quality (NewsGuard scoring system) in the sources shared in a tweet. Conversely, an increase in truth-seeking language is associated with an increase in quality of sources for both parties. The results support the hypothesis that the current dissemination of misinformation in political discourse is in part driven by an alternative understanding of truth and honesty that emphasizes invocation of subjective belief at the expense of reliance on evidence.
Submitted 14 June, 2023; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Social media sharing by political elites: An asymmetric American exceptionalism
Authors:
Jana Lasser,
Segun Taofeek Aroyehun,
Almog Simchon,
Fabio Carrella,
David Garcia,
Stephan Lewandowsky
Abstract:
Increased sharing of untrustworthy information on social media platforms is one of the main challenges of our modern information society. Because information disseminated by political elites is known to shape citizen and media discourse, it is particularly important to examine the quality of information shared by politicians. Here we show that from 2016 onward, members of the Republican party in the U.S. Congress have been increasingly sharing links to untrustworthy sources. The proportion of untrustworthy information posted by Republicans versus Democrats is diverging at an accelerating rate, and this divergence has worsened since President Biden was elected. This divergence between parties seems to be unique to the U.S., as it cannot be observed in other Western democracies such as Germany and the United Kingdom, where left-right disparities are smaller and have remained largely constant.
Submitted 13 July, 2022;
originally announced July 2022.
-
This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter's Academic API
Authors:
Juergen Pfeffer,
Angelina Mooseder,
Jana Lasser,
Luca Hammer,
Oliver Stritzel,
David Garcia
Abstract:
Because of its willingness to share data with academia and industry, Twitter has been the primary social media platform for scientific research as well as for consulting businesses and governments in the last decade. In recent years, a series of publications have studied and criticized Twitter's APIs, and Twitter has partially adapted its existing data streams. The newest Twitter API for Academic Research allows researchers to "access Twitter's real-time and historical public data with additional features and functionality that support collecting more precise, complete, and unbiased datasets." The main new feature of this API is the possibility of accessing the full archive of all historic Tweets. In this article, we take a closer look at the Academic API and try to answer two questions. First, are the datasets collected with the Academic API complete? Second, since Twitter's Academic API delivers historic Tweets as represented on Twitter at the time of data collection, how much data is lost over time due to Tweet and account removal from the platform? Our work shows evidence that Twitter's Academic API can indeed create (almost) complete samples of Twitter data based on a wide variety of search terms. We also provide evidence that Twitter's data endpoint v2 delivers better samples than the previously used endpoint v1.1. Furthermore, collecting Tweets with the Academic API at the time of studying a phenomenon, rather than creating local archives of stored Tweets, allows for a straightforward way of following Twitter's developer agreement. Finally, we also discuss technical artifacts and implications of the Academic API. We hope that our work can add another layer of understanding of Twitter data collections, leading to more reliable studies of human behavior via social media data.
Submitted 11 April, 2023; v1 submitted 4 April, 2022;
originally announced April 2022.
-
Stress-testing the Resilience of the Austrian Healthcare System Using Agent-Based Simulation
Authors:
Michaela Kaleta,
Jana Lasser,
Elma Dervic,
Liuhuaying Yang,
Johannes Sorger,
Ruggiero Lo Sardo,
Stefan Thurner,
Alexandra Kautzky-Willer,
Peter Klimek
Abstract:
Patients do not access physicians at random but rather via naturally emerging networks of patient flows between them. As retirements, mass quarantines and absence due to sickness during pandemics, or other shocks thin out these networks, the system might be pushed closer to a tipping point where it loses its ability to deliver care to the population. Here we propose a data-driven framework to quantify the regional resilience to such shocks of primary and secondary care in Austria via an agent-based model. For each region and medical specialty we construct detailed patient-sharing networks from administrative data and stress-test these networks by removing increasing numbers of physicians from the system. This allows us to measure regional resilience indicators describing how many physicians can be removed from a certain area before individual patients won't be treated anymore. We find that such tipping points do indeed exist and that regions and medical specialties differ substantially in their resilience. These systemic differences can be related to indicators for individual physicians by quantifying how much their hypothetical removal would stress the system (risk score) or how much of the stress from the removal of other physicians they would be able to absorb (benefit score). Our stress-testing framework could enable health authorities to rapidly identify bottlenecks in access to care as well as to inspect these naturally emerging physician networks and how potential absences would impact them.
Submitted 15 March, 2022;
originally announced March 2022.
-
Social media emotion macroscopes reflect emotional experiences in society at large
Authors:
David Garcia,
Max Pellert,
Jana Lasser,
Hannah Metzler
Abstract:
Social media generate data on human behaviour at large scales and over long periods of time, posing a complementary approach to traditional methods in the social sciences. Millions of texts from social media can be processed with computational methods to study emotions over time and across regions. However, recent research has shown weak correlations between social media emotions and affect questionnaires at the individual level and between static regional aggregates of social media emotion and subjective well-being at the population level, questioning the validity of social media data to study emotions. Yet, to date, no research has tested the validity of social media emotion macroscopes to track the temporal evolution of emotions at the level of a whole society. Here we present a pre-registered prediction study that shows how gender-rescaled time series of Twitter emotional expression at the national level substantially correlate with aggregates of self-reported emotions in a weekly representative survey in the United Kingdom. A follow-up exploratory analysis shows a high prevalence of third-person references in emotionally-charged tweets, indicating that social media data provide a way of social sensing the emotions of others rather than just the emotional experiences of users. These results show that, despite the issues that social media have in terms of representativeness and algorithmic confounding, the combination of advanced text analysis methods with user demographic information in social media emotion macroscopes can provide measures that are informative of the general population beyond social media users.
Submitted 28 July, 2021;
originally announced July 2021.
-
Agent-based simulations for protecting nursing homes with prevention and vaccination strategies
Authors:
Jana Lasser,
Johannes Zuber,
Johannes Sorger,
Elma Dervic,
Katharina Ledebur,
Simon David Lindner,
Elisabeth Klager,
Maria Kletečka-Pulker,
Harald Willschke,
Katrin Stangl,
Sarah Stadtmann,
Christian Haslinger,
Peter Klimek,
Thomas Wochele-Thoma
Abstract:
Due to its high lethality amongst the elderly, the safety of nursing homes has been of central importance during the COVID-19 pandemic. With test procedures becoming available at scale, such as antigen or RT-LAMP tests, and increasing availability of vaccinations, nursing homes might be able to safely relax prohibitory measures while controlling the spread of infections (meaning an average of one or fewer secondary infections per index case). Here, we develop a detailed agent-based epidemiological model for the spread of SARS-CoV-2 in nursing homes to identify optimal prevention strategies. The model is microscopically calibrated to high-resolution data from nursing homes in Austria, including detailed social contact networks and information on past outbreaks. We find that the effectiveness of mitigation testing depends critically on the timespan between test and test result, the detection threshold of the viral load for the test to give a positive result, and the screening frequencies of residents and employees. Under realistic conditions and in the absence of an effective vaccine, we find that preventive screening of employees only might be sufficient to control outbreaks in nursing homes, provided that turnover times and detection thresholds of the tests are low enough. If vaccines that are moderately effective against infection and transmission are available, control is achieved if 80% or more of the inhabitants are vaccinated, even if no preventive testing is in place and residents are allowed to have visitors. Since these results strongly depend on vaccine efficacy against infection, retention of testing infrastructures, regular voluntary screening and sequencing of virus genomes are advised to enable early identification of new variants of concern.
Submitted 14 June, 2021; v1 submitted 16 November, 2020;
originally announced April 2021.
-
Dashboard of sentiment in Austrian social media during COVID-19
Authors:
Max Pellert,
Jana Lasser,
Hannah Metzler,
David Garcia
Abstract:
To track online emotional expressions of the Austrian population close to real-time during the COVID-19 pandemic, we build a self-updating monitor of emotion dynamics using digital traces from three different data sources. This enables decision makers and the interested public to assess issues such as the attitude towards counter-measures taken during the pandemic and the possible emergence of a (mental) health crisis early on. We use web scraping and API access to retrieve data from the news platform derstandard.at, Twitter and a chat platform for students. We document the technical details of our workflow in order to provide materials for other researchers interested in building a similar tool for different contexts. Automated text analysis allows us to highlight changes in language use during COVID-19 in comparison to a neutral baseline. We use special word clouds to visualize that overall difference. Longitudinally, our time series show spikes in anxiety that can be linked to several events and media reporting. Additionally, we find a marked decrease in anger. The changes last for remarkably long periods of time (up to 12 weeks). We discuss these and more patterns and connect them to the emergence of collective emotions. The interactive dashboard showcasing our data is available online at http://www.mpellert.at/covid19_monitor_austria/. Our work has attracted media attention and is part of a web archive of resources on COVID-19 collected by the Austrian National Library.
Submitted 19 June, 2020;
originally announced June 2020.