Showing 1–32 of 32 results for author: Koychev, I

  1. arXiv:2409.00527  [pdf, other]

    cs.CL cs.DL cs.LG

    Post-OCR Text Correction for Bulgarian Historical Documents

    Authors: Angel Beshirov, Milena Dobreva, Dimitar Dimitrov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: The digitization of historical documents is crucial for preserving the cultural heritage of the society. An important step in this process is converting scanned images to text using Optical Character Recognition (OCR), which can enable further search, information extraction, etc. Unfortunately, this is a hard problem as standard OCR tools are not tailored to deal with historical orthography as wel…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted for publication in the International Journal on Digital Libraries

  2. arXiv:2403.10378  [pdf, other]

    cs.CL cs.CV

    EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

    Authors: Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov

    Abstract: We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc. EXAMS-V includes a variety of multimodal features such as text, images,…

    Submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2310.07807  [pdf, other]

    cs.LG

    FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning

    Authors: Ensiye Kiyamousavi, Boris Kraychev, Ivan Koychev

    Abstract: Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. Its goal is to create a robust and accurate model by aggregating and retraining local models over multiple rounds. However, FL faces challenges regarding data heterogeneity and model aggregation effectiveness. In order to simulate real-world data, researchers use methods for data…

    Submitted 11 October, 2023; originally announced October 2023.

  4. arXiv:2309.06844  [pdf, other]

    cs.CL cs.AI cs.MM

    Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles

    Authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov

    Abstract: The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different rese…

    Submitted 13 September, 2023; originally announced September 2023.

  5. arXiv:2306.05535  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data

    Authors: Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov

    Abstract: Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48…

    Submitted 17 January, 2024; v1 submitted 24 May, 2023; originally announced June 2023.

    Comments: Check-Worthiness, Fact-Checking, Fake News, Misinformation, Disinformation, Political Debates, Multimodality

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ICASSP 2024

  6. arXiv:2306.02349  [pdf, other]

    cs.CL cs.IR cs.LG

    bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

    Authors: Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Ves Stoyanov, Ivan Koychev, Preslav Nakov, Dragomir Radev

    Abstract: We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, question answering, etc.) and machine learning tasks (sequen…

    Submitted 6 June, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ACL 2023

  7. DuoSearch: A Novel Search Engine for Bulgarian Historical Documents

    Authors: Angel Beshirov, Suzan Hadzhieva, Ivan Koychev, Milena Dobreva

    Abstract: Search in collections of digitised historical documents is hindered by a two-prong problem, orthographic variety and optical character recognition (OCR) mistakes. We present a new search engine for historical documents, DuoSearch, which uses ElasticSearch and machine learning methods based on deep neural networks to offer a solution to this problem. It was tested on a collection of historical news…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to ECIR 2022 (Demo paper)

  8. arXiv:2210.04447  [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

    Authors: Momchil Hardalov, Anton Chernyavskiy, Ivan Koychev, Dmitry Ilvovsky, Preslav Nakov

    Abstract: While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return back an article that explains their decision. This is a sensible…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to AACL-IJCNLP 2022 (Main Conference)

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: AACL-IJCNLP 2022

  9. arXiv:2201.09012  [pdf, other]

    cs.CL cs.AI

    Leaf: Multiple-Choice Question Generation

    Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

    Abstract: Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g.,…

    Submitted 22 January, 2022; originally announced January 2022.

    Comments: Accepted to ECIR 2022 (Demo)

  10. arXiv:2109.15120  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.SI

    SUper Team at SemEval-2016 Task 3: Building a feature-rich system for community question answering

    Authors: Tsvetomila Mihaylova, Pepa Gencheva, Martin Boyanov, Ivana Yovcheva, Todor Mihaylov, Momchil Hardalov, Yasen Kiprov, Daniel Balchev, Ivan Koychev, Preslav Nakov, Ivelina Nikolova, Galia Angelova

    Abstract: We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors t…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: community question answering, question-question similarity, question-comment similarity, answer reranking

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: SemEval-2016

  11. arXiv:2109.13726  [pdf, other]

    cs.LG cs.CL cs.IR cs.SI

    Exposing Paid Opinion Manipulation Trolls

    Authors: Todor Mihaylov, Ivan Koychev, Georgi Georgiev, Preslav Nakov

    Abstract: Recently, Web forums have been invaded by opinion manipulation trolls. Some trolls try to influence the other users driven by their own convictions, while in other cases they can be organized and paid, e.g., by a political party or a PR agency that gives them specific instructions what to write. Finding paid trolls automatically using machine learning is a hard task, as there is not enough training…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: opinion manipulation trolls, trolls, opinion manipulation, community forums, news media

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: RANLP-2015

  12. arXiv:2108.12898  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.IR cs.LG

    Generating Answer Candidates for Quizzes and Answer-Aware Question Generators

    Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

    Abstract: In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answe…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: answer generation, question generation, answer-aware question generation, quiz questions, question answering

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: RANLP-2021 (SRW)

  13. arXiv:2108.12519  [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    Predicting the Factuality of Reporting of News Media Using Observations About User Attention in Their YouTube Channels

    Authors: Krasimira Bozhanova, Yoan Dinkov, Ivan Koychev, Maria Castaldo, Tommaso Venturini, Preslav Nakov

    Abstract: We propose a novel framework for predicting the factuality of reporting of news media outlets by studying the user attention cycles in their YouTube channels. In particular, we design a rich set of features derived from the temporal evolution of the number of views, likes, dislikes, and comments for a video, which we then aggregate to the channel level. We develop and release a dataset for the tas…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: Factuality, disinformation, misinformation, fake news, Youtube channels, propaganda, attention cycles

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: RANLP-2021

  14. arXiv:2011.03080  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering

    Authors: Momchil Hardalov, Todor Mihaylov, Dimitrina Zlatkova, Yoan Dinkov, Ivan Koychev, Preslav Nakov

    Abstract: We propose EXAMS -- a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. EXAMS offers a fine-grained evaluation framework across multiple languages…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020, 17 pages, 6 figures, 8 tables

  15. arXiv:2009.02931  [pdf, ps, other]

    cs.CL cs.IR cs.LG cs.SI

    Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models

    Authors: Alex Nikolov, Giovanni Da San Martino, Ivan Koychev, Preslav Nakov

    Abstract: While misinformation and disinformation have been thriving in social media for years, with the emergence of the COVID-19 pandemic, the political and the health misinformation merged, thus elevating the problem to a whole new level and giving rise to the first global infodemic. The fight against this infodemic has many aspects, with fact-checking and debunking false and misleading claims being amon…

    Submitted 7 September, 2020; originally announced September 2020.

    Comments: Check-worthiness; Fact-Checking; Veracity

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: CLEF-2020

  16. arXiv:2004.14848  [pdf, other]

    cs.CL

    Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Detecting the user's intent and finding the corresponding slots among the utterance's words are important tasks in natural language understanding. Their interconnected nature makes their joint modeling a standard part of training such models. Moreover, data scarceness and specialized vocabularies pose additional challenges. Recently, the advances in pre-trained language models, namely contextualiz…

    Submitted 5 October, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

  17. arXiv:1912.08084  [pdf, other]

    cs.CL cs.IR cs.LG

    A Context-Aware Approach for Detecting Check-Worthy Claims in Political Debates

    Authors: Pepa Gencheva, Ivan Koychev, Lluís Màrquez, Alberto Barrón-Cedeño, Preslav Nakov

    Abstract: In the context of investigative journalism, we address the problem of automatically identifying which claims in a given document are most worthy and should be prioritized for fact-checking. Despite its importance, this is a relatively understudied problem. Thus, we create a new dataset of political debates, containing statements that have been fact-checked by nine reputable sources, and we train m…

    Submitted 14 December, 2019; originally announced December 2019.

    Comments: Check-worthiness; Fact-Checking; Veracity; Neural Networks. arXiv admin note: substantial text overlap with arXiv:1908.01328

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: RANLP-2017

  18. arXiv:1911.08125  [pdf, other]

    cs.CL cs.AI cs.IR

    In Search of Credible News

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: We study the problem of finding fake online news. This is an important problem as news of questionable credibility have recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online so…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: Credibility, veracity, fact checking, humor detection

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: AIMSA-2016

  19. arXiv:1910.08948  [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information

    Authors: Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov

    Abstract: We address the problem of predicting the leading political ideology, i.e., left-center-right bias, for YouTube channels of news media. Previous work on the problem has focused exclusively on text and on analysis of the language used, topics discussed, sentiment, and the like. In contrast, here we study videos, which yields an interesting multimodal setup. Starting with gold annotations about the l…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: media bias, political ideology, Youtube channels, propaganda, disinformation, fake news

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: INTERSPEECH-2019

  20. arXiv:1910.01990  [pdf, other]

    cs.CL cs.AI eess.AS

    Detecting Deception in Political Debates Using Acoustic and Textual Features

    Authors: Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov

    Abstract: We present work on deception detection, where, given a spoken claim, we aim to predict its factuality. While previous work in the speech community has relied on recordings from staged setups where people were asked to tell the truth or to lie and their statements were recorded, here we use real-world political debates. Thanks to the efforts of fact-checking organizations, it is possible to obtain…

    Submitted 4 October, 2019; originally announced October 2019.

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: ASRU-2019

  21. arXiv:1908.11722  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR

    Fact-Checking Meets Fauxtography: Verifying Claims About Images

    Authors: Dimitrina Zlatkova, Preslav Nakov, Ivan Koychev

    Abstract: The recent explosion of false claims in social media and on the Web in general has given rise to a lot of manual fact-checking initiatives. Unfortunately, the number of claims that need to be fact-checked is several orders of magnitude larger than what humans can handle manually. Thus, there has been a lot of research aiming at automating the process. Interestingly, previous work has largely ignor…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: Claims about Images; Fauxtography; Fact-Checking; Veracity; Fake News

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: EMNLP-2019

  22. arXiv:1908.09785  [pdf, other]

    cs.CL cs.IR

    Detecting Toxicity in News Articles: Application to Bulgarian

    Authors: Yoan Dinkov, Ivan Koychev, Preslav Nakov

    Abstract: Online media aim for reaching ever bigger audience and for attracting ever longer attention span. This competition creates an environment that rewards sensational, fake, and toxic news. To help limit their spread and impact, we propose and develop a news toxicity detector that can recognize various types of toxic content. While previous research primarily focused on English, here we target Bulgari…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: Fact-checking, source reliability, political ideology, news media, Bulgarian, RANLP-2019. arXiv admin note: text overlap with arXiv:1810.01765

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: RANLP-2019

  23. arXiv:1908.01519  [pdf, other]

    cs.CL cs.IR

    Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recently, reading comprehension models achieved near-human performance on large-scale datasets such as SQuAD, CoQA, MS MARCO, RACE, etc. This is largely due to the release of pre-trained contextualized representations such as BERT and ELMo, which can be fine-tuned for the target task. Despite those advances and the creation of more challenging datasets, most of the work is still done for English.…

    Submitted 6 September, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: Accepted at RANLP 2019 (13 pages, 2 figures, 6 tables)

  24. Recursive Style Breach Detection with Multifaceted Ensemble Learning

    Authors: Daniel Kopev, Dimitrina Zlatkova, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: Accepted as regular paper at AIMSA 2018

  25. Machine Reading Comprehension for Answer Re-Ranking in Customer Support Chatbots

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recent advances in deep neural networks, language modeling and language generation have introduced new ideas to the field of conversational agents. As a result, deep neural models such as sequence-to-sequence, Memory Networks, and the Transformer have become key ingredients of state-of-the-art dialog systems. While those models are able to generate meaningful responses even in unseen situation, th…

    Submitted 26 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: 13 pages, 1 figure, 4 tables

    Journal ref: Information 2019, 10, 82

  26. Towards Automated Customer Support

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recent years have seen growing interest in conversational agents, such as chatbots, which are a very good fit for automated customer support because the domain in which they need to operate is narrow. This interest was in part inspired by recent advances in neural machine translation, esp. the rise of sequence-to-sequence (seq2seq) and attention-based models such as the Transformer, which have bee…

    Submitted 2 September, 2018; originally announced September 2018.

    Comments: Accepted as regular paper at AIMSA 2018

  27. We Built a Fake News & Click-bait Filter: What Happened Next Will Blow Your Mind!

    Authors: Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, Ivan Koychev

    Abstract: It is completely amazing! Fake news and click-baits have totally invaded the cyber space. Let us face it: everybody hates them for three simple reasons. Reason #2 will absolutely amaze you. What these can achieve at the time of election will completely blow your mind! Now, we all agree, this cannot go on, you know, somebody has to stop it. So, we did this research on fake news/click-bait detection…

    Submitted 10 March, 2018; originally announced March 2018.

    Comments: RANLP'2017, 7 pages, 1 figure

  28. arXiv:1712.08350  [pdf]

    cs.IR

    Finding People's Professions and Nationalities Using Distant Supervision - The FMI@SU "goosefoot" team at the WSDM Cup 2017 Triple Scoring Task

    Authors: Valentin Zmiycharov, Dimitar Alexandrov, Preslav Nakov, Ivan Koychev, Yasen Kiprov

    Abstract: We describe the system that our FMI@SU student's team built for participating in the Triple Scoring task at the WSDM Cup 2017. Given a triple from a "type-like" relation, profession or nationality, the goal is to produce a score, on a scale from 0 to 7, that measures the relevance of the statement expressed by the triple: e.g., how well does the profession of an Actor fit for Quentin Tarantino? We…

    Submitted 22 December, 2017; originally announced December 2017.

    Comments: Triple Scorer at WSDM Cup 2017, see arXiv:1712.08081

    ACM Class: H.3

  29. arXiv:1710.00689  [pdf, ps, other]

    cs.CL

    Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics

    Authors: Martin Boyanov, Ivan Koychev, Preslav Nakov, Alessandro Moschitti, Giovanni Da San Martino

    Abstract: We propose to use question answering (QA) data from Web forums to train chatbots from scratch, i.e., without dialog training data. First, we extract pairs of question and answer sentences from the typically much longer texts of questions and answers in a forum. We then use these shorter texts to train seq2seq models in a more efficient way. We further improve the parameter optimization using a new…

    Submitted 2 October, 2017; originally announced October 2017.

    Comments: RANLP-2017

    MSC Class: 68T50 ACM Class: I.2.7

  30. arXiv:1710.00341  [pdf, other]

    cs.CL

    Fully Automated Fact Checking Using External Sources

    Authors: Georgi Karadzhov, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Ivan Koychev

    Abstract: Given the constantly growing proliferation of false claims online in recent years, there has been also a growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully-automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. O…

    Submitted 1 October, 2017; originally announced October 2017.

    Comments: RANLP-2017

    MSC Class: 68T50 ACM Class: I.2.7

  31. arXiv:1707.06378  [pdf, ps, other]

    cs.CL

    Large-Scale Goodness Polarity Lexicons for Community Question Answering

    Authors: Todor Mihaylov, Daniel Belchev, Yasen Kiprov, Ivan Koychev, Preslav Nakov

    Abstract: We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary an…

    Submitted 20 July, 2017; originally announced July 2017.

    Comments: SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan; Community Question Answering; Goodness polarity lexicons; Sentiment Analysis

  32. arXiv:1707.03736  [pdf, ps, other]

    cs.CL

    The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

    Authors: Georgi Karadjov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

    Abstract: Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text's writer can be also important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometr…

    Submitted 28 July, 2017; v1 submitted 12 July, 2017; originally announced July 2017.

    Comments: Best of the Labs Track at CLEF-2017