-
SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
Authors:
Hossein A. Rahmani,
Xi Wang,
Emine Yilmaz,
Nick Craswell,
Bhaskar Mitra,
Paul Thomas
Abstract:
Large-scale test collections play a crucial role in Information Retrieval (IR) research. However, according to the Cranfield paradigm and the research into publicly available datasets, the existing information retrieval research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments - a time-intensive and expensive process. Recent studies have shown the strong capability of Large Language Models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost. In this paper, to address the missing large-scale ad-hoc document retrieval dataset, we extend the TREC Deep Learning Track (DL) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. Specifically, such a test collection includes more than 1,900 test queries from the previous years of tracks. We compare system evaluation with past human labels from past years and find that our synthetically created large-scale test collection can lead to highly correlated system rankings.
Submitted 30 August, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
LLMJudge: LLMs for Relevance Judgments
Authors:
Hossein A. Rahmani,
Emine Yilmaz,
Nick Craswell,
Bhaskar Mitra,
Paul Thomas,
Charles L. A. Clarke,
Mohammad Aliannejadi,
Clemencia Siro,
Guglielmo Faggioli
Abstract:
The LLMJudge challenge is organized as part of the LLM4Eval workshop at SIGIR 2024. Test collections are essential for evaluating information retrieval (IR) systems. The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user. However, collecting relevance judgments on a large scale is costly and resource-intensive. Consequently, typical experiments rely on third-party labelers who may not always produce accurate annotations. The LLMJudge challenge aims to explore an alternative approach by using LLMs to generate relevance judgments. Recent studies have shown that LLMs can generate reliable relevance judgments for search systems. However, it remains unclear which LLMs can match the accuracy of human labelers, which prompts are most effective, how fine-tuned open-source LLMs compare to closed-source LLMs like GPT-4, whether there are biases in synthetically generated data, and if data leakage affects the quality of generated labels. This challenge will investigate these questions, and the collected data will be released as a package to support automatic relevance judgment research in information retrieval and search.
Submitted 9 August, 2024;
originally announced August 2024.
-
Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024
Authors:
Hossein A. Rahmani,
Clemencia Siro,
Mohammad Aliannejadi,
Nick Craswell,
Charles L. A. Clarke,
Guglielmo Faggioli,
Bhaskar Mitra,
Paul Thomas,
Emine Yilmaz
Abstract:
The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 in the USA (SIGIR 2024). The aim was to bring information retrieval researchers together around the topic of LLMs for evaluation in information retrieval that gathered attention with the advancement of large language models and generative AI. Given the novelty of the topic, the workshop was focused around multi-sided discussions, namely panels and poster sessions of the accepted proceedings papers.
Submitted 9 August, 2024;
originally announced August 2024.
-
Adaptive Retrieval-Augmented Generation for Conversational Systems
Authors:
Xi Wang,
Procheta Sen,
Ruizhe Li,
Emine Yilmaz
Abstract:
Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses. Hence, many existing studies commonly assume that Retrieval Augmented Generation (RAG) is always needed in a conversational system, without explicit control. This raises a research question about such a necessity. In this study, we propose to investigate the need for each turn of system response to be augmented with external knowledge. In particular, by leveraging human judgements on the binary choice of adaptive augmentation, we develop RAGate, a gating model, which models conversation context and relevant inputs to predict if a conversational system requires RAG for improved responses. We conduct extensive experiments on devising and applying RAGate to conversational models and well-rounded analyses of different conversational scenarios. Our experimental results and analysis indicate the effective application of RAGate in RAG-based conversational systems in identifying system responses for appropriate RAG with high-quality responses and a high generation confidence. This study also identifies the correlation between the generation's confidence level and the relevance of the augmented knowledge.
Submitted 31 July, 2024;
originally announced July 2024.
-
On Nonlinear Closures for Moment Equations Based on Orthogonal Polynomials
Authors:
Eda Yilmaz,
Georgii Oblapenko,
Manuel Torrilhon
Abstract:
In the present work, an approach to the moment closure problem on the basis of orthogonal polynomials derived from Gram matrices is proposed. Its properties are studied in the context of the moment closure problem arising in gas kinetic theory, for which the proposed approach is proven to have multiple attractive mathematical properties. Numerical studies are carried out for model gas particle distributions and the approach is compared to other moment closure methods, such as Grad's closure and the maximum-entropy method. The proposed "Gramian" closure is shown to provide very accurate results for a wide range of distribution functions.
Submitted 8 July, 2024;
originally announced July 2024.
-
Understanding the Role of User Profile in the Personalization of Large Language Models
Authors:
Bin Wu,
Zhengyan Shi,
Hossein A. Rahmani,
Varsha Ramineni,
Emine Yilmaz
Abstract:
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance the performance on a wide range of tasks. However, the precise role of user profiles and their effect mechanism on LLMs remains unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized response produced or approved by users that plays a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, a user profile placed closer to the beginning has a greater effect on the personalization of LLMs. Our findings reveal the role of user profiles for the personalization of LLMs, and showcase how incorporating user profiles impacts performance, providing insight into how to leverage user profiles effectively.
Submitted 22 June, 2024;
originally announced June 2024.
-
Location-based Radiology Report-Guided Semi-supervised Learning for Prostate Cancer Detection
Authors:
Alex Chen,
Nathan Lay,
Stephanie Harmon,
Kutsev Ozyoruk,
Enis Yilmaz,
Brad J. Wood,
Peter A. Pinto,
Peter L. Choyke,
Baris Turkbey
Abstract:
Prostate cancer is one of the most prevalent malignancies in the world. While deep learning has potential to further improve computer-aided prostate cancer detection on MRI, its efficacy hinges on the exhaustive curation of manually annotated images. We propose a novel methodology of semisupervised learning (SSL) guided by automatically extracted clinical information, specifically the lesion locations in radiology reports, allowing for use of unannotated images to reduce the annotation burden. By leveraging lesion locations, we refined pseudo labels, which were then used to train our location-based SSL model. We show that our SSL method can improve prostate lesion detection by utilizing unannotated images, with more substantial impacts being observed when larger proportions of unannotated images are used.
Submitted 17 June, 2024;
originally announced June 2024.
-
Universal behavior of the Covid-19 tails: Inverse power-law distribution
Authors:
E. Aydiner,
E. Yilmaz
Abstract:
The power-law distribution is one of the most important laws known in nature. Such a special universal behavior is known to occur in very few physical systems. In this Letter, we analyzed the mortality distribution of the Covid-19 pandemic tails for different countries and continents to discuss the possible universal behavior of the pandemic. Surprisingly, we found that the mortality distribution of the final, i.e., latest, Covid-19 tails in 2023 follows an inverse power-law decay. These universal behaviors for the pandemic are reported in the present work for the first time. Additionally, we showed that these mortality tails also decay with time, obeying the inverse power law.
Submitted 13 September, 2024; v1 submitted 30 May, 2024;
originally announced June 2024.
-
Instruction Tuning With Loss Over Instructions
Authors:
Zhengyan Shi,
Adam X. Yang,
Bin Wu,
Laurence Aitchison,
Emine Yilmaz,
Aldo Lipani
Abstract:
Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) the ratio between instruction length and output length in the training data; and (2) the number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH), where a small number of training examples are used for instruction tuning. Further analysis substantiates our hypothesis that our improvement can be attributed to reduced overfitting to instruction tuning datasets. It is worth noting that we are not proposing IM as a replacement for current fine-tuning processes. Instead, our work aims to provide practical guidance for instruction tuning LMs, especially in low-resource scenarios.
Submitted 2 October, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Applications of Quantum Machine Learning for Quantitative Finance
Authors:
Piotr Mironowicz,
Akshata Shenoy H.,
Antonio Mandarino,
A. Ege Yilmaz,
Thomas Ankenbrand
Abstract:
Machine learning and quantum machine learning (QML) have gained significant importance, as they offer powerful tools for tackling complex computational problems across various domains. This work gives an extensive overview of QML uses in quantitative finance, an important discipline in the financial industry. We examine the connection between quantum computing and machine learning in financial applications, spanning a range of use cases including fraud detection, underwriting, Value at Risk, stock market prediction, portfolio optimization, and option pricing by overviewing the corpus of literature concerning various financial subdomains.
Submitted 16 May, 2024;
originally announced May 2024.
-
Synthetic Test Collections for Retrieval Evaluation
Authors:
Hossein A. Rahmani,
Nick Craswell,
Emine Yilmaz,
Bhaskar Mitra,
Daniel Campos
Abstract:
Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that using LLMs it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.
Submitted 13 May, 2024;
originally announced May 2024.
-
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking
Authors:
Stavros Orfanoudakis,
Cesar Diaz-Londono,
Yunus E. Yılmaz,
Peter Palensky,
Pedro P. Vergara
Abstract:
As electric vehicle (EV) numbers rise, concerns about the capacity of current charging and power grid infrastructure grow, necessitating the development of smart charging solutions. While many smart charging simulators have been developed in recent years, only a few support the development of Reinforcement Learning (RL) algorithms in the form of a Gym environment, and those that do usually lack depth in modeling Vehicle-to-Grid (V2G) scenarios. To address the aforementioned issues, this paper introduces the EV2Gym, a realistic simulator platform for the development and assessment of small and large-scale smart charging algorithms within a standardized platform. The proposed simulator is populated with comprehensive EV, charging station, power transformer, and EV behavior models validated using real data. EV2Gym has a highly customizable interface empowering users to choose from pre-designed case studies or craft their own customized scenarios to suit their specific requirements. Moreover, it incorporates a diverse array of RL, mathematical programming, and heuristic algorithms to speed up the development and benchmarking of new solutions. By offering a unified and standardized platform, EV2Gym aims to provide researchers and practitioners with a robust environment for advancing and assessing smart charging algorithms.
Submitted 2 April, 2024;
originally announced April 2024.
-
Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
Authors:
Eda Yilmaz,
Hacer Yalim Keles
Abstract:
We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model's original response, embedding a few altered logits into the output while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the output distribution's entropy. All these complex manipulations are performed using an optimization function with our proposed Exponential Predictive Divergence (EPD) loss function. EPD allows us to maintain higher entropy levels compared to traditional KL divergence, effectively confusing attackers. Experiments on CIFAR-10 and CIFAR-100 datasets demonstrate that AST outperforms state-of-the-art methods, providing effective defense against model stealing while preserving high accuracy. The source codes will be made publicly available here soon.
Submitted 20 July, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
TurtleRabbit 2024 SSL Team Description Paper
Authors:
Linh Trinh,
Alif Anzuman,
Eric Batkhuu,
Dychen Chan,
Lisa Graf,
Darpan Gurung,
Tharunimm Jamal,
Jigme Namgyal,
Jason Ng,
Wing Lam Tsang,
X. Rosalind Wang,
Eren Yilmaz,
Oliver Obst
Abstract:
TurtleRabbit is a new RoboCup SSL team from Western Sydney University. This team description paper presents our approach in navigating some of the challenges in developing a new SSL team from scratch. SSL is dominated by teams with extensive experience and customised equipment that has been developed over many years. Here, we outline our approach in overcoming some of the complexities associated with replicating advanced open-sourced designs and managing the high costs of custom components. Opting for simplicity and cost-effectiveness, our strategy primarily employs off-the-shelf electronics components and "hobby" brushless direct current (BLDC) motors, complemented by 3D printing and CNC milling. This approach helped us to streamline the development process and, with our open-sourced hardware design, hopefully will also lower the bar for other teams to enter RoboCup SSL in the future. The paper details the specific hardware choices, their approximate costs, the integration of electronics and mechanics, and the initial steps taken in software development, for our entry into SSL that aims to be simple yet competitive.
Submitted 12 February, 2024;
originally announced February 2024.
-
Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness
Authors:
Hossein A. Rahmani,
Xi Wang,
Mohammad Aliannejadi,
Mohammadmehdi Naghiaei,
Emine Yilmaz
Abstract:
Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance. Poorly formulated questions can lead to user frustration and confusion, negatively affecting the system's performance. This research addresses the urgent need to identify and leverage key features that contribute to the classification of clarifying questions, enhancing user satisfaction. To gain deeper insights into how different features influence user satisfaction, we conduct a comprehensive analysis, considering a broad spectrum of lexical, semantic, and statistical features, such as question length and sentiment polarity. Our empirical results provide three main insights into the qualities of effective query clarification: (1) specific questions are more effective than generic ones; (2) the subjectivity and emotional tone of a question play a role; and (3) shorter and more ambiguous queries benefit significantly from clarification. Based on these insights, we implement feature-integrated user satisfaction prediction using various classifiers, both traditional and neural-based, including random forest, BERT, and large language models. Our experiments show a consistent and significant improvement, particularly in traditional classifiers, with a minimum performance boost of 45%. This study presents invaluable guidelines for refining the formulation of clarifying questions and enhancing both user satisfaction and system performance.
Submitted 2 February, 2024;
originally announced February 2024.
-
A quantitative analysis of the effect of box size in N-body simulations of the matter power spectrum
Authors:
Maxim Eingorn,
Ezgi Yilmaz,
A. Emrah Yükselci,
Alexander Zhuk
Abstract:
We study the effect of box size on the matter power spectrum obtained via cosmological N-body simulations. Within the framework of the cosmic screening approach, we show that the relative deviation between the spectra for our largest comoving box with L = 5632 Mpc/h and those for L = 280, 560, 1680, 4480, 5120 Mpc/h boxes consistently increases with decreasing box size in the latter set in the redshift range $0\leq z\leq 80$ for the considered values. As an additional demonstrative example, at redshift zero, we determine the values $k_{1\%}$ corresponding to the modes at which relative deviations reach 1%.
Submitted 26 January, 2024;
originally announced January 2024.
-
Benchmarking LLMs via Uncertainty Quantification
Authors:
Fanghua Ye,
Mingming Yang,
Jianhui Pang,
Longyue Wang,
Derek F. Wong,
Emine Yilmaz,
Shuming Shi,
Zhaopeng Tu
Abstract:
The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. Our examination involves eight LLMs (LLM series) spanning five representative natural language processing tasks. Our findings reveal that: I) LLMs with higher accuracy may exhibit lower certainty; II) Larger-scale LLMs may display greater uncertainty compared to their smaller counterparts; and III) Instruction-finetuning tends to increase the uncertainty of LLMs. These results underscore the significance of incorporating uncertainty in the evaluation of LLMs.
Submitted 25 April, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
A Toolbox for Modelling Engagement with Educational Videos
Authors:
Yuxiang Qiu,
Karim Djemili,
Denis Elezi,
Aaneel Shalman,
María Pérez-Ortiz,
Emine Yilmaz,
John Shawe-Taylor,
Sahan Bulathwela
Abstract:
With the advancement and utility of Artificial Intelligence (AI), personalising education to a global population could be a cornerstone of new educational systems in the future. This work presents the PEEKC dataset and the TrueLearn Python library, which contains a dataset and a series of online learner state models that are essential to facilitate research on learner engagement modelling. The TrueLearn family of models was designed following the "open learner" concept, using humanly-intuitive user representations. This family of scalable, online models also helps end-users visualise the learner models, which may in the future facilitate user interaction with their models/recommenders. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The experiments show the utility of both the dataset and the library with predictive performance significantly exceeding comparative baseline models. The dataset contains a large number of AI-related educational videos, which are of interest for building and validating AI-specific educational recommenders.
Submitted 30 December, 2023;
originally announced January 2024.
-
A Grating Based High-Frequency Motion Stimulus Paradigm for Steady-State Motion Visual Evoked Potentials
Authors:
Bartu Atabek,
Efecan Yilmaz,
Cengiz Acarturk,
Murat Perit Cakir
Abstract:
Objective: This paper proposes a novel type of stimulus in the shape of sinusoidal gratings displayed with an imperceptibly high-frequency motion. The stimulus has been designed for use in BCI (Brain Computer Interface) applications that employ visually evoked potentials (VEPs) in an effort to mitigate discomfort associated with VEPs. The stimuli set included traditional VEP stimuli, already established in the literature, allowing comparative analyses. We conducted analyses of signal distinction measures by calculating the signal-to-noise ratio and the classification performance of its evoked potentials. Methods: Fourteen participants were seated in a dimly lit room facing a display. Participants' fixation on the central stimulus was controlled by means of a desktop eye tracker. Participants attended a flicker-based steady-state VEP (SSVEP) task, a motion-based steady-state-motion VEP (SSMVEP) task, and the novel stimulus task (the imperceptible grating SSMVEP). Participants were asked to complete behavioral fatigue scale tasks. Results: A significant effect of stimulus type was observed, accompanied by insignificant differences in prediction accuracy. Partially significant task effects were obtained in fatigue scale tasks. Conclusion: The study revealed that the imperceptible grating SSMVEP stimulus successfully evoked SSMVEP responses within acceptable margins in the related cortical regions. This novel stimulus contributes to BCI research by providing an imperceptible interface, improving already established stimuli design in the SSVEP and the SSMVEP literature. Significance: The present paper provides a novel SSMVEP stimulus type that may inform the future design of effective VEP-based BCI paradigms that allow seamless interaction with computer interfaces.
Submitted 17 May, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Direct Fabrication of Atomically Defined Pores in MXenes
Authors:
Matthew G. Boebinger,
Dundar E. Yilmaz,
Ayana Ghosh,
Sudhajit Misra,
Tyler S. Mathis,
Sergei V. Kalinin,
Stephen Jesse,
Yury Gogotsi,
Adri C. T. van Duin,
Raymond R. Unocic
Abstract:
Controlled fabrication of nanopores in atomically thin two-dimensional material offers the means to create robust membranes needed for ion transport, nanofiltration, and DNA sensing. Techniques for creating nanopores have relied upon either plasma etching or direct irradiation using electrons or ions; however, aberration-corrected scanning transmission electron microscopy (STEM) offers the advantage of combining a highly energetic, sub-angstrom sized electron beam for atomic manipulation along with atomic resolution imaging. Here, we utilize a method for automated nanopore fabrication with real-time atomic visualization to enhance our mechanistic understanding of beam-induced transformations. Additionally, an electron beam simulation technique, Electron-Beam Simulator (E-BeamSim) was developed to observe the atomic movements and interactions resulting from electron beam irradiation. Using the 2D MXene Ti3C2Tx, we explore the influence of temperature on nanopore fabrication by tracking atomic transformation pathways and find that at room temperature, electron beam irradiation induces random displacement of atoms and results in a pileup of titanium atoms at the nanopore edge. This pileup was confirmed and demonstrated in E-BeamSim simulations around the small, milled area in the MXene monolayer. At elevated temperatures, the surface functional groups on MXene are effectively removed, and the mobility of atoms increases, which results in atomic transformations that lead to the selective removal of atoms layer by layer. Through controllable manufacture using e-beam milling fabrication, the production and then characterization of the fabricated defects can be better understood for future work. This work can lead to the development of defect engineering techniques within functionalized MXene layers.
Submitted 29 November, 2023;
originally announced November 2023.
-
Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation
Authors:
Xi Wang,
Hossein A. Rahmani,
Jiqun Liu,
Emine Yilmaz
Abstract:
Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.
Submitted 25 October, 2023;
originally announced October 2023.
-
Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting
Authors:
Fanghua Ye,
Meng Fang,
Shenghui Li,
Emine Yilmaz
Abstract:
Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a "rewrite-then-edit" process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers.
Submitted 18 October, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Magnetically Levitated Microrobotic Mixer
Authors:
Ecenur Can Yılmaz,
Abdurrahim Yılmaz,
Ali Anıl Demirçalı,
Efehan Topçu,
Lila Kaman,
Hüseyin Üvet
Abstract:
Microfluidic systems, when combined with microrobots, offer enhanced precision in chemical synthesis by precisely controlling reaction conditions. These systems, when integrated with analytical tools, allow for real-time monitoring and are cost-efficient due to their minimal volume requirements, thereby reducing risks associated with hazardous chemicals. In our study, we have investigated the mixing efficiency of Thymolphthalein indicator with NaOH solution in a magnetically levitated microrobotic mixer. A PMMA microfluidic chip was used to transfer fluid containing two different solutions and achieve fast and efficient mixing. By adjusting five different flow rates and altering the rotational speeds of the microrobots, the mixing efficiency was observed. The studies were carried out under the laminar regime, with incompressible Newtonian flow rates and varying actuator speeds. The measurement of mixing efficiency was accomplished through the calculation of changes in pixel intensity observed in microscopic images acquired throughout the mixing process. The presence of the microrobots resulted in the best efficiency at 80.37% at 500 rpm and 7 mL/min flow rate. Their potential in advanced reactions, such as nanoparticle synthesis and encapsulation, suggests promising avenues for improving product yields.
Submitted 20 September, 2023;
originally announced September 2023.
-
Performance of a plastic scintillator developed using styrene monomer polymerization
Authors:
A. Sadigov,
F. Ahmadov,
G. Ahmadov,
E. Aksu,
D. Berikov,
S. Nuruyev,
R. Akbarov,
M. Holik,
J. Nagiyev,
S. Gurbuz Guner,
A. Mammadli,
N. Suleymanova,
C. Abbasova,
S. Melikova,
E. Yilmaz,
O. Tagiyev,
S. Lyubchyk,
Z. Sadygov
Abstract:
This paper presents a newly developed plastic scintillator produced in collaboration with Turkiye Energy, Nuclear and Mineral Research Agency (TENMAK). The scintillator is manufactured using thermal polymerization of commercially available styrene monomer. The absorption spectrum of the scintillator exhibited two absorption bands at 225 nm and 340 nm, with an absorption edge observed at 410 nm. The wavelength of the emitted light was measured in the range of 400-800 nm, with a maximum intensity at 427 nm. Monoenergetic electrons from the 137Cs source were used to evaluate the characteristics of the new scintillator, particularly its light yield. For light readout, a MAPD-3NM type silicon photomultiplier array (4 x 4) with an active area of 15 x 15 mm2, assembled using single MAPDs with an active area of 3.7 x 3.7 mm2, was used. The light yield of the scintillator was determined to be 6134 photons/MeV. In addition, the efficiency of the scintillator for gamma rays with an energy of 662 keV was found to be approximately 1.8%. A CmBe neutron source was employed to evaluate its fast neutron detection performance. However, neutron/gamma discrimination using the pulse shape discrimination (charge integration) method was not observed. The results demonstrate the potential of a newly produced plastic scintillator for various applications, particularly in radiation monitoring and detection systems.
Submitted 13 September, 2023;
originally announced September 2023.
-
Mass density vs. energy density at cosmological scales
Authors:
Maxim Eingorn,
Ezgi Yilmaz,
A. Emrah Yükselci,
Alexander Zhuk
Abstract:
In the presence of the gravitational field, the energy density of matter no longer coincides with its mass density. A discrepancy exists, of course, also between the associated power spectra. Within the $Λ$CDM model, we derive a formula that relates the power spectrum of the energy density to that of the mass density and test it with the help of N-body simulations run in comoving boxes of 2.816 Gpc/$h$. The results confirm the validity of the derived formula and simultaneously show that the power spectra diverge significantly from one another at large cosmological scales.
Submitted 5 April, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Preventing Others from Commercializing Your Innovation: Evidence from Creative Commons Licenses
Authors:
Erdem Dogukan Yilmaz,
Tim Meyer,
Milan Miric
Abstract:
Online innovation communities are an important source of innovation for many organizations. While contributions to such communities are typically made without financial compensation, these contributions are often governed by licenses such as Creative Commons that may prevent others from building upon and commercializing them. While this can diminish the usefulness of contributions, there is limited work analyzing what leads individuals to impose restrictions on the use of their work. In this paper, we examine innovators imposing restrictive licenses within the 3D-printable design community Thingiverse. Our analyses suggest that innovators are more likely to restrict commercialization of their contributions as their reputation increases and when reusing contributions created by others. These findings contribute to innovation communities and the growing literature on property rights in digital markets.
Submitted 1 September, 2023;
originally announced September 2023.
-
Grover Search for Portfolio Selection
Authors:
A. Ege Yilmaz,
Stefan Stettler,
Thomas Ankenbrand,
Urs Rhyner
Abstract:
We present explicit oracles designed to be used in Grover's algorithm to match investor preferences. Specifically, the oracles select portfolios with returns and standard deviations exceeding and falling below certain thresholds, respectively. One potential use case for the oracles is selecting portfolios with the best Sharpe ratios. We have implemented these algorithms using quantum simulators.
Submitted 24 August, 2023;
originally announced August 2023.
-
Multi-Modal Multi-Task (3MT) Road Segmentation
Authors:
Erkan Milli,
Özgür Erkent,
Asım Egemen Yılmaz
Abstract:
Multi-modal systems have the capacity of producing more reliable results than systems with a single modality in road detection due to perceiving different aspects of the scene. We focus on using raw sensor inputs instead of, as it is typically done in many SOTA works, leveraging architectures that require high pre-processing costs such as surface normals or dense depth predictions. By using raw sensor inputs, we aim to utilize a low-cost model that minimizes both the pre-processing and model computation costs. This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensors within a multi-task learning architecture. A fusion architecture is proposed in which RGB and LiDAR depth images constitute the inputs of the network. Another contribution of this study is to use IMU/GNSS (inertial measurement unit/global navigation satellite system) inertial navigation system whose data is collected synchronously and calibrated with a LiDAR-camera to compute aggregated dense LiDAR depth images. It has been demonstrated by experiments on the KITTI dataset that the proposed method offers fast and high-performance solutions. We have also shown the performance of our method on Cityscapes where raw LiDAR data is not available. The segmentation results obtained for both full and half resolution images are competitive with existing methods. Therefore, we conclude that our method is not dependent only on raw LiDAR data; rather, it can be used with different sensor modalities. The inference times obtained in all experiments are very promising for real-time experiments.
Submitted 23 August, 2023;
originally announced August 2023.
-
Suppression of matter density growth at scales exceeding the cosmic screening length
Authors:
Maxim Eingorn,
Ezgi Yilmaz,
A. Emrah Yükselci,
Alexander Zhuk
Abstract:
One of the main objectives of modern cosmology is to explain the origin and evolution of cosmic structures at different scales. The principal force responsible for the formation of such structures is gravity. In a general relativistic framework, we have shown that matter density contrasts do not grow over time at scales exceeding the cosmic screening length, which corresponds to a cosmological scale of the order of two to three gigaparsecs at the present time, at which gravitational interactions exhibit an exponential cut-off. This is a purely relativistic effect. To demonstrate the suppression of density growth, we have performed N-body simulations in a box with a comoving size of $5.632\,{\rm Gpc}/h$ and obtained the power spectrum of the mass density contrast. We have shown that it becomes independent of time for scales beyond the cosmic screening length as a clear manifestation of the cosmic screening effect.
Submitted 28 May, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Authors:
Yue Feng,
Yunlong Jiao,
Animesh Prasad,
Nikolaos Aletras,
Emine Yilmaz,
Gabriella Kazai
Abstract:
User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task goals fulfillment using the task schema. In this paper, we propose SG-USM, a novel schema-guided user satisfaction modeling framework. It explicitly models the degree to which the user's preferences regarding the task attributes are fulfilled by the system for predicting the user's satisfaction level. SG-USM employs a pre-trained language model for encoding dialogue context and task attributes. Further, it employs a fulfillment representation layer for learning how many task attributes have been fulfilled in the dialogue, an importance predictor component for calculating the importance of task attributes. Finally, it predicts the user satisfaction based on task attribute fulfillment and task attribute importance. Experimental results on benchmark datasets (i.e. MWOZ, SGD, ReDial, and JDDC) show that SG-USM consistently outperforms competitive existing methods. Our extensive analysis demonstrates that SG-USM can improve the interpretability of user satisfaction modeling, has good scalability as it can effectively deal with unseen tasks and can also effectively work in low-resource settings by leveraging unlabeled data.
Submitted 26 May, 2023;
originally announced May 2023.
-
A Survey on Asking Clarification Questions Datasets in Conversational Systems
Authors:
Hossein A. Rahmani,
Xi Wang,
Yue Feng,
Qiang Zhang,
Emine Yilmaz,
Aldo Lipani
Abstract:
The ability to understand a user's underlying needs is critical for conversational systems, especially with limited input from users in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users' true intent from their queries or utterances arises as an essential task. However, it is noticeable that a key limitation of the existing ACQs studies is their incomparability, from inconsistent use of data, distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQs techniques, we comprehensively analyse the current ACQs research status, which offers a detailed comparison of publicly available datasets, and discusses the applied evaluation metrics, joined with benchmarks for multiple ACQs-related tasks. In particular, given a thorough analysis of the ACQs task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.
Submitted 25 May, 2023;
originally announced May 2023.
-
Towards Asking Clarification Questions for Information Seeking on Task-Oriented Dialogues
Authors:
Yue Feng,
Hossein A. Rahmani,
Aldo Lipani,
Emine Yilmaz
Abstract:
Task-oriented dialogue systems aim at providing users with task-specific services. Users of such systems often do not know all the information about the task they are trying to accomplish, requiring them to seek information about the task. To provide accurate and personalized task-oriented information seeking results, task-oriented dialogue systems need to address two potential issues: 1) users' inability to describe their complex information needs in their requests; and 2) ambiguous/missing information the system has about the users. In this paper, we propose a new Multi-Attention Seq2Seq Network, named MAS2S, which can ask questions to clarify the user's information needs and the user's profile in task-oriented information seeking. We also extend an existing dataset for task-oriented information seeking, leading to a dataset which contains about 100k task-oriented information seeking dialogues that are made publicly available (dataset and code are available at https://github.com/sweetalyssum/clarit). Experimental results on this dataset show that MAS2S outperforms baselines on both clarification question generation and answer prediction.
Submitted 23 May, 2023;
originally announced May 2023.
-
Rethinking Semi-supervised Learning with Language Models
Authors:
Zhengxiang Shi,
Francesco Tonolini,
Nikolaos Aletras,
Emine Yilmaz,
Gabriella Kazai,
Yunlong Jiao
Abstract:
Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance in downstream natural language processing (NLP) tasks. Currently, there are two popular approaches to make use of unlabelled data: Self-training (ST) and Task-adaptive pre-training (TAPT). ST uses a teacher model to assign pseudo-labels to the unlabelled data, while TAPT continues pre-training on the unlabelled data before fine-tuning. To the best of our knowledge, the effectiveness of TAPT in SSL tasks has not been systematically studied, and no previous work has directly compared TAPT and ST in terms of their ability to utilize the pool of unlabelled data. In this paper, we provide an extensive empirical study comparing five state-of-the-art ST approaches and TAPT across various NLP tasks and data sizes, including in- and out-of-domain settings. Surprisingly, we find that TAPT is a strong and more robust SSL learner, even when using just a few hundred unlabelled samples or in the presence of domain shifts, compared to more sophisticated ST approaches, and tends to bring greater improvements in SSL than in fully-supervised settings. Our further analysis demonstrates the risks of using ST approaches when the size of labelled or unlabelled data is small or when domain shifts exist. We offer a fresh perspective for future SSL research, suggesting the use of unsupervised pre-training objectives over dependency on pseudo labels.
Submitted 22 May, 2023;
originally announced May 2023.
-
Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process
Authors:
Fanghua Ye,
Zhiyuan Hu,
Emine Yilmaz
Abstract:
Dialogue systems have received increasing attention while automatically evaluating their performance remains challenging. User satisfaction estimation (USE) has been proposed as an alternative. It assumes that the performance of a dialogue system can be measured by user satisfaction and uses an estimator to simulate users. The effectiveness of USE depends heavily on the estimator. Existing estimators independently predict user satisfaction at each turn and ignore satisfaction dynamics across turns within a dialogue. In order to fully simulate users, it is crucial to take satisfaction dynamics into account. To fill this gap, we propose a new estimator ASAP (sAtisfaction eStimation via HAwkes Process) that treats user satisfaction across turns as an event sequence and employs a Hawkes process to effectively model the dynamics in this sequence. Experimental results on four benchmark dialogue datasets demonstrate that ASAP can substantially outperform state-of-the-art baseline estimators.
Submitted 21 May, 2023;
originally announced May 2023.
-
Scalable Educational Question Generation with Pre-trained Language Models
Authors:
Sahan Bulathwela,
Hamze Muse,
Emine Yilmaz
Abstract:
The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that EduQG can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on the scientific text and science question data.
Submitted 13 May, 2023;
originally announced May 2023.
-
Query-specific Variable Depth Pooling via Query Performance Prediction towards Reducing Relevance Assessment Effort
Authors:
Debasis Ganguly,
Emine Yilmaz
Abstract:
Due to the massive size of test collections, a standard practice in IR evaluation is to construct a 'pool' of candidate relevant documents comprised of the top-k documents retrieved by a wide range of different retrieval systems - a process called depth-k pooling. A standard practice is to set the depth (k) to a constant value for each query constituting the benchmark set. However, in this paper we argue that the annotation effort can be substantially reduced if the depth of the pool is made a variable quantity for each query, the rationale being that the number of documents relevant to the information need can widely vary across queries. Our hypothesis is that a lower depth for the former class of queries and a higher depth for the latter can potentially reduce the annotation effort without a significant change in retrieval effectiveness evaluation. We make use of standard query performance prediction (QPP) techniques to estimate the number of potentially relevant documents for each query, which is then used to determine the depth of the pool. Our experiments conducted on standard test collections demonstrate that this proposed method of employing query-specific variable depths is able to adequately reflect the relative effectiveness of IR systems with a substantially smaller annotation effort.
Submitted 23 April, 2023;
originally announced April 2023.
-
Task2KB: A Public Task-Oriented Knowledge Base
Authors:
Procheta Sen,
Xi Wang,
Ruiqing Xu,
Emine Yilmaz
Abstract:
Search engines and conversational assistants are commonly used to help users complete their everyday tasks such as booking travel, cooking, etc. While there are some existing datasets that can be used for this purpose, their coverage is limited to very few domains. In this paper, we propose a novel knowledge base, 'Task2KB', which is constructed using data crawled from WikiHow, an online knowledge resource offering instructional articles on a wide range of tasks. Task2KB encapsulates various types of task-related information and attributes, such as requirements, detailed step description, and available methods to complete tasks. Due to its higher coverage compared to existing related knowledge graphs, Task2KB can be highly useful in the development of general purpose task completion assistants.
Submitted 24 January, 2023;
originally announced February 2023.
-
Atomic-scale modeling of the thermal decomposition of titanium(IV)-isopropoxide
Authors:
Benazir Fazlioglu Yalcin,
Dundar E. Yilmaz,
Adri CT van Duin,
Roman Engel-Herbert
Abstract:
The metal-organic (MO) compound titanium(IV)-isopropoxide (Ti(OiPr)4, TTIP) has tremendous technological relevance for thin film growth and coating technologies, offering a low-temperature deposition route for titania and titanium-oxide-based compounds. Thermal decomposition via the release of organic ligands, a key process in any TTIP-based synthesis approach, is commonly assumed to take place only via the beta-hydride elimination process. Here, we present reactive force field molecular dynamics (ReaxFF-MD) and metadynamics simulations that challenge this conventionally assumed scenario by revealing different, energetically preferred reaction pathways. The complete reaction scheme for the TTIP thermolysis, along with the statistics for the different ligand liberation steps and the associated reaction barriers for the bond dissociation events is presented. ReaxFF-MD simulations performed in the dilute limit realistically capture typical thin film deposition conditions, which in combination with metadynamics data, which produces free energies, constitutes a very powerful tool to quantitatively analyze the reaction dynamics of MO-based thin film growth processes and provide an atomic-scale understanding of how the remaining organic ligands detach from different titanium-containing MO fragments. The approach presented here allows for effective and straightforward identification of the undesirable temperature biasing effects in ReaxFF-MD and represents a predictive framework to identify chemical reaction pathways relevant to film growth processes at the atomic scale under realistic, experimentally relevant conditions. It enables computationally informed engineering of MO molecules with tailored decomposition and reaction pathways, and thus rapid and cost-effective advancements in MO molecule design for existing and future applications of thin film deposition and coating processes.
Submitted 22 December, 2022;
originally announced December 2022.
-
Pre-Training With Scientific Text Improves Educational Question Generation
Authors:
Hamze Muse,
Sahan Bulathwela,
Emine Yilmaz
Abstract:
With the boom of digital educational materials and scalable e-learning systems, the potential for realising AI-assisted personalised learning has skyrocketed. In this landscape, the automatic generation of educational questions will play a key role, enabling scalable self-assessment when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our initial experiments demonstrate that EduQG can produce superior educational questions by pre-training on scientific text.
Submitted 7 December, 2022;
originally announced December 2022.
-
MetaASSIST: Robust Dialogue State Tracking with Meta Learning
Authors:
Fanghua Ye,
Xi Wang,
Jie Huang,
Shenghui Li,
Samuel Stern,
Emine Yilmaz
Abstract:
Existing dialogue datasets contain lots of noise in their state annotations. Such noise can hurt model training and ultimately lead to poor generalization performance. A general framework named ASSIST has recently been proposed to train robust dialogue state tracking (DST) models. It introduces an auxiliary model to generate pseudo labels for the noisy training set. These pseudo labels are combined with vanilla labels by a common fixed weighting parameter to train the primary DST model. Notwithstanding the improvements of ASSIST on DST, tuning the weighting parameter is challenging. Moreover, a single parameter shared by all slots and all instances may be suboptimal. To overcome these limitations, we propose a meta learning-based framework MetaASSIST to adaptively learn the weighting parameter. Specifically, we propose three schemes with varying degrees of flexibility, ranging from slot-wise to both slot-wise and instance-wise, to convert the weighting parameter into learnable functions. These functions are trained in a meta-learning manner by taking the validation set as meta data. Experimental results demonstrate that all three schemes can achieve competitive performance. Most impressively, we achieve a state-of-the-art joint goal accuracy of 80.10% on MultiWOZ 2.4.
Submitted 22 October, 2022;
originally announced October 2022.
-
Just Mix Once: Worst-group Generalization by Group Interpolation
Authors:
Giorgio Giannone,
Serhii Havrylov,
Jordan Massiah,
Emine Yilmaz,
Yunlong Jiao
Abstract:
Advances in deep learning theory have revealed how average generalization relies on superficial patterns in data. The consequences are brittle models with poor performance with shift in group distribution at test time. When group annotation is available, we can use robust optimization tools to tackle the problem. However, identification and annotation are time-consuming, especially on large datasets. A recent line of work leverages self-supervision and oversampling to improve generalization on minority groups without group annotation. We propose to unify and generalize these approaches using a class-conditional variant of mixup tailored for worst-group generalization. Our approach, Just Mix Once (JM1), interpolates samples during learning, augmenting the training distribution with a continuous mixture of groups. JM1 is domain agnostic and computationally efficient, can be used with any level of group annotation, and performs on par with or better than the state-of-the-art on worst-group generalization. Additionally, we provide a simple explanation of why JM1 works.
Submitted 21 October, 2022;
originally announced October 2022.
-
Evaluation Metrics for Measuring Bias in Search Engine Results
Authors:
Gizem Gezici,
Aldo Lipani,
Yucel Saygin,
Emine Yilmaz
Abstract:
Search engines decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect that search engines are neutral. However, search engine results do not necessarily cover all the viewpoints of a search query topic, and they can be biased towards a specific view since search engine results are returned based on relevance, which is calculated using many features and sophisticated algorithms where search neutrality is not necessarily the focal point. Therefore, it is important to evaluate the search engine results with respect to bias. In this work we propose novel web search bias evaluation measures which take into account the rank and relevance. We also propose a framework to evaluate web search bias using the proposed measures and test our framework on two popular search engines based on 57 controversial query topics such as abortion, medical marijuana, and gay marriage. We measure the stance bias (in support or against), as well as the ideological bias (conservative or liberal). We observe that the stance does not necessarily correlate with the ideological leaning, e.g. a positive stance on abortion indicates a liberal leaning but a positive stance on the Cuba embargo indicates a conservative leaning. Our experiments show that neither of the search engines suffers from stance bias. However, both search engines suffer from ideological bias, both favouring one ideological leaning over the other, which is more significant from the perspective of polarisation in our society.
Submitted 3 February, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Composition laws on the Fricke surface and Markov triples
Authors:
A. Muhammed Uludağ,
Esra Ünal Yılmaz
Abstract:
We determine some composition laws related to the Fricke surface and the "double" Fricke surface. This latter surface admits the squares of Markov triples as its solutions.
Submitted 6 July, 2022;
originally announced July 2022.
-
Quantizations of Continued Fractions
Authors:
A. Muhammed Uludağ,
Esra Ünal Yilmaz
Abstract:
We introduce a four-parameter deformation of continued fractions, which we call $U$-deformation. We study some particular cases and compare them with the q-deformation of continued fractions introduced recently by Morier-Genoud and Ovsienko.
Submitted 6 July, 2022;
originally announced July 2022.
-
Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments
Authors:
Sahan Bulathwela,
Meghana Verma,
Maria Perez-Ortiz,
Emine Yilmaz,
John Shawe-Taylor
Abstract:
This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly attributable to the increase in training examples. The VLE dataset's suitability for building models for Computer Science/Artificial Intelligence education focused on e-learning/MOOC use-cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. This is the largest and most diverse publicly available dataset to our knowledge that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are available publicly.
Submitted 22 June, 2022;
originally announced July 2022.
-
ViralBERT: A User Focused BERT-Based Approach to Virality Prediction
Authors:
Rikaz Rameez,
Hossein A. Rahmani,
Emine Yilmaz
Abstract:
Recently, Twitter has become the social network of choice for sharing and spreading information to a multitude of users through posts called 'tweets'. Users can easily re-share these posts to other users through 'retweets', which allow information to cascade to many more users, increasing its outreach. Clearly, being able to know the extent to which a post can be retweeted has great value in advertising, influencing and other such campaigns. In this paper we propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features. We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules: one for semantic representation of the combined text and numerical features, and another module purely for sentiment analysis of text, as both the information within text and its ability to elicit an emotional response play a part in retweet proneness. We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field. Our experiments show that our approach outperforms these baselines, with a 13% increase in both F1 Score and Accuracy compared to the best performing baseline method. We then conduct an ablation study to investigate the importance of chosen features, finding that text sentiment and follower counts, and to a lesser extent mentions and following counts, are the strongest features for the model, and that hashtag counts are detrimental to the model.
Submitted 17 May, 2022;
originally announced June 2022.
-
Integrated Weak Learning
Authors:
Peter Hayes,
Mingtian Zhang,
Raza Habib,
Jordan Burgess,
Emine Yilmaz,
David Barber
Abstract:
We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consideration the performance of the end-model during training. We show that our approach outperforms existing weak learning techniques across a set of 6 benchmark classification datasets. When both a small amount of labeled data and weak supervision are present the increase in performance is both consistent and large, reliably getting a 2-5 point test F1 score gain over non-integrated methods.
Submitted 19 June, 2022;
originally announced June 2022.
-
Impact of Tokenization on Language Models: An Analysis for Turkish
Authors:
Cagri Toraman,
Eyup Halit Yilmaz,
Furkan Şahinuç,
Oguzhan Ozcelik
Abstract:
Tokenization is an important text preprocessing step to prepare input tokens for deep language models. WordPiece and BPE are de facto methods employed by important models, such as BERT and GPT. However, the impact of tokenization can be different for morphologically rich languages, such as Turkic languages, where many words can be generated by adding prefixes and suffixes. We compare five tokenizers at different granularity levels, i.e. their outputs vary from smallest pieces of characters to the surface form of words, including a Morphological-level tokenizer. We train these tokenizers and pretrain medium-sized language models using the RoBERTa pretraining procedure on the Turkish split of the OSCAR corpus. We then fine-tune our models on six downstream tasks. Our experiments, supported by statistical tests, reveal that the Morphological-level tokenizer performs competitively with de facto tokenizers. Furthermore, we find that increasing the vocabulary size improves the performance of Morphological and Word-level tokenizers more than that of de facto tokenizers. The ratio of the number of vocabulary parameters to the total number of model parameters can be empirically chosen as 20% for de facto tokenizers and 40% for other tokenizers to obtain a reasonable trade-off between model size and performance.
Submitted 19 April, 2022;
originally announced April 2022.
-
Dynamic Schema Graph Fusion Network for Multi-Domain Dialogue State Tracking
Authors:
Yue Feng,
Aldo Lipani,
Fanghua Ye,
Qiang Zhang,
Emine Yilmaz
Abstract:
Dialogue State Tracking (DST) aims to keep track of users' intentions during the course of a conversation. In DST, modelling the relations among domains and slots is still an under-studied problem. Existing approaches that have considered such relations generally fall short in: (1) fusing prior slot-domain membership relations and dialogue-aware dynamic slot relations explicitly, and (2) generalizing to unseen domains. To address these issues, we propose a novel Dynamic Schema Graph Fusion Network (DSGFNet), which generates a dynamic schema graph to explicitly fuse the prior slot-domain membership relations and dialogue-aware dynamic slot relations. It also uses the schemata to facilitate knowledge transfer to new domains. DSGFNet consists of a dialogue utterance encoder, a schema graph encoder, a dialogue-aware schema graph evolving network, and a schema graph enhanced dialogue state decoder. Empirical results on benchmark datasets (i.e., SGD, MultiWOZ2.1, and MultiWOZ2.2) show that DSGFNet outperforms existing methods.
Submitted 15 April, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Robust Fingerprint of Location Trajectories Under Differential Privacy
Authors:
Yuzhou Jiang,
Emre Yilmaz,
Erman Ayday
Abstract:
Directly releasing those data raises privacy and liability (e.g., due to unauthorized distribution of such datasets) concerns since location data contain users' sensitive information, e.g., regular moving patterns and favorite spots. To address this, we propose a novel fingerprinting scheme that simultaneously identifies unauthorized redistribution of location datasets and provides differential privacy guarantees for the shared data. Observing data utility degradation due to differentially-private mechanisms, we introduce a utility-focused post-processing scheme to regain spatio-temporal correlations between points in a location trajectory. We further integrate this post-processing scheme into our fingerprinting scheme as a sampling method. The proposed fingerprinting scheme alleviates the degradation in the utility of the shared dataset due to the noise introduced by differentially-private mechanisms (i.e., adds the fingerprint by preserving the publicly known statistics of the data). Meanwhile, it does not violate differential privacy throughout the entire process due to immunity to post-processing, a fundamental property of differential privacy. Our proposed fingerprinting scheme is robust against known and well-studied attacks against a fingerprinting scheme including random flipping attacks, correlation-based flipping attacks, and collusions among multiple parties, which makes it hard for the attackers to infer the fingerprint codes and avoid accusation. Via experiments on two real-life location datasets and two synthetic ones, we show that our scheme achieves high fingerprinting robustness and outperforms existing approaches. Besides, the proposed fingerprinting scheme increases data utility for differentially-private datasets, which is beneficial for data analyzers.
Submitted 21 April, 2023; v1 submitted 10 April, 2022;
originally announced April 2022.