Showing 1–30 of 30 results for author: Gales, M J F

  1. arXiv:2408.09565  [pdf, other]

    cs.CL cs.AI

    Grammatical Error Feedback: An Implicit Evaluation Approach

    Authors: Stefano Bannò, Kate Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. This holistic feedback will be referred to as grammatical error feedback (GEF). In this paper, we present a…

    Submitted 18 August, 2024; originally announced August 2024.

  2. arXiv:2407.06800  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Learn and Don't Forget: Adding a New Language to ASR Foundation Models

    Authors: Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales

    Abstract: Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code…

    Submitted 24 September, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Proceedings of Interspeech

  3. arXiv:2405.01601  [pdf, other]

    cs.CL cs.LG

    Efficient Sample-Specific Encoder Perturbations

    Authors: Yassir Fathullah, Mark J. F. Gales

    Abstract: Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks. This paper proposes a simple and lightweight modification to such systems to control the behaviour according to a specific attribute of interest. This paper proposes a novel inference-efficient approach to modifying the behaviour of an encoder-decoder system according to a spe…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: To appear in NAACL 2024

  4. arXiv:2404.18557  [pdf, other]

    cs.CL

    Can GPT-4 do L2 analytic assessment?

    Authors: Stefano Bannò, Hari Krishna Vydana, Kate M. Knill, Mark J. F. Gales

    Abstract: Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of larg…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted for the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

  5. arXiv:2403.19548  [pdf, other]

    cs.CL

    WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models

    Authors: Piotr Molenda, Adian Liusie, Mark J. F. Gales

    Abstract: Watermarking generative-AI systems, such as LLMs, has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks. Although current approaches have demonstrated that small, context-dependent shifts in the word distributions can be used to apply and detect watermarks, there has been little work in analyzing the impact that these perturbations have on the quality…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 (Findings)

  6. arXiv:2403.13590  [pdf, other]

    cs.CL

    Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models

    Authors: Adian Liusie, Yassir Fathullah, Mark J. F. Gales

    Abstract: Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility in NLP tasks, however they sometimes fail to maintain crucial invariances for specific tasks. One example is permutation sensitivity, where LLMs' outputs may significantly vary depending on the order of the input options. While debiasing techniques can mitigate these issues, and yield better performanc…

    Submitted 20 March, 2024; originally announced March 2024.

  7. arXiv:2311.09363  [pdf, other]

    cs.CL

    Investigating the Emergent Audio Classification Ability of ASR Foundation Models

    Authors: Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

    Abstract: Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings. There has been far less work, however, on the zero-shot abilities of ASR foundation models, with these systems typically fine-tuned to specific tasks or constrained to applications that match their training criterion an…

    Submitted 28 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (main conference)

  8. arXiv:2311.05550  [pdf, other]

    cs.CL cs.LG eess.AS

    Towards End-to-End Spoken Grammatical Error Correction

    Authors: Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we…

    Submitted 19 July, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  9. arXiv:2309.07606  [pdf, other]

    cs.CL cs.IR

    Zero-shot Audio Topic Reranking using Large Language Models

    Authors: Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate M. Knill, Mark J. F. Gales

    Abstract: Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element for this process is highly rapid and flexible search to support large archives, which in MVSE is facilitated by representing v…

    Submitted 10 September, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  10. arXiv:2309.04992  [pdf, other]

    cs.CL

    Mitigating Word Bias in Zero-shot Prompt-based Classifiers

    Authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales

    Abstract: Prompt-based classifiers are an attractive approach for zero-shot classification. However, the precise choice of the prompt template and label words can largely influence performance, with semantically equivalent settings often showing notable performance difference. This discrepancy can be partly attributed to word biases, where the classifier may be biased towards classes. To address this proble…

    Submitted 10 September, 2023; originally announced September 2023.

  11. arXiv:2307.09378  [pdf, other]

    cs.CL cs.SD eess.AS

    Adapting an ASR Foundation Model for Spoken Language Assessment

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a…

    Submitted 10 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Proceedings of SLaTE

  12. arXiv:2307.07889  [pdf, other]

    cs.CL

    LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models

    Authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales

    Abstract: Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks. An interesting application of these systems is in the automated assessment of natural language generation (NLG), a highly challenging area with great practical benefit. In this paper, we explore two options for exploiting the emergent abilities of LLMs for zero…

    Submitted 6 February, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: To Appear at EACL 2024

  13. arXiv:2306.13047  [pdf, other]

    cs.CL

    Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution

    Authors: Adian Liusie, Vatsal Raina, Andrew Mullooly, Kate Knill, Mark J. F. Gales

    Abstract: Multiple choice exams are widely used to assess candidates across a diverse range of domains and tasks. To moderate question quality, newly proposed questions often pass through pre-test evaluation stages before being deployed into real-world exams. Currently, this evaluation process is manually intensive, which can lead to time lags in the question development cycle. Streamlining this process via…

    Submitted 15 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  14. Adapting an Unadaptable ASR System

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a…

    Submitted 10 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  15. arXiv:2305.12498  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Multi-Head State Space Model for Speech Recognition

    Authors: Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

    Abstract: State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, where parallel heads are taught to learn local and global temporal dynamics on sequence data. As a drop-in…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  16. arXiv:2305.05098  [pdf, other]

    cs.LG cs.AI cs.CL

    Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

    Authors: Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales

    Abstract: State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of this sequence. In these scenarios, where for example knowing the quality of a system's output to predict poor per…

    Submitted 8 May, 2023; originally announced May 2023.

  17. arXiv:2303.08896  [pdf, other]

    cs.CL

    SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

    Authors: Potsawee Manakul, Adian Liusie, Mark J. F. Gales

    Abstract: Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such a…

    Submitted 11 October, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023 (main conference)

  18. N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

    Authors: Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian

    Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 mo…

    Submitted 10 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Proceedings of INTERSPEECH

  19. arXiv:2301.12307  [pdf, other]

    cs.CL

    MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization

    Authors: Potsawee Manakul, Adian Liusie, Mark J. F. Gales

    Abstract: State-of-the-art summarization systems can generate highly fluent summaries. These summaries, however, may contain factual inconsistencies and/or information not present in the source. Hence, an important component of assessing the quality of summaries is to determine whether there is information consistency between the source and the summary. Existing approaches are typically based on lexical mat…

    Submitted 7 September, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: AACL 2023

  20. arXiv:2211.08849  [pdf, other]

    eess.AS cs.CL

    L2 proficiency assessment using self-supervised speech representations

    Authors: Stefano Bannò, Kate M. Knill, Marco Matassoni, Vyas Raina, Mark J. F. Gales

    Abstract: There has been a growing demand for automated spoken language assessment systems in recent years. A standard pipeline for this process is to start with a speech recognition system and derive features, either hand-crafted or based on deep-learning, that exploit the transcription and audio. Though these approaches can yield high performance systems, they require speech recognition systems that can b…

    Submitted 16 November, 2022; originally announced November 2022.

  21. arXiv:2208.13265  [pdf, other]

    cs.CL

    Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods

    Authors: Potsawee Manakul, Mark J. F. Gales

    Abstract: Automatic summary assessment is useful for both machine-generated and human-produced summaries. Automatically evaluating the summary text given the document enables, for example, summary generation system development and detection of inappropriate summaries. Summary assessment can be run in a number of modes: ranking summary generation systems; ranking summaries of a particular document; and estim…

    Submitted 28 August, 2022; originally announced August 2022.

  22. arXiv:2206.15407  [pdf, other]

    cs.LG cs.AI stat.ML

    Shifts 2.0: Extending The Dataset of Real Distributional Shifts

    Authors: Andrey Malinin, Andreas Athanasopoulos, Muhamed Barakovic, Meritxell Bach Cuadra, Mark J. F. Gales, Cristina Granziera, Mara Graziani, Nikolay Kartashev, Konstantinos Kyriakopoulos, Po-Jui Lu, Nataliia Molchanova, Antonis Nikitakis, Vatsal Raina, Francesco La Rosa, Eli Sivena, Vasileios Tsarsitalidis, Efi Tsompopoulou, Elena Volf

    Abstract: Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the usage of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to be able to assess how robustly ML models generalize as well as the quality of their uncertainty estimates. Standard ML baseline datasets do not allow these prope…

    Submitted 15 September, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

  23. arXiv:2203.08295  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Distribution Distillation: Efficient Uncertainty Estimation

    Authors: Yassir Fathullah, Mark J. F. Gales

    Abstract: Deep learning is increasingly being applied in safety-critical domains. For these scenarios it is important to know the level of uncertainty in a model's prediction to ensure appropriate decisions are made by the system. Deep ensembles are the de-facto standard approach to obtaining various measures of uncertainty. However, ensembles often significantly increase the resources required in the train…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: 17 pages, 3 figures, 17 tables, submitted to UAI 2022

  24. arXiv:2109.03888  [pdf, other]

    cs.CL

    Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems

    Authors: Potsawee Manakul, Mark J. F. Gales

    Abstract: Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one important bottleneck, the quadratic self-attention mechanism in the encoder. Modified encoder architectures such as LED or LoBART use local attention patterns to addre…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 (short paper, main conference)

  25. arXiv:2107.07455  [pdf, other]

    cs.LG cs.AI stat.ML

    Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

    Authors: Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Denis Roginskiy, Mariya Shmatova, Panos Tigas, Boris Yangel

    Abstract: There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image class…

    Submitted 11 February, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

  26. arXiv:2107.04691  [pdf, other]

    cs.CL

    An Initial Investigation of Non-Native Spoken Question-Answering

    Authors: Vatsal Raina, Mark J. F. Gales

    Abstract: Text-based machine comprehension (MC) systems have a wide-range of applications, and standard corpora exist for developing and evaluating approaches. There has been far less research on spoken question answering (SQA) systems. The SQA task considered in this paper is to extract the answer from a candidate's spoken response to a question in a prompt-response style language assessment test.…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 5 pages, 1 figure

  27. arXiv:2105.03801  [pdf, other]

    cs.CL

    Long-Span Summarization via Local Attention and Content Selection

    Authors: Potsawee Manakul, Mark J. F. Gales

    Abstract: Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization. Typically these systems are trained by fine-tuning a large pre-trained model to the target task. One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows…

    Submitted 29 May, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: ACL 2021 (camera-ready)

  28. arXiv:2104.01264  [pdf, other]

    cs.CL cs.AI cs.LG

    Attention Forcing for Machine Translation

    Authors: Qingyun Dou, Yiting Lu, Potsawee Manakul, Xixin Wu, Mark J. F. Gales

    Abstract: Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in various tasks including Text-To-Speech (TTS) and Neural Machine Translation (NMT). The standard training approach, teacher forcing, guides a model with the reference output history. At inference stage, the generated output history must be used. This mismatch can impact performance. H…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:1909.12289

  29. arXiv:1909.13695  [pdf, other]

    eess.AS cs.CL cs.SD

    Non-native Speaker Verification for Spoken Language Assessment

    Authors: Linlin Wang, Yu Wang, Mark J. F. Gales

    Abstract: Automatic spoken language assessment systems are becoming more popular in order to handle increasing interests in second language learning. One challenge for these systems is to detect malpractice. Malpractice can take a range of forms; this paper focuses on detecting when a candidate attempts to impersonate another in a speaking test. This form of malpractice is closely related to speaker verific…

    Submitted 30 September, 2019; originally announced September 2019.

  30. arXiv:1909.12289  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Attention Forcing for Sequence-to-sequence Model Training

    Authors: Qingyun Dou, Yiting Lu, Joshua Efiong, Mark J. F. Gales

    Abstract: Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during infe…

    Submitted 2 October, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 11 pages, 4 figures, conference

    ACM Class: I.2