Skip to main content

Showing 1–50 of 370 results for author: Chen, N

  1. arXiv:2410.12337  [pdf, other

    cs.CV

    ARIC: An Activity Recognition Dataset in Classroom Surveillance Images

    Authors: Linfeng Xu, Fanman Meng, Qingbo Wu, Lili Pan, Heqian Qiu, Lanxiao Wang, Kailong Chen, Kanglei Geng, Yilei Qian, Haojie Wang, Shuchang Zhou, Shimou Ling, Zejia Liu, Nanlin Chen, Yingjie Xu, Shaoxu Cheng, Bowen Tan, Ziyong Xu, Hongliang Li

    Abstract: The application of activity recognition in the ``AI + Education" field is gaining increasing attention. However, current work mainly focuses on the recognition of activities in manually captured videos and a limited number of activity types, with little attention given to recognizing activities in surveillance images from real classrooms. Activity recognition in classroom surveillance images faces… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2409.03354

  2. arXiv:2410.12216  [pdf, other

    cs.RO

    Learning Differentiable Tensegrity Dynamics using Graph Neural Networks

    Authors: Nelson Chen, Kun Wang, William R. Johnson III, Rebecca Kramer-Bottiglio, Kostas Bekris, Mridul Aanjaneya

    Abstract: Tensegrity robots are composed of rigid struts and flexible cables. They constitute an emerging class of hybrid rigid-soft robotic systems and are promising systems for a wide array of applications, ranging from locomotion to assembly. They are difficult to control and model accurately, however, due to their compliance and high number of degrees of freedom. To address this issue, prior work has in… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

  3. arXiv:2410.10626  [pdf, other

    cs.CL

    Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

    Authors: Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang

    Abstract: Adapting medical Large Language Models to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and conduct analysis to ensure its quality. In order to leverage the generalization capability of multilingual LLMs to efficientl… ▽ More

    Submitted 14 October, 2024; originally announced October 2024.

  4. arXiv:2410.05801  [pdf, other

    cs.CL cs.AI

    Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation

    Authors: Bolei He, Nuo Chen, Xinran He, Lingyong Yan, Zhenkai Wei, Jinchang Luo, Zhen-Hua Ling

    Abstract: Recent Retrieval Augmented Generation (RAG) aims to enhance Large Language Models (LLMs) by incorporating extensive knowledge retrieved from external sources. However, such approach encounters some challenges: Firstly, the original queries may not be suitable for precise retrieval, resulting in erroneous contextual knowledge; Secondly, the language model can easily generate inconsistent answer wit… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 Findings. 9 pages, 4 figures, 7 tables

  5. arXiv:2409.18654  [pdf, other

    eess.AS cs.SD

    Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models

    Authors: Xiaoxue Gao, Nancy F. Chen

    Abstract: Current automatic speech recognition systems struggle with modeling long speech sequences due to high quadratic complexity of Transformer-based models. Selective state space models such as Mamba has performed well on long-sequence modeling in natural language processing and computer vision tasks. However, research endeavors in speech technology tasks has been under-explored. We propose Speech-Mamb… ▽ More

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 8 pages; SLT 2024

  6. arXiv:2409.16022  [pdf, other

    cs.CL cs.AI

    AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

    Authors: Nuo Chen, Jiqun Liu, Xiaoyu Dong, Qijiong Liu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Cognitive biases are systematic deviations in thinking that lead to irrational judgments and problematic decision-making, extensively studied across various fields. Recently, large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. While social biases in LLMs have been well-studied, cognitive biases have received less attent… ▽ More

    Submitted 8 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  7. arXiv:2409.15376  [pdf, other

    cs.LG cs.AI cs.CL

    ControlMath: Controllable Data Generation Promotes Math Generalist Models

    Authors: Nuo Chen, Ning Wu, Jianhui Chang, Jia Li

    Abstract: Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates di… ▽ More

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: 17 pages

    Report number: EMNLP 2024 Main

  8. arXiv:2409.14666  [pdf, other

    cs.AI

    Semi-supervised Learning For Robust Speech Evaluation

    Authors: Huayun Zhang, Jeremy H. M. Wong, Geyu Lin, Nancy F. Chen

    Abstract: Speech evaluation measures a learners oral proficiency using automatic models. Corpora for training such models often pose sparsity challenges given that there often is limited scored data from teachers, in addition to the score distribution across proficiency levels being often imbalanced among student cohorts. Automatic scoring is thus not robust when faced with under-represented samples or out-… ▽ More

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: 6 pages

  9. arXiv:2409.12576  [pdf, other

    cs.CV

    StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

    Authors: Zhengguang Zhou, Jing Li, Huaxia Li, Nemo Chen, Xu Tang

    Abstract: Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative. In this paper, we introduce StoryMaker, a personalization solution that preserves not only f… ▽ More

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: 12 pages, 5 figures

  10. arXiv:2409.10157  [pdf, other

    eess.AS cs.SD eess.SP

    Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

    Authors: Xiaoxue Gao, Chen Zhang, Yiming Chen, Huayun Zhang, Nancy F. Chen

    Abstract: Current emotional text-to-speech (TTS) models predominantly conduct supervised training to learn the conversion from text and desired emotion to its emotional speech, focusing on a single emotion per text-speech pair. These models only learn the correct emotional outputs without fully comprehending other emotion characteristics, which limits their capabilities of capturing the nuances between diff… ▽ More

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 5 pages

  11. arXiv:2409.07924  [pdf, other

    cs.RO

    Universal Trajectory Optimization Framework for Differential Drive Robot Class

    Authors: Mengke Zhang, Nanhe Chen, Hu Wang, Jianxiong Qiu, Zhichao Han, Qiuyu Ren, Chao Xu, Fei Gao, Yanjun Cao

    Abstract: Differential drive robots are widely used in various scenarios thanks to their straightforward principle, from household service robots to disaster response field robots. There are several types of driving mechanisms for real-world applications, including two-wheeled, four-wheeled skid-steering, tracked robots, and so on. The differences in the driving mechanisms usually require specific kinematic… ▽ More

    Submitted 27 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: 15 pages, 15 figures

  12. arXiv:2409.06635  [pdf, ps, other

    cs.SD cs.AI cs.CL eess.AS

    MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders

    Authors: Wenyu Zhang, Shuo Sun, Bin Wang, Xunlong Zou, Zhuohan Liu, Yingxu He, Geyu Lin, Nancy F. Chen, Ai Ti Aw

    Abstract: The rapid advancements in large language models (LLMs) have significantly enhanced natural language processing capabilities, facilitating the development of AudioLLMs that process and understand speech and audio inputs alongside text. Existing AudioLLMs typically combine a pre-trained audio encoder with a pre-trained LLM, which are subsequently finetuned on specific audio tasks. However, the pre-t… ▽ More

    Submitted 22 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

  13. arXiv:2409.05606  [pdf, other

    cs.CV cs.MM

    CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization

    Authors: Nan Chen, Mengqi Huang, Zhuowei Chen, Yang Zheng, Lei Zhang, Zhendong Mao

    Abstract: Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive perspective, focusing on capturing all details of a single image, which will misconstrue the specific image's irrelevant attributes (e.g., view, pose, and backgr… ▽ More

    Submitted 11 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

  14. arXiv:2409.01347  [pdf, other

    cs.CV

    Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

    Authors: Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

    Abstract: Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting target timesteps, they usually struggle with blurs and detail losses in generated images. To address these limitations, we introduce Target-Driven Distillation (TD… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  15. arXiv:2408.14418  [pdf, other

    cs.CL cs.AI

    MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

    Authors: Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

    Abstract: Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut… ▽ More

    Submitted 5 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  16. arXiv:2408.13772  [pdf, ps, other

    cs.OS

    FRAP: A Flexible Resource Accessing Protocol for Multiprocessor Real-Time Systems

    Authors: Shuai Zhao, Hanzhi Xu, Nan Chen, Ruoxian Su, Wanli Chang

    Abstract: Fully-partitioned fixed-priority scheduling (FP-FPS) multiprocessor systems are widely found in real-time applications, where spin-based protocols are often deployed to manage the mutually exclusive access of shared resources. Unfortunately, existing approaches either enforce rigid spin priority rules for resource accessing or carry significant pessimism in the schedulability analysis, imposing su… ▽ More

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  17. arXiv:2408.12396  [pdf, other

    cs.CV physics.geo-ph

    Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

    Authors: Zhixiang Guo, Xinming Wu, Luming Liang, Hanlin Sheng, Nuo Chen, Zhengfa Bi

    Abstract: We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, large neural networks trained on massive datasets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges like lacking curated training datasets and high computational costs for developing specialized FMs. This study considers adapting FMs from computer… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  18. arXiv:2408.11873  [pdf, other

    eess.AS cs.CR cs.LG

    Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition

    Authors: Xuan Kan, Yonghui Xiao, Tien-Ju Yang, Nanxin Chen, Rajiv Mathews

    Abstract: This work explores the challenge of enhancing Automatic Speech Recognition (ASR) model performance across various user-specific domains while preserving user data privacy. We employ federated learning and parameter-efficient domain adaptation methods to solve the (1) massive data requirement of ASR models from user-specific scenarios and (2) the substantial communication cost between servers and c… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  19. arXiv:2408.10463  [pdf, other

    cs.SD cs.LG eess.AS

    Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded ac… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  20. arXiv:2408.08656  [pdf, other

    cs.CL

    LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

    Authors: Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan

    Abstract: We present the first systematic evaluation examining format bias in performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  21. arXiv:2408.06827  [pdf, other

    eess.AS cs.LG

    PRESENT: Zero-Shot Text-to-Prosody Control

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modi… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  22. arXiv:2408.03560  [pdf, other

    cs.LG stat.ML

    In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

    Authors: Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy F. Chen

    Abstract: Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and eval… ▽ More

    Submitted 2 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: EMNLP 2024 - Findings

  23. arXiv:2407.18879  [pdf, other

    cs.SD cs.LG eess.AS

    Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS synthesized training data for KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reducing cost and time f… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  24. arXiv:2407.13466  [pdf, other

    cs.RO cs.LG

    LIMT: Language-Informed Multi-Task Visual World Models

    Authors: Elie Aljalbout, Nikolaos Sotirakis, Patrick van der Smagt, Maximilian Karl, Nutan Chen

    Abstract: Most recent successes in robot reinforcement learning involve learning a specialized single-task agent. However, robots capable of performing multiple tasks can be much more valuable in real-world applications. Multi-task reinforcement learning can be very challenging due to the increased sample complexity and the potentially conflicting task objectives. Previous work on this topic is domina… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  25. arXiv:2407.11484  [pdf, other

    cs.AI cs.CL

    The Oscars of AI Theater: A Survey on Role-Playing with Language Models

    Authors: Nuo Chen, Yan Wang, Yang Deng, Jia Li

    Abstract: This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving c… ▽ More

    Submitted 1 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 28 pages

  26. arXiv:2407.09546  [pdf, other

    q-fin.TR cs.SI

    A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

    Authors: Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, Bingsheng He

    Abstract: The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  27. arXiv:2407.03897  [pdf, other

    cs.LG cs.NE

    gFlora: a topology-aware method to discover functional co-response groups in soil microbial communities

    Authors: Nan Chen, Merlijn Schram, Doina Bucur

    Abstract: We aim to learn the functional co-response group: a group of taxa whose co-response effect (the representative characteristic of the group showing the total topological abundance of taxa) co-responds (associates well statistically) to a functional variable. Different from the state-of-the-art method, we model the soil microbial community as an ecological co-occurrence network with the taxa as node… ▽ More

    Submitted 17 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: BIOKDD accepted. Note: the first version of this paper is not approved by all authors. the second version is the official version for BIOKDD camera-ready

  28. arXiv:2407.00981  [pdf, other

    cs.HC cs.CL

    VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

    Authors: Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang

    Abstract: Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However,… ▽ More

    Submitted 7 August, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  29. arXiv:2406.16020  [pdf, other

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, among which, 7 are newly proposed datasets. The evaluation targets three main aspects: speech understanding, audio scene understanding, and voice understanding (paralinguistic). Despite recent advancements, there lacks a comprehensive benchma… ▽ More

    Submitted 2 September, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: v3 - Abundent update on models and evaluation details; Code: https://github.com/AudioLLMs/AudioBench

  30. arXiv:2406.11021  [pdf, other

    cs.CV

    $α$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

    Authors: Sanbao Su, Nuo Chen, Felix Juefei-Xu, Chen Feng, Fei Miao

    Abstract: In the realm of autonomous vehicle (AV) perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To addre… ▽ More

    Submitted 4 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  31. arXiv:2406.09383  [pdf, other

    cs.CV

    Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

    Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024

  32. arXiv:2406.05358  [pdf, other

    cs.LG math.OC

    Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management

    Authors: Huiling Meng, Ningyuan Chen, Xuefeng Gao

    Abstract: Intensity control is a type of continuous-time dynamic optimization problems with many important applications in Operations Research including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control using choice-based network revenue management as a case study, which is a classical problem in revenue management that features a large state… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  33. arXiv:2406.03052  [pdf, other

    cs.LG

    Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections

    Authors: Zihan Luo, Hong Huang, Yongkang Zhou, Jiping Zhang, Nuo Chen

    Abstract: Despite the remarkable capabilities demonstrated by Graph Neural Networks (GNNs) in graph-related tasks, recent research has revealed the fairness vulnerabilities in GNNs when facing malicious adversarial attacks. However, all existing fairness attacks require manipulating the connectivity between existing nodes, which may be prohibited in reality. To this end, we introduce a Node Injection-based… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 21 pages

  34. arXiv:2406.02963  [pdf, other

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  35. arXiv:2406.02921  [pdf, other

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Text Injection for Neural Contextual Biasing

    Authors: Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran

    Abstract: Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and it… ▽ More

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

    Journal ref: Interspeech 2024, Kos Island, Greece

  36. arXiv:2406.02426  [pdf, other

    math.OC cs.LG

    Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls

    Authors: Tianyu Wang, Ningyuan Chen, Chun Wang

    Abstract: In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  37. arXiv:2405.17463  [pdf, other

    cs.GT cs.LG

    No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling

    Authors: Ningyuan Chen, Xuefeng Gao, Yi Xiong

    Abstract: When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of the existence of each other and use multi-armed bandit algorithms to choose the actions, which is referred to as the ``blindfolded game'' in this paper. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.15329  [pdf, other

    cs.CL

    Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev… ▽ More

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  39. arXiv:2405.12038  [pdf, other

    cs.LG cs.IR

    Adaptive Convolutional Forecasting Network Based on Time Series Feature-Driven

    Authors: Dandan Zhang, Zhiqiang Zhang, Nanguang Chen, Yun Wang

    Abstract: Time series data in real-world scenarios contain a substantial amount of nonlinear information, which significantly interferes with the training process of models, leading to decreased prediction performance. Therefore, during the time series forecasting process, extracting the local and global time series patterns and understanding the potential nonlinear features among different time observation… ▽ More

    Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  40. arXiv:2405.10496  [pdf, other

    cs.IT eess.SP

    Electromagnetic Information Theory for Holographic MIMO Communications

    Authors: Li Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau Yuen

    Abstract: Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enhancing higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far it… ▽ More

    Submitted 25 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  41. arXiv:2405.08460  [pdf, other

    cs.CL cs.AI

    Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization

    Authors: Chenghao Zhu, Nuo Chen, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of… ▽ More

    Submitted 10 July, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Preprint

  42. arXiv:2405.03138  [pdf, other

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meantime, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd… ▽ More

    Submitted 9 July, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Aceepted to ACL 2024 Workshop - C3NLP (Workshop on Cross-Cultural Considerations in NLP)

  43. arXiv:2405.03110  [pdf, other

    cs.IR

    Vector Quantization for Recommender Systems: A Review and Outlook

    Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  44. arXiv:2405.02630  [pdf, other

    quant-ph cs.DC cs.SE

    cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

    Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

    Abstract: This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulatio… ▽ More

    Submitted 8 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 14 figures

  45. arXiv:2404.18946  [pdf, other

    physics.optics cs.IR eess.IV

    Align-Free Multi-Plane Phase Retrieval

    Authors: Jiabao Wang, Yang Wu, Jun Wang, Ni Chen

    Abstract: The multi-plane phase retrieval method provides a budget-friendly and effective way to perform phase imaging, yet it often encounters alignment challenges due to shifts along the optical axis in experiments. Traditional methods, such as employing beamsplitters instead of mechanical stage movements or adjusting focus using tunable light sources, add complexity to the setup required for multi-plane… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  46. arXiv:2404.17454  [pdf, other

    cs.LG cs.AI q-bio.QM

    Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

    Authors: Kaichen Xu, Yueyang Ding, Suyang Hou, Weiqiang Zhan, Nisang Chen, Jun Wang, Xiaobo Sun

    Abstract: Fined-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these… ▽ More

    Submitted 29 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 17 pages, 2 figures. Accepted by IJCAI 2024

  47. arXiv:2404.11932  [pdf, other

    cs.CL cs.AI

    CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

    Authors: Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

    Abstract: Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages. To address this p… ▽ More

    Submitted 12 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages

  48. arXiv:2404.09754  [pdf, other

    cs.CL

    Resilience of Large Language Models for Noisy Instructions

    Authors: Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

    Abstract: As the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs to handle text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates… ▽ More

    Submitted 2 October, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to EMNLP 2024 Findings

  49. arXiv:2404.07471  [pdf, other

    cs.SE cs.AI cs.CL

    Structure-aware Fine-tuning for Code Pre-trained Models

    Authors: Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

    Abstract: Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Stru… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by COLING 2024

  50. arXiv:2404.06762  [pdf, other

    cs.CL cs.HC

    Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

    Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

    Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantl… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.