Showing 1–50 of 596 results for author: Liu, A

  1. arXiv:2410.15954  [pdf, other]

    cs.LG cs.AI

    TS-ACL: A Time Series Analytic Continual Learning Framework for Privacy-Preserving and Class-Incremental Pattern Recognition

    Authors: Kejia Fan, Jiaxu Li, Songning Lai, Linpu Lv, Anfeng Liu, Jianheng Tang, Houbing Herbert Song, Huiping Zhuang

    Abstract: Class-incremental Learning (CIL) in Time Series Classification (TSC) aims to incrementally train models using the streaming time series data that arrives continuously. The main problem in this scenario is catastrophic forgetting, i.e., training models with new samples inevitably leads to the forgetting of previously learned knowledge. Among existing methods, the replay-based methods achieve satisf…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 11 pages, 3 figures, 2 tables

    MSC Class: I.2.6

  2. arXiv:2410.13351  [pdf, other]

    cs.CL cs.AI cs.LG

    Representation Learning of Structured Data for Medical Foundation Models

    Authors: Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Jeng Wei, Wei-Hsian Yin, Stefan Winkler, Robby T. Tan

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing me…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Workshop on Unifying Representations in Neural Models (UniReps 2024)

  3. arXiv:2410.09671  [pdf, other]

    cs.AI cs.CL

    OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

    Authors: Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang

    Abstract: In this technical report, we introduce OpenR, an open-source framework designed to integrate key components for enhancing the reasoning capabilities of large language models (LLMs). OpenR unifies data acquisition, reinforcement learning training (both online and offline), and non-autoregressive decoding into a cohesive software platform. Our goal is to establish an open-source platform and communi…

    Submitted 12 October, 2024; originally announced October 2024.

  4. arXiv:2410.05352  [pdf, other]

    cs.LG cs.AI

    Recent Advances of Multimodal Continual Learning: A Comprehensive Survey

    Authors: Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King

    Abstract: Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As machine learning models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary challenge of M…

    Submitted 10 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  5. arXiv:2410.04780  [pdf, other]

    cs.CV

    Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

    Authors: Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu

    Abstract: Multimodal Large Language Models (MLLMs) have emerged as a central focus in both industry and academia, but often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. These biases arise from the visual encoder and the Large Language Model (LLM) backbone, affecting the attention mechanism responsible for aligning multimodal inputs. Existing decodi…

    Submitted 7 October, 2024; originally announced October 2024.

  6. arXiv:2410.04350  [pdf, other]

    cs.CL

    TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights

    Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Yanchao Sun, Xiang Kong, Simon Wang, Jiulong Shan, Albin Madappally Jose, Xiaojiang Liu, Lijie Wen, Philip S. Yu, Meng Cao

    Abstract: Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 27 pages, 7 figures, 2 tables

    MSC Class: 68T50; ACM Class: I.2.7

  7. arXiv:2410.03168  [pdf, other]

    cs.CR cs.CL

    Can Watermarked LLMs be Identified by Users via Crafted Prompts?

    Authors: Aiwei Liu, Sheng Guan, Yiming Liu, Leyi Pan, Yifei Zhang, Liancheng Fang, Lijie Wen, Philip S. Yu, Xuming Hu

    Abstract: Text watermarking for Large Language Models (LLMs) has made significant progress in detecting LLM outputs and preventing misuse. Current watermarking techniques offer high detectability, minimal impact on text quality, and robustness to text editing. However, current researches lack investigation into the imperceptibility of watermarking techniques in LLM services. This is crucial as LLM providers…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 25 pages, 5 figures, 8 tables

    MSC Class: 68T50; ACM Class: I.2.7

  8. arXiv:2410.02955  [pdf, other]

    cs.AI cs.AR cs.ET cs.HC

    AiBAT: Artificial Intelligence/Instructions for Build, Assembly, and Test

    Authors: Benjamin Nuernberger, Anny Liu, Heather Stefanini, Richard Otis, Amanda Towler, R. Peter Dillon

    Abstract: Instructions for Build, Assembly, and Test (IBAT) refers to the process used whenever any operation is conducted on hardware, including tests, assembly, and maintenance. Currently, the generation of IBAT documents is time-intensive, as users must manually reference and transfer information from engineering diagrams and parts lists into IBAT instructions. With advances in machine learning and compu…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 9 pages, 6 figures, 2 tables

  9. arXiv:2410.01949  [pdf, other]

    cs.LG

    Discrete Copula Diffusion

    Authors: Anji Liu, Oliver Broadrick, Mathias Niepert, Guy Van den Broeck

    Abstract: Discrete diffusion models have recently shown significant progress in modeling complex data, such as natural languages and DNA sequences. However, unlike diffusion models for continuous data, which can generate high-quality samples in just a few denoising steps, modern discrete diffusion models still require hundreds or even thousands of denoising steps to perform well. In this paper, we identify…

    Submitted 2 October, 2024; originally announced October 2024.

  10. arXiv:2410.01707  [pdf, other]

    cs.CL cs.AI

    Interpretable Contrastive Monte Carlo Tree Search Reasoning

    Authors: Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen

    Abstract: We propose SC-MCTS*: a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs), significantly improves both reasoning accuracy and speed. Our motivation comes from: 1. Previous MCTS LLM reasoning works often overlooked its biggest drawback--slower speed compared to CoT; 2. Previous research mainly used MCTS as a tool for LLM reasoning on various tasks with limited…

    Submitted 11 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  11. arXiv:2410.00292  [pdf, other]

    cs.CL cs.CV

    Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis

    Authors: Chun-Hsiao Yeh, Jiayun Wang, Andrew D. Graham, Andrea J. Liu, Bo Tan, Yubei Chen, Yi Ma, Meng C. Lin

    Abstract: Accurate diagnosis of ocular surface diseases is critical in optometry and ophthalmology, which hinge on integrating clinical data sources (e.g., meibography imaging and clinical metadata). Traditional human assessments lack precision in quantifying clinical observations, while current machine-based methods often treat diagnoses as multi-class classification problems, limiting the diagnoses to a p…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Accepted to MICCAI 2024. Project Webpage: https://danielchyeh.github.io/MDPipe/

  12. arXiv:2409.19231  [pdf, other]

    cs.LG cs.AI

    Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning

    Authors: Haohui Chen, Zhiyong Chen, Aoxiang Liu, Wentuo Fang

    Abstract: To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with each actor paired with a critic, thereby fully leveraging the advantages of double critics. Additionally, TDDR introduces an innovative critic regularization a…

    Submitted 28 September, 2024; originally announced September 2024.

  13. arXiv:2409.19211  [pdf, other]

    cs.PL

    Programming with High-Level Abstractions, Proceedings of the 3rd Workshop on Logic and Practice of Programming

    Authors: David S. Warren, Yanhong A. Liu

    Abstract: This proceedings contains abstracts and position papers for the work presented at the third Logic and Practice of Programming (LPOP) Workshop. The workshop was held online, using zoom, at stonybrook.zoom.us, on December 13, 2022. The workshop focused on core high-level abstractions around sets and logic rules, to help bring them to the general practice of programming.

    Submitted 27 September, 2024; originally announced September 2024.

  14. arXiv:2409.16295  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

    Authors: Andy T. Liu, Yi-Cheng Lin, Haibin Wu, Stefan Winkler, Hung-yi Lee

    Abstract: Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget. We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size. Our goal is to make analytical steps toward under…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: To appear in SLT 2024

  15. arXiv:2409.16117  [pdf, ps, other]

    eess.AS cs.SD

    Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

    Authors: Pin-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić

    Abstract: This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal reconstruction. As a result, our model simplifies the synthesis process and removes the quality upper-bound introduced by any mel-spectrogram vocoder…

    Submitted 24 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: 5 pages, Submitted to ICASSP 2025. The implementation and configuration could be found in https://github.com/NVIDIA/NeMo/blob/main/examples/audio/conf/flow_matching_generative_ssl_pretraining.yaml The audio demo page could be found in https://kuray107.github.io/ssl_gen25-examples/index.html

  16. arXiv:2409.15897  [pdf, ps, other]

    eess.AS cs.SD

    ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

    Authors: Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharhi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe

    Abstract: Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with autoregressive language models. However, as extensive downstream applications are investigated, challenges have arisen in ensuring fair comparisons across diverse appli…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT

  17. arXiv:2409.15841  [pdf, other]

    cs.CV

    FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving

    Authors: Erxin Guo, Pei An, You Yang, Qiong Liu, An-An Liu

    Abstract: 4D occupancy forecasting is one of the important techniques for autonomous driving, which can avoid potential risk in the complex traffic scenes. Scene flow is a crucial element to describe 4D occupancy map tendency. However, an accurate scene flow is difficult to predict in the real scene. In this paper, we find that BEV scene flow can approximately represent 3D scene flow in most traffic scenes.…

    Submitted 24 September, 2024; originally announced September 2024.

  18. arXiv:2409.14085  [pdf, other]

    eess.AS cs.SD

    Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kaiwei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, Hung-yi Lee

    Abstract: Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, speaker characteristics, and audio information even at low bitrates. Recently, numerous advanced neural codec models have been proposed. However, codec mo…

    Submitted 21 September, 2024; originally announced September 2024.

  19. G-Fuzz: A Directed Fuzzing Framework for gVisor

    Authors: Yuwei Li, Yuan Chen, Shouling Ji, Xuhong Zhang, Guanglu Yan, Alex X. Liu, Chunming Wu, Zulie Pan, Peng Lin

    Abstract: gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises \cite{Stripe, DigitalOcean, Cloundflare}. When a new vulnerability of the upstream gVisor is found, it is important for the downstream developers to test the corresponding code to maintain the security. To achieve this aim, directed…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: This paper has published in IEEE Transactions on Dependable and Secure Computing (TDSC), https://ieeexplore.ieee.org/abstract/document/10049484/citations?tabFilter=papers#citations

    Journal ref: IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 1, pp. 168-185, Jan.-Feb. 2024

  20. arXiv:2409.07321  [pdf, other]

    cs.CV cs.AI

    Module-wise Adaptive Adversarial Training for End-to-end Autonomous Driving

    Authors: Tianyuan Zhang, Lu Wang, Jiaqi Kang, Xinwei Zhang, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu

    Abstract: Recent advances in deep learning have markedly improved autonomous driving (AD) models, particularly end-to-end systems that integrate perception, prediction, and planning stages, achieving state-of-the-art performance. However, these models remain vulnerable to adversarial attacks, where human-imperceptible perturbations can disrupt decision-making processes. While adversarial training is an effe…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 14 pages

  21. arXiv:2409.05112  [pdf, other]

    cs.CL

    WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents

    Authors: Leyi Pan, Aiwei Liu, Yijian Lu, Zitian Gao, Yichen Di, Lijie Wen, Irwin King, Philip S. Yu

    Abstract: Watermarking algorithms for large language models (LLMs) have attained high accuracy in detecting LLM-generated text. However, existing methods primarily focus on distinguishing fully watermarked text from non-watermarked text, overlooking real-world scenarios where LLMs generate only small sections within large documents. In this scenario, balancing time complexity and detection performance poses…

    Submitted 15 October, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

    Comments: 20 pages, 7 figures, 8 tables

    MSC Class: 68T50; ACM Class: I.2.7

  22. arXiv:2409.02497  [pdf, other]

    eess.IV cs.CV

    A Learnable Color Correction Matrix for RAW Reconstruction

    Authors: Anqi Liu, Shiyi Mu, Shugong Xu

    Abstract: Autonomous driving algorithms usually employ sRGB images as model input due to their compatibility with the human visual system. However, visually pleasing sRGB images are possibly sub-optimal for downstream tasks when compared to RAW images. The availability of RAW images is constrained by the difficulties in collecting real-world driving data and the associated challenges of annotation. To addre…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by BMVC2024

  23. arXiv:2409.02483  [pdf, other]

    cs.CV cs.AI

    TASAR: Transfer-based Attack on Skeletal Action Recognition

    Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang

    Abstract: Skeletal sequences, as well-structured representations of human behaviors, play a vital role in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interactions. However, most existing skeleton-based HAR (S-HAR) attacks are primarily designed for…

    Submitted 9 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  24. arXiv:2409.00107  [pdf, other]

    eess.SY cs.AI cs.LG econ.GN math.OC

    Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach

    Authors: Jun He, Andrew L. Liu

    Abstract: The integration of distributed energy resources (DERs) into wholesale energy markets can greatly enhance grid flexibility, improve market efficiency, and contribute to a more sustainable energy future. As DERs -- such as solar PV panels and energy storage -- proliferate, effective mechanisms are needed to ensure that small prosumers can participate meaningfully in these markets. We study a wholesa…

    Submitted 27 August, 2024; originally announced September 2024.

  25. arXiv:2408.16300  [pdf, other]

    cs.NE math.OC

    A Distance Similarity-based Genetic Optimization Algorithm for Satellite Ground Network Planning Considering Feeding Mode

    Authors: Yingying Ren, Qiuli Li, Yangyang Guo, Witold Pedrycz, Lining Xing, Anfeng Liu, Yanjie Song

    Abstract: With the rapid development of the satellite industry, the information transmission network based on communication satellites has gradually become a major and important part of the future satellite ground integration network. However, the low transmission efficiency of the satellite data relay back mission has become a problem that is currently constraining the construction of the system and needs…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 25 pages

  26. arXiv:2408.14418  [pdf, other]

    cs.CL cs.AI

    MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

    Authors: Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

    Abstract: Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut…

    Submitted 5 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  27. arXiv:2408.13491  [pdf, other]

    cs.CV

    ESA: Annotation-Efficient Active Learning for Semantic Segmentation

    Authors: Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

    Abstract: Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributio…

    Submitted 24 August, 2024; originally announced August 2024.

  28. arXiv:2408.12836  [pdf, other]

    cs.AR

    An Architectural Error Metric for CNN-Oriented Approximate Multipliers

    Authors: Ao Liu, Jie Han, Qin Wang, Zhigang Mao, Honglan Jiang

    Abstract: As a potential alternative for implementing the large number of multiplications in convolutional neural networks (CNNs), approximate multipliers (AMs) promise both high hardware efficiency and accuracy. However, the characterization of accuracy and design of appropriate AMs are critical to an AM-based CNN (AM-CNN). In this work, the generation and propagation of errors in an AM-CNN are analyzed by…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 11 pages, 10 figures

  29. arXiv:2408.12793  [pdf, other]

    cs.CV

    La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection

    Authors: Hang Zou, Chenxi Du, Hui Zhang, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei

    Abstract: Facial recognition systems are susceptible to both physical and digital attacks, posing significant security risks. Traditional approaches often treat these two attack types separately due to their distinct characteristics. Thus, when being combined attacked, almost all methods could not deal. Some studies attempt to combine the sparse data from both types of attacks into a single dataset and try…

    Submitted 22 August, 2024; originally announced August 2024.

  30. arXiv:2408.12494  [pdf, other]

    cs.CL cs.AI

    GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

    Authors: Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu

    Abstract: Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address…

    Submitted 22 August, 2024; originally announced August 2024.

  31. Towards Deconfounded Image-Text Matching with Causal Inference

    Authors: Wenhui Li, Xinqi Su, Dan Song, Lanjun Wang, Kun Zhang, An-An Liu

    Abstract: Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the bias in the dataset, which exists in intra-modal and inter-modal, and tend to learn the spurious correlations that extremely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: ACM MM

    Journal ref: 2023/10/26,Proceedings of the 31st ACM International Conference on Multimedia,6264-6273

  32. arXiv:2408.12095  [pdf, other]

    cs.CL cs.AI cs.LG

    uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

    Authors: Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

    Abstract: Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as fai…

    Submitted 25 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages

  33. arXiv:2408.10883  [pdf, other]

    cs.AI cs.CV

    DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

    Authors: Xinqi Su, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Zitong Yu

    Abstract: In current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis…

    Submitted 20 August, 2024; originally announced August 2024.

  34. arXiv:2408.10111  [pdf, other]

    cs.AI cs.LG

    PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities

    Authors: Yuanjian Xu, Anxian Liu, Jianing Hao, Zhenzhuo Li, Shichang Meng, Guang Zhang

    Abstract: Financial time series modeling is crucial for understanding and predicting market behaviors but faces challenges such as non-linearity, non-stationarity, and high noise levels. Traditional models struggle to capture complex patterns due to these issues, compounded by limitations in computational resources and model capacity. Inspired by the success of large language models in NLP, we introduce…

    Submitted 19 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  35. arXiv:2408.09752  [pdf, other]

    cs.CV

    A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method

    Authors: Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, Mingchuan Yang, Jun Wan, Hui Zhang

    Abstract: Iris recognition is widely used in high-security scenarios due to its stability and distinctiveness. However, the acquisition of iris images typically requires near-infrared illumination and near-infrared band filters, leading to significant and consistent differences in imaging across devices. This underscores the importance of developing cross-domain capabilities in iris anti-spoofing methods. D…

    Submitted 19 August, 2024; originally announced August 2024.

  36. arXiv:2408.09395  [pdf, other]

    cs.CV

    OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images

    Authors: Yang Li, Jianing Deng, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Xingtao Zhou, Catherine C. Liu, Bo Fu

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging and joint modeling of multiple discrete and continuous clinical scores presents a promising new paradigm for multi-task problems in Ophthalmology. The bi-channel framework that arises from the Ophthalmic phenomenon of "interocular asymmetries" of both eyes (OU) calls for new employment on the SOTA transformer-based models.…

    Submitted 18 August, 2024; originally announced August 2024.

  37. Flexible 3D Lane Detection by Hierarchical Shape Matching

    Authors: Zhihao Guan, Ruixin Liu, Zejian Yuan, Ao Liu, Kun Tang, Tong Zhou, Erlong Li, Chao Zheng, Shuqi Mei

    Abstract: As one of the basic while vital technologies for HD map construction, 3D lane detection is still an open problem due to varying visual conditions, complex typologies, and strict demands for precision. In this paper, an end-to-end flexible and hierarchical lane detector is proposed to precisely predict 3D lane lines from point clouds. Specifically, we design a hierarchical network predicting flexib…

    Submitted 13 August, 2024; originally announced August 2024.

  38. arXiv:2408.06518  [pdf, other]

    cs.CL

    Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models

    Authors: Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith

    Abstract: Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and a…

    Submitted 12 September, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  39. arXiv:2408.06150  [pdf, other]

    cs.CL physics.chem-ph q-bio.BM

    LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library

    Authors: Tianhao Yu, Cai Yao, Zhuorui Sun, Feng Shi, Lin Zhang, Kangjie Lyu, Xuan Bai, Andong Liu, Xicheng Zhang, Jiali Zou, Wenshou Wang, Chris Lai, Kai Wang

    Abstract: In this study, we generate and maintain a database of 10 million virtual lipids through METiS's in-house de novo lipid generation algorithms and lipid virtual screening techniques. These virtual lipids serve as a corpus for pre-training, lipid representation learning, and downstream task knowledge transfer, culminating in state-of-the-art LNP property prediction performance. We propose LipidBERT,…

    Submitted 19 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  40. arXiv:2408.06047  [pdf, other]

    cs.CV

    BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

    Authors: Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Anan Liu

    Abstract: Image-based virtual try-on is an increasingly popular and important task to generate realistic try-on images of specific person. Existing methods always employ an accurate mask to remove the original garment in the source image, thus achieving realistic synthesized images in simple and conventional try-on scenarios based on powerful diffusion model. Therefore, acquiring suitable mask is vital to t…

    Submitted 12 August, 2024; originally announced August 2024.

  41. arXiv:2408.02882  [pdf, other]

    cs.AI cs.CR cs.LG

    Compromising Embodied Agents with Contextual Backdoor Attacks

    Authors: Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, Jiakai Wang, Yanjun Pu, Tianlin Li, Junqi Zhang, Wenbo Zhou, Qing Guo, Dacheng Tao

    Abstract: Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations, developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a signif…

    Submitted 5 August, 2024; originally announced August 2024.

  42. arXiv:2408.02157  [pdf, other]

    cs.CV

    PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

    Authors: Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer

    Abstract: Immersive scene generation, notably panorama creation, benefits significantly from the adaptation of large pre-trained text-to-image (T2I) models for multi-view image generation. Due to the high cost of acquiring multi-view images, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  43. arXiv:2407.21037  [pdf, other]

    cs.CL cs.AI

    An Application of Large Language Models to Coding Negotiation Transcripts

    Authors: Ray Friedman, Jaewoo Cho, Jeanne Brett, Xuhui Zhan, Ningyu Han, Sriram Kannan, Yingxiang Ma, Jesse Spencer-Smith, Elisabeth Jäckel, Alfred Zerres, Madison Hooper, Katie Babbit, Manish Acharya, Wendi Adair, Soroush Aslani, Tayfun Aykaç, Chris Bauman, Rebecca Bennett, Garrett Brady, Peggy Briggs, Cheryl Dowie, Chase Eck, Igmar Geiger, Frank Jacob, Molly Kern, et al. (33 additional authors not shown)

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated impressive capabilities in the field of natural language processing (NLP). This paper explores the application of LLMs in negotiation transcript analysis by the Vanderbilt AI Negotiation Lab. Starting in September 2022, we applied multiple strategies using LLMs, from zero-shot learning to fine-tuning models to in-context learning. The…

    Submitted 18 July, 2024; originally announced July 2024.

  44. arXiv:2407.20242  [pdf, other]

    cs.CY cs.AI cs.RO

    BadRobot: Manipulating Embodied LLMs in the Physical World

    Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Yu Zhang

    Abstract: Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs per…

    Submitted 3 October, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 38 pages, 16 figures

  45. arXiv:2407.18428  [pdf, other]

    cs.LG cs.AI cs.CV

    Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift

    Authors: Gina Wong, Joshua Gleason, Rama Chellappa, Yoav Wald, Anqi Liu

    Abstract: Learning models whose predictions are invariant under multiple environments is a promising approach for out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ where the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also supposed to generalize to shifts in…

    Submitted 25 July, 2024; originally announced July 2024.

  46. arXiv:2407.16607  [pdf, other]

    cs.CL cs.LG

    Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

    Authors: Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith

    Abstract: The pretraining data of today's strongest language models is opaque; in particular, little is known about the proportions of various domains or languages represented. In this work, we tackle a task which we call data mixture inference, which aims to uncover the distributional make-up of training data. We introduce a novel attack based on a previously overlooked source of information: byte-pair enc…

    Submitted 5 September, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: new robustness experiments; new baselines; include Mistral, Mistral-Nemo and GPT-NeoX; link to code

  47. arXiv:2407.13874  [pdf, other]

    quant-ph cs.DS cs.IT cs.LG

    Optimal high-precision shadow estimation

    Authors: Sitan Chen, Jerry Li, Allen Liu

    Abstract: We give the first tight sample complexity bounds for shadow tomography and classical shadows in the regime where the target error is below some sufficiently small inverse polynomial in the dimension of the Hilbert space. Formally we give a protocol that, given any $m\in\mathbb{N}$ and $ε\le O(d^{-12})$, measures $O(\log(m)/ε^2)$ copies of an unknown mixed state $ρ\in\mathbb{C}^{d\times d}$ and out…

    Submitted 18 July, 2024; originally announced July 2024.

  48. arXiv:2407.13266  [pdf, other]

    cs.SD cs.HC eess.AS

    How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines

    Authors: Ailin Liu, Pepijn Vunderink, Jose Vargas Quiros, Chirag Raman, Hayley Hung

    Abstract: Low-frequency audio has been proposed as a promising privacy-preserving modality to study social dynamics in real-world settings. To this end, researchers have developed wearable devices that can record audio at frequencies as low as 1250 Hz to mitigate the automatic extraction of the verbal content of speech that may contain private details. This paper investigates the validity of this hypothesis…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: This manuscript has been accepted by Interspeech 2024

  49. Incremental high average-utility itemset mining: survey and challenges

    Authors: Jing Chen, Shengyi Yang, Weiping Ding, Peng Li, Aijun Liu, Hongjun Zhang, Tian Li

    Abstract: The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 25 pages, 23 figures

  50. arXiv:2407.10061  [pdf, other]

    cs.CV

    InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

    Authors: Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

    Abstract: Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter…

    Submitted 13 July, 2024; originally announced July 2024.