Skip to main content

Showing 1–50 of 224 results for author: Xie, R

  1. arXiv:2410.15580  [pdf, other

    cs.LG cs.CL

    Language Models are Symbolic Learners in Arithmetic

    Authors: Chunyuan Deng, Zhiqi Li, Roy Xie, Ruidi Chang, Hanjie Chen

    Abstract: Large Language Models (LLMs) are thought to struggle with arithmetic learning due to the inherent differences between language modeling and numerical computation, but concrete evidence has been lacking. This work responds to this claim through a two-side experiment. We first investigate whether LLMs leverage partial products during arithmetic learning. We find that although LLMs can identify some… ▽ More

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2410.15252  [pdf, other

    cs.CL cs.AI

    Lossless KV Cache Compression to 2%

    Authors: Zhen Yang, J. N. Han, Kan Wu, Ruobing Xie, An Wang, Xingwu Sun, Zhanhui Kang

    Abstract: Large language models have revolutionized data processing in numerous domains, with their ability to handle extended context reasoning receiving notable recognition. To speed up inference, maintaining a key-value (KV) cache memory is essential. Nonetheless, the growing demands for KV cache memory create significant hurdles for efficient implementation. This work introduces a novel architecture, Cr… ▽ More

    Submitted 19 October, 2024; originally announced October 2024.

  3. arXiv:2410.14283  [pdf, other

    cs.CV

    Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

    Authors: Bin Lin, Yanzhen Yu, Jianhao Ye, Ruitao Lv, Yuguang Yang, Ruoye Xie, Pan Yu, Hongbin Zhou

    Abstract: Existing audio-driven facial animation methods face critical challenges, including expression leakage, ineffective subtle expression transfer, and imprecise audio-driven synchronization. We discovered that these issues stem from limitations in motion representation and the lack of fine-grained control over facial expressions. To address these problems, we present Takin-ADA, a novel two-stage appro… ▽ More

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: under review

  4. arXiv:2410.12519  [pdf, other

    cs.IR

    RosePO: Aligning LLM-based Recommenders with Human Values

    Authors: Jiayi Liao, Xiangnan He, Ruobing Xie, Jiancan Wu, Yancheng Yuan, Xingwu Sun, Zhanhui Kang, Xiang Wang

    Abstract: Recently, there has been a growing interest in leveraging Large Language Models (LLMs) for recommendation systems, which usually adapt a pre-trained LLM to the recommendation scenario through supervised fine-tuning (SFT). However, both the pre-training and SFT stages fail to explicitly model the comparative relationships of a user's preferences on different items. To construct a "helpful and harml… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

  5. arXiv:2410.11701  [pdf, other

    cs.CL cs.AI cs.CV cs.MM

    Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

    Authors: Yuhan Fu, Ruobing Xie, Jiazhen Liu, Bangxiang Lan, Xingwu Sun, Zhanhui Kang, Xirong Li

    Abstract: Hallucinations in multimodal large language models (MLLMs) hinder their practical applications. To address this, we propose a Magnifier Prompt (MagPrompt), a simple yet effective method to tackle hallucinations in MLLMs via extremely simple instructions. MagPrompt is based on the following two key principles, which guide the design of various effective prompts, demonstrating robustness: (1) MLLMs… ▽ More

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 9 pages, 13 tables, 4 figures

  6. arXiv:2410.07673  [pdf, other

    cs.LG cs.AI

    Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference

    Authors: Jianxing Yu, Shiqi Wang, Han Yin, Zhenlong Sun, Ruobing Xie, Bo Zhang, Yanghui Rao

    Abstract: This paper focuses on detecting clickbait posts on the Web. These posts often use eye-catching disinformation in mixed modalities to mislead users to click for profit. That affects the user experience and thus would be blocked by content provider. To escape detection, malicious creators use tricks to add some irrelevant non-bait content into bait posts, dressing them up as legal to fool the detect… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

  7. arXiv:2410.03440  [pdf, other

    cs.CL cs.AI

    Exploring the Benefit of Activation Sparsity in Pre-training

    Authors: Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

    Abstract: Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we first study how activation properties change during pre-training. Our examination reveals that Transform… ▽ More

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: ICML 2024

  8. arXiv:2410.00036  [pdf, other

    cs.HC cs.AI eess.AS

    InsightPulse: An IoT-based System for User Experience Interview Analysis

    Authors: Dian Lyu, Yuetong Lu, Jassie He, Murad Mehrab Abrar, Ruijun Xie, John Raiti

    Abstract: Conducting efficient and effective user experience (UX) interviews often poses challenges, such as maintaining focus on key topics and managing the duration of interviews and post-interview analyses. To address these issues, this paper introduces InsightPulse, an Internet of Things (IoT)-based hardware and software system designed to streamline and enhance the UX interview process through speech a… ▽ More

    Submitted 23 September, 2024; originally announced October 2024.

    Comments: Accepted for publication at the 10th IEEE International Conference on Collaboration and Internet Computing (IEEE CIC 2024), Washington D.C., USA

  9. arXiv:2409.12980  [pdf, other

    cs.CV

    A New People-Object Interaction Dataset and NVS Benchmarks

    Authors: Shuai Guo, Houqiang Zhong, Qiuwen Wang, Ziyu Chen, Yijie Gao, Jiajing Yuan, Chenyu Zhang, Rong Xie, Li Song

    Abstract: Recently, NVS in human-object interaction scenes has received increasing attention. Existing human-object interaction datasets mainly consist of static data with limited views, offering only RGB images or videos, mostly containing interactions between a single person and objects. Moreover, these datasets exhibit complexities in lighting environments, poor synchronization, and low resolution, hinde… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

  10. arXiv:2409.12139  [pdf, other

    cs.SD cs.AI eess.AS

    Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

    Authors: Sijing Chen, Yuan Feng, Laipeng He, Tianwei He, Wendi He, Yanni Hu, Bin Lin, Yiting Lin, Yu Pan, Pengfei Tan, Chengwei Tian, Chen Wang, Zhicheng Wang, Ruoye Xie, Jixun Yao, Quanlei Yan, Yuguang Yang, Jianhao Ye, Jingjing Yin, Yanzhen Yu, Huimin Zhang, Xiang Zhang, Guangcheng Zhao, Hongbin Zhou, Pengpeng Zou

    Abstract: With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-… ▽ More

    Submitted 23 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: Technical Report; 18 pages; typos corrected, references added, demo url modified, author name modified;

  11. Towards Empathetic Conversational Recommender Systems

    Authors: Xiaoyu Zhang, Ruobing Xie, Yougang Lyu, Xin Xin, Pengjie Ren, Mingfei Liang, Bo Zhang, Zhanhui Kang, Maarten de Rijke, Zhaochun Ren

    Abstract: Conversational recommender systems (CRSs) are able to elicit user preferences through multi-turn dialogues. They typically incorporate external knowledge and pre-trained language models to capture the dialogue context. Most CRS approaches, trained on benchmark datasets, assume that the standard items and responses in these benchmarks are optimal. However, they overlook that users may express negat… ▽ More

    Submitted 30 August, 2024; originally announced September 2024.

  12. arXiv:2409.09281  [pdf, other

    cs.CL cs.AI cs.LG

    Language Models "Grok" to Copy

    Authors: Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan

    Abstract: We examine the pre-training dynamics of language models, focusing on their ability to copy text from preceding context--a fundamental skill for various LLM applications, including in-context learning (ICL) and retrieval-augmented generation (RAG). We propose a novel perspective that Transformer-based language models develop copying abilities similarly to grokking, which refers to sudden generaliza… ▽ More

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 5 pages, 7 figures

  13. arXiv:2409.07237  [pdf, other

    cs.IR

    Negative Sampling in Recommendation: A Survey and Future Directions

    Authors: Haokai Ma, Ruobing Xie, Lei Meng, Fuli Feng, Xiaoyu Du, Xingwu Sun, Zhanhui Kang, Xiangxu Meng

    Abstract: Recommender systems aim to capture users' personalized preferences from the cast amount of user behaviors, making them pivotal in the era of information explosion. However, the presence of the dynamic preference, the "information cocoons", and the inherent feedback loops in recommendation make users interact with a limited number of items. Conventional recommendation algorithms typically focus on… ▽ More

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 38 pages, 9 figures; Under review

  14. arXiv:2409.06130  [pdf, other

    cs.CR cs.AI

    On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

    Authors: Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller

    Abstract: Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has been recently challenged by recent watermark removal attacks. In this work, we investigate why existing watermark embedding techniques particularly those based on b… ▽ More

    Submitted 9 September, 2024; originally announced September 2024.

  15. PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions

    Authors: Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Yu Wang

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated their powerful multimodal capabilities. However, they also face serious safety problems, as adversaries can induce robustness issues in LVLMs through the use of well-designed adversarial examples. Therefore, LVLMs are in urgent need of detection tools for adversarial examples to prevent incorrect responses. In this work, we first discover that… ▽ More

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM Multimedia 2024 BNI track (Oral)

  16. arXiv:2409.02657  [pdf, other

    cs.CV cs.AI cs.MM

    PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation

    Authors: Jun Ling, Yiwen Wang, Han Xue, Rong Xie, Li Song

    Abstract: While previous audio-driven talking head generation (THG) methods generate head poses from driving audio, the generated poses or lips cannot match the audio well or are not editable. In this study, we propose \textbf{PoseTalk}, a THG system that can freely generate lip-synchronized talking head videos with free head poses conditioned on text prompts and audio. The core insight of our method is usi… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 7+5 pages, 15 figures

  17. arXiv:2408.13036  [pdf, other

    cs.CV

    S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points

    Authors: Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Dynamic scene reconstruction using Gaussians has recently attracted increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in canonical space. However, the inherent low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with va… ▽ More

    Submitted 6 October, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures, 5 tables

  18. arXiv:2408.10681  [pdf, other

    cs.CL cs.LG

    HMoE: Heterogeneous Mixture of Experts for Language Modeling

    Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu

    Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter util… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.08209  [pdf, other

    cs.IR

    Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

    Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  20. arXiv:2408.01191  [pdf, other

    cs.CV

    A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

    Authors: Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai

    Abstract: Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, b… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo

  21. arXiv:2407.15866  [pdf, other

    cs.LG cs.AI cs.AR

    SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization

    Authors: Rui Xie, Asad Ul Haq, Linsen Ma, Krystal Sun, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang

    Abstract: Recent studies have revealed that, during the inference on generative AI models such as transformer, the importance of different weights exhibits substantial context-dependent variations. This naturally manifests a promising potential of adaptively configuring weight quantization to improve the generative AI inference efficiency. Although configurable weight quantization can readily leverage the h… ▽ More

    Submitted 17 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  22. arXiv:2407.14386  [pdf, other

    cs.LG

    Frontiers of Deep Learning: From Novel Application to Real-World Deployment

    Authors: Rui Xie

    Abstract: Deep learning continues to re-shape numerous fields, from natural language processing and imaging to data analytics and recommendation systems. This report studies two research papers that represent recent progress on deep learning from two largely different aspects: The first paper applied the transformer networks, which are typically used in language models, to improve the quality of synthetic a… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  23. arXiv:2407.07061  [pdf, other

    cs.CL

    Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

    Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

    Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: work in progress

  24. arXiv:2407.03636  [pdf, other

    cs.CV

    Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Zhengxue Cheng, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Image restoration is a classic low-level problem aimed at recovering high-quality images from low-quality images with various degradations such as blur, noise, rain, haze, etc. However, due to the inherent complexity and non-uniqueness of degradation in real-world images, it is challenging for a model trained for single tasks to handle real-world restoration problems effectively. Moreover, existin… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  25. arXiv:2407.03635  [pdf, other

    cs.CV

    MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Realistic image restoration is a crucial task in computer vision, and the use of diffusion-based models for image restoration has garnered significant attention due to their ability to produce realistic results. However, the quality of the generated images is still a significant challenge due to the severity of image degradation and the uncontrollability of the diffusion model. In this work, we de… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  26. arXiv:2407.02371  [pdf, other

    cs.CV

    OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

    Authors: Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai

    Abstract: Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lacking a precise open sourced high-quality dataset. The previous popular video datasets, e.g. WebVid-10M and Panda-70M, are either with low quality or too large for most research institutions. Therefore, it is ch… ▽ More

    Submitted 2 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 15 pages, 9 figures

  27. arXiv:2407.00100  [pdf, other

    cs.LG cs.AI cs.CL

    Enhancing In-Context Learning via Implicit Demonstration Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from t… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Main 19 pages,10 figures

    ACM Class: I.2.7

  28. arXiv:2406.15968  [pdf, other

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  29. arXiv:2406.12501  [pdf, other

    cs.IR

    Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

    Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

    Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommende… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  30. arXiv:2406.12298  [pdf

    cs.AI physics.ao-ph

    Research on Dangerous Flight Weather Prediction based on Machine Learning

    Authors: Haoxing Liu, Renjie Xie, Haoshen Qin, Yizhou Li

    Abstract: With the continuous expansion of the scale of air transport, the demand for aviation meteorological support also continues to grow. The impact of hazardous weather on flight safety is critical. How to effectively use meteorological data to improve the early warning capability of flight dangerous weather and ensure the safe flight of aircraft is the primary task of aviation meteorological services.… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  31. Accurate Explanation Model for Image Classifiers using Class Association Embedding

    Authors: Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai

    Abstract: Image classification is a primary task in data analysis where explainable models are crucially demanded in various applications. Although amounts of methods have been proposed to obtain explainable knowledge from the black-box classifiers, these approaches lack the efficiency of extracting global knowledge regarding the classification task, thus is vulnerable to local traps and often leads to poor… ▽ More

    Submitted 22 August, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE 40th International Conference on Data Engineering (ICDE)

  32. arXiv:2406.06737  [pdf, other

    cs.CR cs.CL

    Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

    Authors: Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra

    Abstract: With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extra… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  33. arXiv:2406.04828  [pdf, other

    cs.IR

    QAGCF: Graph Collaborative Filtering for Q&A Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Yanping Zheng, Ruobing Xie, Qi Liu, Jun Xu, Ji-Rong Wen

    Abstract: Question and answer (Q&A) platforms usually recommend question-answer pairs to meet users' knowledge acquisition needs, unlike traditional recommendations that recommend only one item. This makes user behaviors more complex, and presents two challenges for Q&A recommendation, including: the collaborative information entanglement, which means user feedback is influenced by either the question or th… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  34. arXiv:2406.01873  [pdf, other

    cs.CL cs.CR cs.LG

    CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models

    Authors: Qian Lou, Xin Liang, Jiaqi Xue, Yancheng Zhang, Rui Xie, Mengxin Zheng

    Abstract: It is imperative to ensure the stability of every prediction made by a language model; that is, a language's prediction should remain consistent despite minor input variations, like word substitutions. In this paper, we investigate the problem of certifying a language model's robustness against Universal Text Perturbations (UTPs), which have been widely used in universal adversarial attacks and ba… ▽ More

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL Findings 2024

  35. arXiv:2405.20701  [pdf, other

    cs.CL cs.AI

    Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

    Authors: Pengwei Zhan, Zhen Xu, Qian Tan, Jie Song, Ru Xie

    Abstract: Large language models (LLMs) demonstrate exceptional instruct-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.18979  [pdf, other

    cs.LG stat.ML

    MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

    Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An

    Abstract: Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to predict… ▽ More

    Submitted 24 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The three first authors contributed equally

  37. arXiv:2405.15280  [pdf, other

    cs.IR cs.AI cs.LG

    DFGNN: Dual-frequency Graph Neural Network for Sign-aware Feedback

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Xu Zhang, Fuzhen Zhuang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: The graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendations focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendations still remains underex… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 Research Track

  38. arXiv:2405.14580  [pdf, other

    cs.GR

    LDM: Large Tensorial SDF Model for Textured Mesh Generation

    Authors: Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo

    Abstract: Previous efforts have managed to generate production-ready 3D assets from text or images. However, these methods primarily employ NeRF or 3D Gaussian representations, which are not adept at producing smooth, high-quality geometries required by modern rendering pipelines. In this paper, we propose LDM, a novel feed-forward framework capable of generating high-fidelity, illumination-decoupled textur… ▽ More

    Submitted 13 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  39. arXiv:2405.11270  [pdf, other

    cs.CV

    HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos

    Authors: Qifeng Chen, Rengan Xie, Kai Huang, Qi Wang, Wenting Zheng, Rong Li, Yuchi Huo

    Abstract: Recently, implicit neural representation has been widely used to generate animatable human avatars. However, the materials and geometry of those representations are coupled in the neural network and hard to edit, which hinders their application in traditional graphics engines. We present a framework for acquiring human avatars that are attached with high-resolution physically-based material textur… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  40. arXiv:2405.10861  [pdf, other

    cs.CL cs.AI cs.CY

    Tailoring Vaccine Messaging with Common-Ground Opinions

    Authors: Rickard Stureborg, Sanxing Chen, Ruoyu Xie, Aayushi Patel, Christopher Li, Chloe Qinyu Zhu, Tingnan Hu, Jun Yang, Bhuwan Dhingra

    Abstract: One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often hav… ▽ More

    Submitted 23 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: NAACL Findings 2024

    MSC Class: 68T50 (Primary) 68T01; 68T37; 91F20 (Secondary) ACM Class: I.2; I.2.7; I.7

  41. arXiv:2405.03562  [pdf, other

    cs.IR

    ID-centric Pre-training for Recommendation

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are challenging to be transferred to new domains. With the thriving of pre-trained language model (PLM), some pioneer works adopt PLM for pre-trained recommendation, where modality information (e.g., text) is considered un… ▽ More

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  42. arXiv:2405.00742  [pdf, other

    cs.CR cs.LG stat.ML

    Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

    Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong

    Abstract: Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among differe… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11 pages,4 figures

  43. arXiv:2404.18109  [pdf, other

    cs.CV

    Finding Beautiful and Happy Images for Mental Health and Well-being Applications

    Authors: Ruitao Xie, Connor Qiu, Guoping Qiu

    Abstract: This paper explores how artificial intelligence (AI) technology can contribute to achieve progress on good health and well-being, one of the United Nations' 17 Sustainable Development Goals. It is estimated that one in ten of the global population lived with a mental disorder. Inspired by studies showing that engaging and viewing beautiful natural images can make people feel happier and less stres… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  44. arXiv:2404.16678  [pdf, other

    cs.CV

    Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

    Authors: Han Wang, Xinning Chai, Yiwen Wang, Yuhong Zhang, Rong Xie, Li Song

    Abstract: Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausi… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  45. arXiv:2404.16307  [pdf, other

    cs.LG cs.CV

    Boosting Model Resilience via Implicit Adversarial Data Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adapti… ▽ More

    Submitted 1 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, accepted by IJCAI 2024

    ACM Class: I.2.6; I.4.3

  46. arXiv:2404.14567  [pdf, other

    cs.CL

    WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models

    Authors: Ronald Xie, Steven Palayew, Augustin Toma, Gary Bader, Bo Wang

    Abstract: This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task, the first involving two consecutive API calls to the Claude 3 Opus API and the second involving training an image-disease label joint embedding in the style of CLIP for image classificati… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  47. arXiv:2404.14544  [pdf, other

    cs.CL

    WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction

    Authors: Augustin Toma, Ronald Xie, Steven Palayew, Patrick R. Lawler, Bo Wang

    Abstract: Medical errors in clinical text pose significant risks to patient safety. The MEDIQA-CORR 2024 shared task focuses on detecting and correcting these errors across three subtasks: identifying the presence of an error, extracting the erroneous sentence, and generating a corrected sentence. In this paper, we present our approach that achieved top performance in all three subtasks. For the MS dataset,… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  48. arXiv:2404.08796  [pdf, other

    cs.IR

    The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation

    Authors: Zekai Qu, Ruobing Xie, Chaojun Xiao, Xingwu Sun, Zhanhui Kang

    Abstract: Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). Some PLM-based SR models directly use PLM to encode user historical behavior's text sequences to learn user representations, while there is seldom an in-depth exploration of the capability and suitability of PLM in behavior sequence modeling. In this work, we first conduct extensiv… ▽ More

    Submitted 14 August, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at RecSys 2024

  49. arXiv:2404.04062  [pdf, other

    cs.LG math.OC

    Derivative-free tree optimization for complex systems

    Authors: Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung

    Abstract: A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or the derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 d… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 39 pages, 3 figures

  50. arXiv:2404.02078  [pdf, other

    cs.AI cs.CL cs.LG

    Advancing LLM Reasoning Generalists with Preference Trees

    Authors: Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun

    Abstract: We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 1… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Models and data are available at https://github.com/OpenBMB/Eurus