
Showing 1–50 of 91 results for author: Shang, Y

  1. arXiv:2410.11859  [pdf, other]

    cs.HC cs.CY

    SouLLMate: An Adaptive LLM-Driven System for Advanced Mental Health Support and Assessment, Based on a Systematic Application Survey

    Authors: Qiming Guo, Jinwen Tang, Wenbo Sun, Haoteng Tang, Yi Shang, Wenlu Wang

    Abstract: Mental health issues significantly impact individuals' daily lives, yet many do not receive the help they need even with available online resources. This study aims to provide accessible, stigma-free, personalized, and real-time mental health support through cutting-edge AI technologies. It makes the following contributions: (1) Conducting an extensive survey of recent mental health support method…

    Submitted 6 October, 2024; originally announced October 2024.

  2. arXiv:2410.10818  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

    Authors: Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang

    Abstract: Understanding fine-grained temporal dynamics is crucial for multimodal video comprehension and generation. Due to the lack of fine-grained temporal annotations, existing video benchmarks mostly resemble static image benchmarks and are inadequate for evaluating models for temporal understanding. In this paper, we introduce TemporalBench, a new benchmark dedicated to evaluating fine-grained temporal…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: Project Page: https://temporalbench.github.io/

  3. arXiv:2410.06809  [pdf, other]

    cs.CL cs.CR

    Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level

    Authors: Xinyi Zeng, Yuying Shang, Yutao Zhu, Jiawei Chen, Yu Tian

    Abstract: Large language models (LLMs) have demonstrated immense utility across various industries. However, as LLMs advance, the risk of harmful outputs increases due to incorrect or malicious instruction prompts. While current methods effectively address jailbreak risks, they share common limitations: 1) Judging harmful responses at the prefill level fails to use the model's decoding outputs, le…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 19 pages, 9 figures

  4. arXiv:2410.06795  [pdf, other]

    cs.CL cs.CV

    From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

    Authors: Yuying Shang, Xinyi Zeng, Yutao Zhu, Xiao Yang, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, Yu Tian

    Abstract: Hallucinations in large vision-language models (LVLMs), i.e., generating objects that are not present in the visual input, are a significant challenge that impairs their reliability. Recent studies often attribute hallucinations to a lack of understanding of visual input, yet ignore a more fundamental issue: the model's inability to effectively extract or decouple visual features. In this paper…

    Submitted 9 October, 2024; originally announced October 2024.

  5. arXiv:2410.06153  [pdf, other]

    cs.CL

    AgentSquare: Automatic LLM Agent Search in Modular Design Space

    Authors: Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li

    Abstract: Recent advancements in Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. However, current research largely relies on manual, task-specific design, limiting their adaptability to novel tasks. In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abst…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 26 pages

  6. arXiv:2410.00255  [pdf, other]

    cs.AI cs.CL cs.CV

    Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

    Authors: Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan

    Abstract: Recent advancements in 3D Large Language Models (3DLLMs) have highlighted their potential in building general-purpose agents in the 3D real world, yet challenges remain due to the lack of high-quality robust instruction-following data, leading to limited discriminative power and generalization of 3DLLMs. In this paper, we introduce Robin3D, a powerful 3DLLM trained on large-scale instruction-follo…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: 10 pages

  7. arXiv:2409.20034  [pdf, other]

    cs.CV

    Camera Calibration using a Collimator System

    Authors: Shunkun Liang, Banglei Guan, Zhenbao Yu, Pengju Sun, Yang Shang

    Abstract: Camera calibration is a crucial step in photogrammetry and 3D vision applications. In practical scenarios with a long working distance to cover a wide area, target-based calibration methods become complicated and inflexible due to site limitations. This paper introduces a novel camera calibration method using a collimator system, which can provide a reliable and controllable calibration environmen…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV2024 (oral presentation)

  8. arXiv:2409.19330  [pdf, other]

    cs.CV cs.AI

    3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models

    Authors: Hao Chen, Wei Zhao, Yingli Li, Tianyang Zhong, Yisong Wang, Youlan Shang, Lei Guo, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang

    Abstract: Medical image analysis is crucial in modern radiological diagnostics, especially given the exponential growth in medical imaging data. The demand for automated report generation systems has become increasingly urgent. While prior research has mainly focused on using machine learning and multimodal language models for 2D medical images, the generation of reports for 3D medical images has been less…

    Submitted 28 September, 2024; originally announced September 2024.

  9. arXiv:2409.17561  [pdf, other]

    cs.SE

    TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models

    Authors: Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

    Abstract: Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based software testing techniques, particularly in the area of test case generation. Despite the growing interest, limited efforts have been made to thoroughly evalu…

    Submitted 26 September, 2024; originally announced September 2024.

  10. arXiv:2409.12963  [pdf, other]

    cs.CV cs.AI cs.LG

    Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

    Authors: Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan

    Abstract: Advancements in Large Language Models (LLMs) inspire various strategies for integrating video modalities. A key approach is Video-LLMs, which incorporate an optimizable interface linking sophisticated video encoders to LLMs. However, due to computation and data limitations, these Video-LLMs are typically pre-trained to process only short videos, limiting their broader application for understanding…

    Submitted 1 October, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  11. arXiv:2409.10033  [pdf, other]

    cs.SE cs.AI

    Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs

    Authors: Haichuan Hu, Ye Shang, Guolin Xu, Congqing He, Quanjun Zhang

    Abstract: LLMs have long demonstrated remarkable effectiveness in automatic program repair (APR), with OpenAI's ChatGPT being one of the most widely used models in this domain. Through continuous iterations and upgrades of GPT-family models, their performance in fixing bugs has already reached state-of-the-art levels. However, there are few works comparing the effectiveness and variations of different versi…

    Submitted 16 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

  12. arXiv:2409.03550  [pdf, other]

    cs.CV cs.AI cs.LG

    DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

    Authors: Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie

    Abstract: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas, but they are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reducing the number of denoising steps during generation, achieved through faster sampling solvers or knowledge distillation (KD). In contrast to prior app…

    Submitted 5 September, 2024; originally announced September 2024.

  13. arXiv:2409.03267  [pdf, other]

    cs.SE

    No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair

    Authors: Quanjun Zhang, Chunrong Fang, Ye Shang, Tongke Zhang, Shengcheng Yu, Zhenyu Chen

    Abstract: Automatic programming attempts to minimize human intervention in the generation of executable code, and has been a long-standing challenge in the software engineering community. To advance automatic programming, researchers are focusing on three primary directions: (1) code search that reuses existing code snippets from external databases; (2) code generation that produces new code snippets from n…

    Submitted 5 September, 2024; originally announced September 2024.

  14. arXiv:2408.14506  [pdf, other]

    cs.LG

    Distilling Long-tailed Datasets

    Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

    Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradi…

    Submitted 24 August, 2024; originally announced August 2024.

  15. arXiv:2408.03225  [pdf, other]

    cs.CV

    Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera

    Authors: Zibin Liu, Banglei Guan, Yang Shang, Qifeng Yu, Laurent Kneip

    Abstract: Pose estimation and tracking of objects is a fundamental application in 3D vision. Event cameras possess remarkable attributes such as high dynamic range, low latency, and resilience against motion blur, which enable them to address challenging high dynamic range scenes or high-speed motion. These features make event cameras an ideal complement to standard cameras for object pose estimation. In…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Image Processing, 2024

  16. arXiv:2408.01614  [pdf, other]

    cs.CY cs.AI

    Advancing Mental Health Pre-Screening: A New Custom GPT for Psychological Distress Assessment

    Authors: Jinwen Tang, Yi Shang

    Abstract: This study introduces 'Psycho Analyst', a custom GPT model based on OpenAI's GPT-4, optimized for pre-screening mental health disorders. Enhanced with DSM-5, PHQ-8, detailed data descriptions, and extensive training data, the model adeptly decodes nuanced linguistic indicators of mental health disorders. It utilizes a dual-task framework that includes binary classification and a three-stage PHQ-8…

    Submitted 2 August, 2024; originally announced August 2024.

  17. arXiv:2407.11965  [pdf, other]

    cs.CV

    UrbanWorld: An Urban World Model for 3D City Generation

    Authors: Yu Shang, Jiansheng Chen, Hangyu Fan, Jingtao Ding, Jie Feng, Yong Li

    Abstract: Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnections. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban en…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages

  18. arXiv:2407.11034  [pdf]

    cs.LG

    Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

    Authors: Siqi Li, Xin Li, Kunyu Yu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Qiming Wu, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu

    Abstract: Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine l…

    Submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2407.07268  [pdf, other]

    cs.CV

    Dataset Quantization with Active Learning based Adaptive Sampling

    Authors: Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

    Abstract: Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, training on such datasets raises costs and computational demands. To address this, various techniques like coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sam…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  20. arXiv:2406.16062  [pdf, other]

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl…

    Submitted 23 June, 2024; originally announced June 2024.

  21. arXiv:2406.12373  [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50 ACM Class: I.2.7

  22. arXiv:2405.17921  [pdf]

    cs.AI cs.CY

    Towards Clinical AI Fairness: Filling Gaps in the Puzzle

    Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Xiaoxuan Liu, Mayli Mertens, Yuqing Shang, Xin Li, Di Miao, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Narrendar RaviChandran, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

    Abstract: The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical adva…

    Submitted 28 May, 2024; originally announced May 2024.

  23. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 14 October, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  24. arXiv:2405.16005  [pdf, other]

    cs.CV

    PTQ4DiT: Post-training Quantization for Diffusion Transformers

    Authors: Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

    Abstract: The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computa…

    Submitted 17 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024. Code is available at https://github.com/adreamwu/PTQ4DiT

  25. arXiv:2405.15056  [pdf, other]

    cs.LG cs.CV cs.GR

    ElastoGen: 4D Generative Elastodynamics

    Authors: Yutao Feng, Yintong Shang, Xiang Feng, Lei Lan, Shandian Zhe, Tianjia Shao, Hongzhi Wu, Kun Zhou, Hao Su, Chenfanfu Jiang, Yin Yang

    Abstract: We present ElastoGen, a knowledge-driven AI model that generates physically accurate 4D elastodynamics. Unlike deep models that learn from video- or image-based observations, ElastoGen leverages the principles of physics and learns from established mathematical and optimization procedures. The core idea of ElastoGen is converting the differential equation, corresponding to the nonlinear force equi…

    Submitted 1 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  26. arXiv:2405.14136  [pdf, other]

    cs.CV

    Efficient Multitask Dense Predictor via Binarization

    Authors: Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan

    Abstract: Multi-task learning for dense prediction has emerged as a pivotal area in computer vision, enabling simultaneous processing of diverse yet interrelated pixel-wise prediction tasks. However, the substantial computational demands of state-of-the-art (SoTA) models often limit their widespread deployment. This paper addresses this challenge by introducing network binarization to compress resource-inte…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR'2024

  27. arXiv:2405.13059  [pdf, other]

    cs.CL cs.AI

    RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis

    Authors: Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu

    Abstract: As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), aiming to jointly extract aspect terms and their associated sentiment polarities from the given text-image pairs, has attracted increasing attention. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap,…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024

  28. arXiv:2404.01640  [pdf, other]

    quant-ph cs.DS

    Deterministic Search on Complete Bipartite Graphs by Continuous Time Quantum Walk

    Authors: Honghong Lin, Yun Shang

    Abstract: This paper presents a deterministic search algorithm on complete bipartite graphs. Our algorithm adopts the simple form of alternating iterations of an oracle and a continuous-time quantum walk operator, which is a generalization of Grover's search algorithm. We address the most general case of multiple marked states, so there is a problem of estimating the number of marked states. To this end, we…

    Submitted 10 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  29. arXiv:2403.15388  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

    Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

    Abstract: Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically take in a fixed and large amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which…

    Submitted 22 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://llava-prumerge.github.io/

  30. arXiv:2403.09998  [pdf, other]

    cs.CV cs.AI

    FBPT: A Fully Binary Point Transformer

    Authors: Zhixing Hou, Yuzhang Shang, Yan Yan

    Abstract: This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model which has the potential to be widely applied and expanded in the fields of robotics and mobile devices. By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer network significantly reduces the storage footprint and computational re…

    Submitted 9 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to ICRA 2024. arXiv admin note: substantial text overlap with arXiv:2303.01166

  31. arXiv:2403.06251  [pdf, other]

    q-bio.NC cs.CV cs.LG

    Online Multi-spectral Neuron Tracing

    Authors: Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan

    Abstract: In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training is required. Our method is trained online to update our enhanced discriminative correlation filter to conglutinate the tracing process. This distinctive offline-training-free schema differentiates us from other training-dependent tracing approaches like deep learning…

    Submitted 10 March, 2024; originally announced March 2024.

  32. arXiv:2403.05235  [pdf]

    cs.LG cs.AI cs.CY

    Fairness-Aware Interpretable Modeling (FAIM) for Trustworthy Machine Learning in Healthcare

    Authors: Mingxuan Liu, Yilin Ning, Yuhe Ke, Yuqing Shang, Bibhas Chakraborty, Marcus Eng Hock Ong, Roger Vaughan, Nan Liu

    Abstract: The escalating integration of machine learning in high-stakes fields such as healthcare raises substantial concerns about model fairness. We propose an interpretable framework - Fairness-Aware Interpretable Modeling (FAIM), to improve model fairness without compromising performance, featuring an interactive interface to identify a "fairer" model from a set of high-performing models and promoting t…

    Submitted 8 March, 2024; originally announced March 2024.

  33. arXiv:2403.05229  [pdf]

    cs.AI

    Developing Federated Time-to-Event Scores Using Heterogeneous Real-World Survival Data

    Authors: Siqi Li, Yuqing Shang, Ziwen Wang, Qiming Wu, Chuan Hong, Yilin Ning, Di Miao, Marcus Eng Hock Ong, Bibhas Chakraborty, Nan Liu

    Abstract: Survival analysis serves as a fundamental component in numerous healthcare applications, where the determination of the time to specific events (such as the onset of a certain disease or death) for patients is crucial for clinical decision-making. Scoring systems are widely used for swift and efficient risk prediction. However, existing methods for constructing survival scores presume that data or…

    Submitted 8 March, 2024; originally announced March 2024.

  34. arXiv:2402.16363  [pdf, other]

    cs.CL cs.AI

    LLM Inference Unveiled: Survey and Roofline Model Insights

    Authors: Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

    Abstract: The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there hasn't been a concise framework that analyzes the various methods of LLM Inference to provide a clear understanding of this domain. Our survey stands out from traditional literature reviews by not only summ…

    Submitted 1 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  35. arXiv:2402.03666  [pdf, other]

    cs.CV

    QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

    Authors: Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan

    Abstract: The practical deployment of diffusion models still suffers from the high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low bit-widths. In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation d…

    Submitted 5 September, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Code available at https://github.com/hatchetProject/QuEST

  36. arXiv:2402.02563  [pdf, other]

    cs.CL cs.AI cs.LG

    Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models

    Authors: Yu Shang, Yu Li, Fengli Xu, Yong Li

    Abstract: Large language models (LLMs) have shown impressive emergent abilities in a wide range of tasks, but the associated expensive API cost greatly limits the real application. Previous works like chain-of-thought (CoT) and tree-of-thoughts (ToT) have predominantly focused on enhancing accuracy, but overlook the rapidly increasing API cost, which could be particularly problematic for open-ended real-wor…

    Submitted 24 August, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 19 pages, 16 figures, 12 tables

  37. arXiv:2401.15318  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG

    Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

    Authors: Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang

    Abstract: We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian Splatting and Position-Based Dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesiv…

    Submitted 23 July, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

  38. Enhancing Source Code Classification Effectiveness via Prompt Learning Incorporating Knowledge Features

    Authors: Yong Ma, Senlin Luo, Yu-Ming Shang, Yifei Zhang, Zhengjun Li

    Abstract: Researchers have investigated the potential of leveraging pre-trained language models, such as CodeBERT, to enhance source code-related tasks. Previous methodologies have relied on CodeBERT's '[CLS]' token as the embedding representation of input sequences for task performance, necessitating additional neural network layers to enhance feature representation, which in turn increases computational e…

    Submitted 19 August, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted by Scientific Reports

  39. A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a Verbalizer

    Authors: Yong Ma, Senlin Luo, Yu-Ming Shang, Zhengjun Li, Yong Liu

    Abstract: The verbalizer, which serves to map label words to class labels, is an essential component of prompt-tuning. In this paper, we present a novel approach to constructing verbalizers. While existing methods for verbalizer construction mainly rely on augmenting and refining sets of synonyms or related words based on class names, this paradigm suffers from a narrow perspective and lack of abstraction,…

    Submitted 10 January, 2024; originally announced January 2024.

  40. arXiv:2401.01735  [pdf, other]

    cs.GT

    Economics Arena for Large Language Models

    Authors: Shangmin Guo, Haoran Bu, Haochuan Wang, Yi Ren, Dianbo Sui, Yuming Shang, Siting Lu

    Abstract: Large language models (LLMs) have been extensively used as the backbones for general-purpose agents, and some economics literature suggests that LLMs are capable of playing various types of economics games. Following these works, to overcome the limitation of evaluating LLMs using static benchmarks, we propose to explore competitive games as an evaluation for LLMs to incorporate multi-players and d…

    Submitted 3 January, 2024; originally announced January 2024.

  41. arXiv:2312.16627  [pdf, other]

    cs.LG

    MIM4DD: Mutual Information Maximization for Dataset Distillation

    Authors: Yuzhang Shang, Zhihang Yuan, Yan Yan

    Abstract: Dataset distillation (DD) aims to synthesize a small dataset whose test performance is comparable to a full dataset using the same model. State-of-the-art (SoTA) methods optimize synthetic datasets primarily by matching heuristic indicators extracted from two networks: one from real data and one from synthetic data (see Fig.1, Left), such as gradients and training trajectories. DD is essentially a…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: Accepted to NeurIPS 2023

  42. arXiv:2312.15993  [pdf]

    cs.AI cs.RO eess.SY

    Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

    Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiaojing Chen, Shiteng Zheng, Ying Shang

    Abstract: In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performa…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 32 pages, 13 figures

  43. arXiv:2312.05821  [pdf, other]

    cs.CL

    ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

    Authors: Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

    Abstract: In this paper, we introduce a new post-training compression paradigm for Large Language Models (LLMs) to facilitate their wider adoption. We delve into LLM weight low-rank factorization, and find that the challenges of this task stem from the outlier phenomenon in the LLM activations and the sensitivity difference among various kinds of layers. To address these issues, we propose a training-free a…

    Submitted 18 September, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  44. arXiv:2311.13099  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF

    Authors: Yutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, Yin Yang

    Abstract: We show that physics-based simulations can be seamlessly integrated with NeRF to generate high-quality elastodynamics of real-world objects. Unlike existing methods, we discretize nonlinear hyperelasticity in a meshless way, obviating the necessity for intermediate auxiliary shape proxies like a tetrahedral mesh or voxel grid. A quadratic generalized moving least square (Q-GMLS) is employed to cap…

    Submitted 27 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  45. arXiv:2311.03417  [pdf]

    cs.LG cs.AI

    Federated Learning for Clinical Structured Data: A Benchmark Comparison of Engineering and Statistical Approaches

    Authors: Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D'Agostino, Xin Li, Yilin Ning, Yuqing Shang, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu

    Abstract: Federated learning (FL) has shown promising potential in safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also explored similar privacy-preserving algorithms. Statistical FL algorithms, however, remain considerably less recognized than their engineering counterparts. Our goal was to bridge the…

    Submitted 6 November, 2023; originally announced November 2023.

  46. arXiv:2311.02107  [pdf]

    cs.LG cs.AI cs.CY

    Generative Artificial Intelligence in Healthcare: Ethical Considerations and Assessment Checklist

    Authors: Yilin Ning, Salinelat Teixayavong, Yuqing Shang, Julian Savulescu, Vaishaanth Nagaraj, Di Miao, Mayli Mertens, Daniel Shu Wei Ting, Jasmine Chiat Ling Ong, Mingxuan Liu, Jiuwen Cao, Michael Dunn, Roger Vaughan, Marcus Eng Hock Ong, Joseph Jao-Yiu Sung, Eric J Topol, Nan Liu

    Abstract: The widespread use of ChatGPT and other emerging technology powered by generative artificial intelligence (GenAI) has drawn much attention to potential ethical issues, especially in high-stakes applications such as healthcare, but ethical discussions are yet to translate into operationalisable solutions. Furthermore, ongoing ethical discussions often neglect other types of GenAI that have been use…

    Submitted 23 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  47. arXiv:2310.05242  [pdf, other]

    cs.CL cs.AI

    ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

    Authors: Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, et al. (17 additional authors not shown)

    Abstract: Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels. However, complex and diverse radiology reports with cross-source heterogeneity pose a huge generalizability challenge to the current methods under massive data volume, mainly because the style and normativity of radiology reports are obviousl…

    Submitted 9 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  48. arXiv:2310.00034  [pdf, other]

    cs.LG cs.AI cs.CL

    PB-LLM: Partially Binarized Large Language Models

    Authors: Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong

    Abstract: This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression. Due to previous binarization methods collapsing LLMs, we propose a novel approach, Partially-Binarized LLM (PB-LLM), which can achieve extreme low-bit quantization while maintaining the linguistic reasoning capacity of quant…

    Submitted 7 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: First work to use network binarization for large language model compression

  49. arXiv:2309.13682  [pdf, other]

    cs.CV cs.LG

    Causal-DFQ: Causality Guided Data-free Network Quantization

    Authors: Yuzhang Shang, Bingxin Xu, Gaowen Liu, Ramana Kompella, Yan Yan

    Abstract: Model quantization, which aims to compress deep neural networks and accelerate inference speed, has greatly facilitated the development of cumbersome models on mobile and edge devices. There is a common assumption in quantization methods from prior works that training data is available. In practice, however, this assumption cannot always be fulfilled due to reasons of privacy and security, renderi…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023

  50. arXiv:2304.01089  [pdf, other]

    cs.CL

    RPTQ: Reorder-based Post-training Quantization for Large Language Models

    Authors: Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

    Abstract: Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization. In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers. To address this challen…

    Submitted 17 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 18 pages