
Showing 1–50 of 6,082 results for author: Chen, Y

  1. arXiv:2410.16272  [pdf, other]

    cs.CV

    MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors

    Authors: Honghua Chen, Yushi Lan, Yongwei Chen, Yifan Zhou, Xingang Pan

    Abstract: Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topolog…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 16 pages, 10 figures, conference

  2. arXiv:2410.16155  [pdf, other]

    cs.CL

    A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns

    Authors: Tianyi Men, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: With the development of large language models, they are widely used as agents in various fields. A key component of agents is memory, which stores vital information but is susceptible to jailbreak attacks. Existing research mainly focuses on single-agent attacks and shared memory attacks. However, real-world scenarios often involve independent memory. In this paper, we propose the Troublemaker Mak…

    Submitted 21 October, 2024; originally announced October 2024.

  3. arXiv:2410.15959  [pdf, other]

    cs.RO cs.CV

    Diffusion Transformer Policy

    Authors: Zhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen

    Abstract: Recent large visual-language action models pretrained on diverse robot datasets have demonstrated the potential for generalizing to new environments with a few in-domain data. However, those approaches usually predict discretized or continuous actions by a small action head, which limits the ability in handling diverse action spaces. In contrast, we model the continuous action with a large multi-m…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: Preprint

  4. arXiv:2410.15912  [pdf, other]

    cs.RO cs.AI

    Bench4Merge: A Comprehensive Benchmark for Merging in Realistic Dense Traffic with Micro-Interactive Vehicles

    Authors: Zhengming Wang, Junli Wang, Pengfei Li, Zhaohan Li, Peng Li, Yilun Chen

    Abstract: While the capabilities of autonomous driving have advanced rapidly, merging into dense traffic remains a significant challenge. Many motion planning methods for this scenario have been proposed, but it is hard to evaluate them. Most existing closed-loop simulators rely on rule-based controls for other vehicles, which results in a lack of diversity and randomness, thus failing to accurately assess t…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 6 pages, 7 figures, IEEE international conference on robotics and automation

  5. arXiv:2410.15780  [pdf]

    cs.CV cs.AI

    An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps

    Authors: Ziyi Liu, Claudio Affolter, Sidi Wu, Yizi Chen, Lorenz Hurni

    Abstract: Historical maps provide valuable information and knowledge about the past. However, as they often feature non-standard projections, hand-drawn styles, and artistic elements, it is challenging for non-experts to identify and interpret them. While existing image captioning methods have achieved remarkable success on natural images, their performance on maps is suboptimal as maps are underrepresented…

    Submitted 21 October, 2024; originally announced October 2024.

  6. arXiv:2410.15770  [pdf]

    cs.AI

    A roadmap for generative mapping: unlocking the power of generative AI for map-making

    Authors: Sidi Wu, Katharina Henggeler, Yizi Chen, Lorenz Hurni

    Abstract: Maps are broadly relevant across various fields, serving as valuable tools for presenting spatial phenomena and communicating spatial knowledge. However, map-making is still largely confined to those with expertise in GIS and cartography due to the specialized software and complex workflow involved, from data processing to visualization. While generative AI has recently demonstrated its remarkable…

    Submitted 21 October, 2024; originally announced October 2024.

  7. arXiv:2410.15710  [pdf, other]

    cs.RO

    Hierarchical Search-Based Cooperative Motion Planning

    Authors: Yuchen Wu, Yifan Yang, Gang Xu, Junjie Cao, Yansong Chen, Licheng Wen, Yong Liu

    Abstract: Cooperative path planning, a crucial aspect of multi-agent systems research, serves a variety of sectors, including military, agriculture, and industry. Many existing algorithms, however, come with certain limitations, such as simplified kinematic models and inadequate support for multiple group scenarios. Focusing on the planning problem associated with a nonholonomic Ackermann model for Unmanned…

    Submitted 21 October, 2024; originally announced October 2024.

  8. arXiv:2410.15665  [pdf, other]

    cs.AI cs.LG

    Long Term Memory: The Foundation of AI Self-Evolution

    Authors: Xun Jiang, Feng Li, Han Zhao, Jiaying Wang, Jun Shao, Shihao Xu, Shu Zhang, Weiling Chen, Xavier Tang, Yize Chen, Mengyue Wu, Weizhi Ma, Mengdi Wang, Tianqiao Chen

    Abstract: Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning, achieving human-level performance in various tasks. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. While training stronger models is important, enabling models to e…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 56 pages, 13 figures

  9. arXiv:2410.15636  [pdf, other]

    cs.CV

    LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images

    Authors: Hao He, Yixun Liang, Luozhou Wang, Yuanhao Cai, Xinli Xu, Hao-Xiang Guo, Xiang Wen, Yingcong Chen

    Abstract: Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages th…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 17 pages, 12 figures, project page: coming soon

  10. arXiv:2410.15631  [pdf, other]

    cs.SE cs.CR

    Security of Language Models for Code: A Systematic Literature Review

    Authors: Yuchen Chen, Weisong Sun, Chunrong Fang, Zhenpeng Chen, Yifei Ge, Tingxu Han, Quanjun Zhang, Yang Liu, Zhenyu Chen, Baowen Xu

    Abstract: Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focus…

    Submitted 21 October, 2024; originally announced October 2024.

  11. arXiv:2410.15281  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC

    Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

    Authors: Can Cui, Yunsheng Ma, Zichong Yang, Yupeng Zhou, Peiran Liu, Juanwu Lu, Lingxi Li, Yaobin Chen, Jitesh H. Panchal, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Ziran Wang

    Abstract: With the broader usage and highly successful development of Large Language Models (LLMs), there has been a growth of interest and demand for applying LLMs to autonomous driving technology. Driven by their natural language understanding and reasoning ability, LLMs have the potential to enhance various aspects of autonomous driving systems, from perception and scene understanding to language interac…

    Submitted 20 October, 2024; originally announced October 2024.

  12. arXiv:2410.15273  [pdf]

    cs.HC

    ArchiTone: A LEGO-Inspired Gamified System for Visualized Music Education

    Authors: Jiaxing Yu, Tieyao Zhang, Songruoyao Wu, Xinda Wu, Tingxiao Wu, Yanjun Chen, Kejun Zhang

    Abstract: Participation in music activities has many benefits, but often requires music theory knowledge and aural skills, which can be challenging for beginners. To help them engage more easily, it's crucial to adopt teaching strategies that lower these barriers. Informed by formative investigation and inspired by LEGO, we introduce ArchiTone, a gamified system that employs constructivism by visualizing mu…

    Submitted 20 October, 2024; originally announced October 2024.

  13. arXiv:2410.15247  [pdf, other]

    cs.LG cs.AI

    Tensor-Fused Multi-View Graph Contrastive Learning

    Authors: Yujia Wu, Junyi Mo, Elynn Chen, Yuzhou Chen

    Abstract: Graph contrastive learning (GCL) has emerged as a promising approach to enhance graph neural networks' (GNNs) ability to learn rich representations from unlabeled graph-structured data. However, current GCL models face challenges with computational demands and limited feature utilization, often relying only on basic graph properties like node degrees and edge attributes. This constrains their capa…

    Submitted 19 October, 2024; originally announced October 2024.

  14. arXiv:2410.15241  [pdf, other]

    cs.LG stat.ML

    Conditional Uncertainty Quantification for Tensorized Topological Neural Networks

    Authors: Yujia Wu, Bo Yang, Yang Zhao, Elynn Chen, Yuzhou Chen, Zheshi Zheng

    Abstract: Graph Neural Networks (GNNs) have become the de facto standard for analyzing graph-structured data, leveraging message-passing techniques to capture both structural and node feature information. However, recent studies have raised concerns about the statistical reliability of uncertainty estimates produced by GNNs. This paper addresses this crucial challenge by introducing a novel technique for qu…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.12007

  15. arXiv:2410.15239  [pdf, other]

    cs.LG stat.ML

    Conditional Prediction ROC Bands for Graph Classification

    Authors: Yujia Wu, Bo Yang, Elynn Chen, Yuzhou Chen, Zheshi Zheng

    Abstract: Graph classification in medical imaging and drug discovery requires accuracy and robust uncertainty quantification. To address this need, we introduce Conditional Prediction ROC (CP-ROC) bands, offering uncertainty quantification for ROC curves and robustness to distributional shifts in test data. Although developed for Tensorized Graph Neural Networks (TGNNs), CP-ROC is adaptable to general Graph…

    Submitted 19 October, 2024; originally announced October 2024.

  16. arXiv:2410.15218  [pdf, other]

    cs.LG

    Science Time Series: Deep Learning in Hydrology

    Authors: Junyang He, Ying-Jung Chen, Anushka Idamekorala, Geoffrey Fox

    Abstract: This research is part of a systematic study of scientific time series. In the last three years, hundreds of papers and over fifty new deep-learning models have been described for time series models. These mainly focus on the key aspect of time dependence, whereas in some scientific time series, the situation is more complex with multiple locations, each location having multiple observed and target…

    Submitted 19 October, 2024; originally announced October 2024.

  17. arXiv:2410.15108  [pdf]

    q-bio.NC cs.LG eess.IV

    The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

    Authors: Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Jarrett Rushmore, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'Donnell

    Abstract: The shape of the brain's white matter connections is relatively unexplored in diffusion MRI tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is unknown if the variability in dMRI tractography-derived shape may relate to the brain's functional variability across individuals. This work explores the potential of leveraging tractography…

    Submitted 19 October, 2024; originally announced October 2024.

  18. arXiv:2410.15060  [pdf, other]

    cs.CV

    BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering

    Authors: Jiayue Dai, Yunya Wang, Yihan Fang, Yuetong Chen, Butian Xiong

    Abstract: To address the semantic inconsistency issue with SAM or other single-image segmentation models handling image sequences, we introduce BYOCL. This novel model outperforms SAM in extensive experiments, showcasing its Hierarchical prototype capabilities across CLIP and other representations. BYOCL significantly reduces time and space consumption by dividing inputs into smaller batches, achieving expo…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 5 pages, 5 figures

  19. arXiv:2410.14864  [pdf, other]

    cs.GT math.OC

    Double Distributionally Robust Bid Shading for First Price Auctions

    Authors: Yanlin Qu, Ravi Kant, Yan Chen, Brendan Kitts, San Gultekin, Aaron Flores, Jose Blanchet

    Abstract: Bid shading has become a standard practice in the digital advertising industry, in which most auctions for advertising (ad) opportunities are now of first price type. Given an ad opportunity, performing bid shading requires estimating not only the value of the opportunity but also the distribution of the highest bid from competitors (i.e. the competitive landscape). Since these two estimates tend…

    Submitted 18 October, 2024; originally announced October 2024.

  20. arXiv:2410.14720  [pdf, other]

    cs.LG cs.CV

    SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models

    Authors: Yuqi Li, Yao Lu, Zeyu Dong, Chuanguang Yang, Yihao Chen, Jianping Gou

    Abstract: The deployment of Deep Neural Network (DNN)-based networks on resource-constrained devices remains a significant challenge due to their high computational and parameter requirements. To solve this problem, layer pruning has emerged as a potent approach to reduce network size and improve computational efficiency. However, existing layer pruning methods mostly overlook the intrinsic connections and…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 20 pages

  21. arXiv:2410.14214  [pdf, other]

    cs.CV eess.IV

    MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging

    Authors: Zhenghao Pan, Haijin Zeng, Jiezhang Cao, Yongyong Chen, Kai Zhang, Yong Xu

    Abstract: Color video snapshot compressive imaging (SCI) employs computational imaging techniques to capture multiple sequential video frames in a single Bayer-patterned measurement. With the increasing popularity of quad-Bayer pattern in mainstream smartphone cameras for capturing high-resolution videos, mobile photography has become more accessible to a wider audience. However, existing color video SCI re…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024

  22. arXiv:2410.13914  [pdf, other]

    cs.LG stat.ML

    Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation

    Authors: Yikang Chen, Dehui du, Lili Tian

    Abstract: We propose an importance sampling method for tractable and efficient estimation of counterfactual expressions in general settings, named Exogenous Matching. By minimizing a common upper bound of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem, enabling its integration with existing conditional distribution modeling approach…

    Submitted 20 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 51 pages, 15 figures

  23. arXiv:2410.13860  [pdf, other]

    cs.CV cs.RO

    VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

    Authors: Runsen Xu, Zhiwei Huang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

    Abstract: 3D visual grounding is crucial for robots, requiring integration of natural language and 3D scene understanding. Traditional methods depending on supervised learning with 3D point clouds are limited by scarce datasets. Recently zero-shot methods leveraging LLMs have been proposed to address the data issue. While effective, these methods only use object-centric information, limiting their ability t…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: CoRL 2024 Camera Ready. 25 pages. A novel zero-shot 3D visual grounding framework based solely on 2D images

  24. arXiv:2410.13855  [pdf, other]

    cs.LG

    Diffusing States and Matching Scores: A New Framework for Imitation Learning

    Authors: Runzhe Wu, Yiding Chen, Gokul Swamy, Kianté Brantley, Wen Sun

    Abstract: Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function, and can therefore be thought of as the sequential generalization of a Generative Adversarial Network (GAN). A prominent example of this framework is Generative Adversarial Imitation Learning (GAIL). However, in recent years, diffusion models have emerged…

    Submitted 17 October, 2024; originally announced October 2024.

  25. arXiv:2410.13852  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Retrospective Learning from Interactions

    Authors: Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi

    Abstract: Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the L…

    Submitted 17 October, 2024; originally announced October 2024.

  26. arXiv:2410.13805  [pdf, other]

    cs.CL

    A Watermark for Order-Agnostic Language Models

    Authors: Ruibo Chen, Yihan Wu, Yanshuo Chen, Chenxi Liu, Junfeng Guo, Heng Huang

    Abstract: Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain…

    Submitted 17 October, 2024; originally announced October 2024.

  27. arXiv:2410.13523  [pdf, other]

    cs.CV cs.AI

    Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?

    Authors: Che Liu, Zhongwei Wan, Haozhe Wang, Yinda Chen, Talha Qaiser, Chen Jin, Fariba Yousefi, Nikolay Burlutskiy, Rossella Arcucci

    Abstract: Medical Vision-Language Pre-training (MedVLP) has made significant progress in enabling zero-shot tasks for medical image understanding. However, training MedVLP models typically requires large-scale datasets with paired, high-quality image-text data, which are scarce in the medical domain. Recent advancements in Large Language Models (LLMs) and diffusion models have made it possible to generate l…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Under Review

  28. arXiv:2410.13506  [pdf, other]

    cs.CE

    Development of a New Type of Vortex Bladeless Wind Turbine for Urban Energy Systems

    Authors: Dongkun Han, Shihan Huang, Pak Kei Abia Hui, Yue Chen

    Abstract: Innovation and development of renewable energy devices are crucial for reaching a sustainable and environmentally conscious future. This work focuses on the development of a new type of renewable energy devices in the context of Smart Garden at the Chinese University of Hong Kong, which aims to design a bladeless wind turbine for urban areas, addressing the pressing need for clean energy locally a…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 6 pages, 9 figures

    MSC Class: 00-02

  29. arXiv:2410.13486  [pdf, other]

    cs.CV

    SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation

    Authors: Shiao Xie, Hongyi Wang, Ziwei Niu, Hao Sun, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

    Abstract: Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task, which reduces reliance on large-scale labeled dataset by leveraging unlabeled samples. Among SSL techniques, the weak-to-strong consistency framework, popularized by FixMatch, has emerged as a state-of-the-art method in classification tasks. Notably, such a simple pipeline has also shown compe…

    Submitted 17 October, 2024; originally announced October 2024.

  30. arXiv:2410.13408  [pdf, other]

    cs.LG cs.AI cs.CL

    MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

    Authors: Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

    Abstract: Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning. However, significant challenges remain: (1) Simply increasing the rank size of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck. (2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning…

    Submitted 17 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: 11 pages, 7 figures

  31. arXiv:2410.13237  [pdf, other]

    cs.CL cs.AI cs.CR

    Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

    Authors: Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva

    Abstract: Language Confusion is a phenomenon where Large Language Models (LLMs) generate text that is neither in the desired language, nor in a contextually appropriate language. This phenomenon presents a critical challenge in text generation by LLMs, often appearing as erratic and unpredictable behavior. We hypothesize that there are linguistic regularities to this inherent vulnerability in LLMs and shed…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 17 pages, 6 figures, 14 tables

    ACM Class: I.1.2; I.1.5

  32. arXiv:2410.13196  [pdf, other]

    cs.AI cs.LG

    Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

    Authors: Tangwen Qian, Junhe Li, Yile Chen, Gao Cong, Tao Sun, Fei Wang, Yongjun Xu

    Abstract: Modeling trajectory data with generic-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gainin…

    Submitted 18 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

  33. arXiv:2410.13155  [pdf, other]

    cs.CL

    SLM-Mod: Small Language Models Surpass LLMs at Content Moderation

    Authors: Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha

    Abstract: Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. However, these models can be expensive to query in real-time and do not allow for a community-specific approach to content moderation. To address these challenges, we explore the use of open-source small language models (SLMs) for community-specific content moderation tasks.…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Preprint: 15 pages, 8 figures, 8 pages

  34. arXiv:2410.13139  [pdf, other]

    cs.MA cs.CV cs.HC

    See Behind Walls in Real-time Using Aerial Drones and Augmented Reality

    Authors: Sikai Yang, Kang Yang, Yuning Chen, Fan Zhao, Wan Du

    Abstract: This work presents ARD2, a framework that enables real-time through-wall surveillance using two aerial drones and an augmented reality (AR) device. ARD2 consists of two main steps: target direction estimation and contour reconstruction. In the first stage, ARD2 leverages geometric relationships between the drones, the user, and the target to project the target's direction onto the user's AR displa…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 6 pages

  35. arXiv:2410.13067  [pdf, other]

    eess.SY cs.LG math.OC

    Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

    Authors: Jeongyeol Kwon, Luke Dotson, Yudong Chen, Qiaomin Xie

    Abstract: Previous studies on two-timescale stochastic approximation (SA) mainly focused on bounding mean-squared errors under diminishing stepsize schemes. In this work, we investigate constant stepsize schemes through the lens of Markov processes, proving that the iterates of both timescales converge to a unique joint stationary distribution in Wasserstein metric. We derive explicit geometric and no…

    Submitted 16 October, 2024; originally announced October 2024.

  36. arXiv:2410.12236  [pdf, other]

    cs.LG cs.AI

    Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay

    Authors: Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Jian Zhang, Xiaoguang Niu

    Abstract: Nowadays transformer-based Large Language Models (LLM) for code generation tasks usually apply sampling and filtering pipelines. Due to the sparse reward problem in code generation tasks caused by one-token incorrectness, transformer-based models will sample redundant programs till they find a correct one, leading to low efficiency. To overcome the challenge, we incorporate Experience Replay (ER)…

    Submitted 16 October, 2024; originally announced October 2024.

  37. arXiv:2410.12219  [pdf, other]

    cs.AI cs.CL cs.MM

    OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

    Authors: Lichang Chen, Hexiang Hu, Mingda Zhang, Yiwen Chen, Zifeng Wang, Yandong Li, Pranav Shyam, Tianyi Zhou, Heng Huang, Ming-Hsuan Yang, Boqing Gong

    Abstract: We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple modalities such as text, vision, and audio, presents unique challenges. Particularly, the user message might often consist of multiple modalities, such that OLMs have to establish holistic understanding and reasoning across modaliti…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 19 pages, 6 figures, 12 tables

  38. arXiv:2410.12138  [pdf, other]

    cs.LG cs.CL

    Preference Optimization with Multi-Sample Comparisons

    Authors: Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang

    Abstract: Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons. These approach…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: preprint

  39. arXiv:2410.12080  [pdf, other]

    cs.CV

    SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection

    Authors: Yizhe Liu, Yan Song Hu, Yuhao Chen, John Zelek

    Abstract: Image-based Pose-Agnostic 3D Anomaly Detection is an important task that has emerged in industrial quality control. This task seeks to find anomalies from query images of a tested object given a set of reference images of an anomaly-free object. The challenge is that the query views (a.k.a poses) are unknown and can be different from the reference views. Currently, new methods such as OmniposeAD a…

    Submitted 15 October, 2024; originally announced October 2024.

  40. arXiv:2410.11913  [pdf]

    cs.CV

    Development and Testing of a Wood Panels Bark Removal Equipment Based on Deep Learning

    Authors: Rijun Wang, Guanghao Zhang, Hongyang Chen, Xinye Yu, Yesheng Chen, Fulong Liang, Xiangwei Mou, Bo Wang

    Abstract: Attempting to apply deep learning methods to wood panels bark removal equipment to enhance the quality and efficiency of bark removal is a significant and challenging endeavor. This study develops and tests a deep learning-based wood panels bark removal equipment. In accordance with the practical requirements of sawmills, a wood panels bark removal equipment equipped with a vision inspection syste…

    Submitted 15 October, 2024; originally announced October 2024.

  41. arXiv:2410.11845  [pdf, ps, other]

    cs.DC

    A Review on Edge Large Language Models: Design, Execution, and Applications

    Authors: Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen

    Abstract: Large language models (LLMs) have revolutionized natural language processing with their exceptional capabilities. However, deploying LLMs on resource-constrained edge devices presents significant challenges due to computational limitations, memory constraints, and edge hardware heterogeneity. This survey summarizes recent developments in edge LLMs across their lifecycle, examining resource-efficie…

    Submitted 29 September, 2024; originally announced October 2024.

  42. arXiv:2410.11761  [pdf, other]

    cs.CV cs.AI

    SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

    Authors: Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Bin Zhang, Nana Pei, Rongshan Yu, Yu Qiao, Junjun He

    Abstract: Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present…

    Submitted 15 October, 2024; originally announced October 2024.

  43. arXiv:2410.11444  [pdf, other]

    cs.LG cs.AI stat.ML

    On Championing Foundation Models: From Explainability to Interpretability

    Authors: Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao

    Abstract: Understanding the inner mechanisms of black-box foundation models (FMs) is essential yet challenging in artificial intelligence and its applications. Over the last decade, the long-running focus has been on their explainability, leading to the development of post-hoc explainable methods to rationalize the specific decisions already made by black-box FMs. However, these explainable methods have cer…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 45 pages, 14 figures

  44. arXiv:2410.11404  [pdf, other]

    cs.CV

    MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description

    Authors: Jiawei Mo, Yixuan Chen, Rifen Lin, Yongkang Ni, Min Zeng, Xiping Hu, Min Li

    Abstract: Despite continuous advancements in deep learning for understanding human motion, existing models often struggle to accurately identify action timing and specific body parts, typically supporting only single-round interaction. Such limitations in capturing fine-grained motion details reduce their effectiveness in motion understanding tasks. In this paper, we propose MoChat, a multimodal large langu…

    Submitted 15 October, 2024; originally announced October 2024.

  45. arXiv:2410.11315  [pdf, other]

    cs.CL

    SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation

    Authors: Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang

    Abstract: Recent studies in Retrieval-Augmented Generation (RAG) have investigated extracting evidence from retrieved passages to reduce computational costs and enhance the final RAG performance, yet it remains challenging. Existing methods heavily rely on heuristic-based augmentation, encountering several issues: (1) Poor generalization due to hand-crafted context filtering; (2) Semantics deficiency due to…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 15 pages, 6 figures, 5 tables. Accepted by EMNLP 2024 (main)

  46. arXiv:2410.11285  [pdf, other]

    cs.CV

    Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting

    Authors: Yuanbo Chen, Chengyu Zhang, Jason Wang, Xuefan Gao, Avideh Zakhor

    Abstract: Scene reconstruction and novel-view synthesis for large, complex, multi-story, indoor scenes is a challenging and time-consuming task. Prior methods have utilized drones for data capture and radiance fields for scene reconstruction, both of which present certain challenges. First, in order to capture diverse viewpoints with the drone's front-facing camera, some approaches fly the drone in an unsta…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted to ECCV 2024 S3DSGR Workshop

  47. arXiv:2410.11209  [pdf, other]

    cs.CR

    CRUcialG: Reconstruct Integrated Attack Scenario Graphs by Cyber Threat Intelligence Reports

    Authors: Wenrui Cheng, Tiantian Zhu, Tieming Chen, Qixuan Yuan, Jie Ying, Hongmei Li, Chunlin Xiong, Mingda Li, Mingqi Lv, Yan Chen

    Abstract: Cyber Threat Intelligence (CTI) reports are factual records compiled by security analysts through their observations of threat events or their own practical experience with attacks. In order to utilize CTI reports for attack detection, existing methods have attempted to map the content of reports onto system-level attack provenance graphs to clearly depict attack procedures. However, existing stud…

    Submitted 14 October, 2024; originally announced October 2024.

  48. arXiv:2410.11148  [pdf, other]

    eess.IV cs.CV

    Deep unrolled primal dual network for TOF-PET list-mode image reconstruction

    Authors: Rui Hu, Chenxu Li, Kun Tian, Jianan Cui, Yunmei Chen, Huafeng Liu

    Abstract: Time-of-flight (TOF) information provides more accurate location data for annihilation photons, thereby enhancing the quality of PET reconstruction images and reducing noise. List-mode reconstruction has a significant advantage in handling TOF information. However, current advanced TOF PET list-mode reconstruction algorithms still require improvements when dealing with low-count data. Deep learnin…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 11 pages, 11 figures

  49. arXiv:2410.10989  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Liger Kernel: Efficient Triton Kernels for LLM Training

    Authors: Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen

    Abstract: Training Large Language Models (LLMs) efficiently at scale presents a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-sourced set of Triton kernels developed specifically for LLM training. With kernel optimization techniques like kernel operation fusing and input chunking, our kernel…

    Submitted 18 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 17 pages, 12 figures

  50. arXiv:2410.10874  [pdf]

    cs.CL cs.AI

    Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content

    Authors: Feiyang Wang, Qiaozhi Bao, Zixuan Wang, Yanlin Chen

    Abstract: This article improves the Transformer model based on swarm intelligence optimization algorithm, aiming to predict the emotions of employment related text content on American social media. Through text preprocessing, feature extraction, and vectorization, the text data was successfully converted into numerical data and imported into the model for training. The experimental results show that during…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 5 pages, 5 figures