Showing 1–50 of 417 results for author: Wei, T

  1. arXiv:2410.15163  [pdf, other]

    cs.AI

    Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

    Authors: Timothy Wei, Annabelle Miin, Anastasia Miin

    Abstract: Large Language Models (LLMs) have recently demonstrated impressive capabilities across various real-world applications. However, due to the current text-in-text-out paradigm, it remains challenging for LLMs to handle dynamic and complex application constraints, let alone devise general solutions that meet predefined system goals. Current common practices like model finetuning and reflection-based…

    Submitted 19 October, 2024; originally announced October 2024.

  2. arXiv:2410.14281  [pdf, other]

    cs.LG

    PTR: A Pre-trained Language Model for Trajectory Recovery

    Authors: Tonglong Wei, Yan Lin, Youfang Lin, Shengnan Guo, Jilin Hu, Gao Cong, Huaiyu Wan

    Abstract: Spatiotemporal trajectory data is vital for web-of-things services and is extensively collected and analyzed by web-based hardware and platforms. However, issues such as service interruptions and network instability often lead to sparsely recorded trajectories, resulting in a loss of detailed movement data. As a result, recovering these trajectories to restore missing information becomes essential…

    Submitted 18 October, 2024; originally announced October 2024.

  3. arXiv:2410.12802  [pdf, other]

    cs.RO

    Resolving Positional Ambiguity in Dialogues by Vision-Language Models for Robot Navigation

    Authors: Kuan-Lin Chen, Tzu-Ti Wei, Li-Tzu Yeh, Elaine Kao, Yu-Chee Tseng, Jen-Jee Chen

    Abstract: We consider an autonomous navigation robot that can accept human commands through natural language to provide services in an indoor environment. These natural language commands may include time, position, object, and action components. However, we observe that the positional components within such commands usually refer to objects in the environment that may contain different levels of positional…

    Submitted 30 September, 2024; originally announced October 2024.

  4. arXiv:2410.12165  [pdf, other]

    cs.CV cs.AI

    Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

    Authors: Timothy Wei, Hsien Xin Peng, Elaine Xu, Bryan Zhao, Lei Ding, Diji Yang

    Abstract: As Artificial Intelligence models, such as Large Video-Language models (VLMs), grow in size, their deployment in real-world applications becomes increasingly challenging due to hardware limitations and computational costs. To address this, we design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based…

    Submitted 20 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

  5. arXiv:2410.12152  [pdf]

    physics.flu-dyn

    Fluid Dynamics and Passive Scalar Transport Driven by Non-Uniform Tumbling of a Prolate Spheroid in Simple Shear Flow

    Authors: Yanxing Wang, Hui Wan, Tie Wei, Fangjun Shu

    Abstract: Using high-fidelity numerical simulations based on a lattice Boltzmann framework, the advection-enhanced transport of a passive scalar from a prolate spheroid in simple shear flow has been thoroughly investigated across various parameters, including the spheroid's aspect ratio, particle-to-fluid density ratio, Reynolds number, and Schmidt number. The Reynolds number is constrained to the range fro…

    Submitted 15 October, 2024; originally announced October 2024.

  6. arXiv:2410.11235  [pdf, other]

    cs.CL

    Unleashing the Power of LLMs as Multi-Modal Encoders for Text and Graph-Structured Data

    Authors: Jiacheng Lin, Kun Qian, Haoyu Han, Nurendra Choudhary, Tianxin Wei, Zhongruo Wang, Sahika Genc, Edward W Huang, Sheng Wang, Karthik Subbian, Danai Koutra, Jimeng Sun

    Abstract: Graph-structured information offers rich contextual information that can enhance language models by providing structured relationships and hierarchies, leading to more expressive embeddings for various applications such as retrieval, question answering, and classification. However, existing methods for integrating graph and text embeddings, often based on Multi-layer Perceptrons (MLPs) or shallow…

    Submitted 14 October, 2024; originally announced October 2024.

  7. arXiv:2410.06467  [pdf, other]

    cs.CR

    WAPITI: A Watermark for Finetuned Open-Source LLMs

    Authors: Lingjie Chen, Ruizhong Qiu, Siyu Yuan, Zhining Liu, Tianxin Wei, Hyunsik Yoo, Zhichen Zeng, Deqing Yang, Hanghang Tong

    Abstract: Watermarking of large language models (LLMs) generation embeds an imperceptible statistical pattern within texts, making it algorithmically detectable. Watermarking is a promising method for addressing potential harm and biases from LLMs, as it enables traceability, accountability, and detection of manipulated content, helping to mitigate unintended consequences. However, for open-source models, w…

    Submitted 8 October, 2024; originally announced October 2024.

  8. arXiv:2410.06109  [pdf, other]

    cs.LG

    Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition

    Authors: Zi-Hao Zhou, Siyuan Fang, Zi-Jing Zhou, Tong Wei, Yuanyu Wan, Min-Ling Zhang

    Abstract: Long-tailed semi-supervised learning poses a significant challenge in training models with limited labeled data exhibiting a long-tailed label distribution. Current state-of-the-art LTSSL approaches heavily rely on high-quality pseudo-labels for large-scale unlabeled data. However, these methods often neglect the impact of representations learned by the neural network and struggle with real-world…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted at NeurIPS 2024

  9. arXiv:2410.03937  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Clustering Alzheimer's Disease Subtypes via Similarity Learning and Graph Diffusion

    Authors: Tianyi Wei, Shu Yang, Davoud Ataee Tarzanagh, Jingxuan Bao, Jia Xu, Patryk Orzechowski, Joost B. Wagenaar, Qi Long, Li Shen

    Abstract: Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Due to the heterogeneous nature of AD, its diagnosis and treatment pose critical challenges. Consequently, there is a growing research interest in identifying homogeneous AD subtypes that can assist in addressing these challenges in recent years. In this study, we aim to identify subtypes of…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: ICIBM'23: International Conference on Intelligent Biology and Medicine, Tampa, FL, USA, July 16-19, 2023

  10. arXiv:2410.01212  [pdf, other]

    cs.LG

    Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction

    Authors: Weiye Zhao, Feihan Li, Yifan Sun, Yujie Wang, Rui Chen, Tianhao Wei, Changliu Liu

    Abstract: Enforcing state-wise safety constraints is critical for the application of reinforcement learning (RL) in real-world problems, such as autonomous driving and robot manipulation. However, existing safe RL methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. The former does not exclude the probability of safety violations, while t…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: submission to Journal of Machine Learning Research

  11. arXiv:2409.19696  [pdf, other]

    cs.LG cs.CV

    Vision-Language Models are Strong Noisy Label Detectors

    Authors: Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang

    Abstract: Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language model…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted at NeurIPS 2024

  12. arXiv:2409.17882  [pdf, other]

    cs.MA

    Multi-UAV Enabled MEC Networks: Optimizing Delay through Intelligent 3D Trajectory Planning and Resource Allocation

    Authors: Zhiying Wang, Tianxi Wei, Gang Sun, Xinyue Liu, Hongfang Yu, Dusit Niyato

    Abstract: Mobile Edge Computing (MEC) reduces the computational burden on terminal devices by shortening the distance between these devices and computing nodes. Integrating Unmanned Aerial Vehicles (UAVs) with enhanced MEC networks can leverage the high mobility of UAVs to flexibly adjust network topology, further expanding the applicability of MEC. However, in highly dynamic and complex real-world environm…

    Submitted 26 September, 2024; originally announced September 2024.

  13. arXiv:2409.12159  [pdf, other]

    cs.RO

    WeHelp: A Shared Autonomy System for Wheelchair Users

    Authors: Abulikemu Abuduweili, Alice Wu, Tianhao Wei, Weiye Zhao

    Abstract: There is a large population of wheelchair users. Most wheelchair users need help with daily tasks. However, according to recent reports, their needs are not properly satisfied due to the lack of caregivers. Therefore, in this project, we develop WeHelp, a shared autonomy system aimed at wheelchair users. A robot with a WeHelp system has three modes: following mode, remote control mode and…

    Submitted 18 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  14. arXiv:2409.08443  [pdf, other]

    cs.CV

    CF-PRNet: Coarse-to-Fine Prototype Refining Network for Point Cloud Completion and Reconstruction

    Authors: Zhi Chen, Tianqi Wei, Zecheng Zhao, Jia Syuen Lim, Yadan Luo, Hu Zhang, Xin Yu, Scott Chapman, Zi Huang

    Abstract: In modern agriculture, precise monitoring of plants and fruits is crucial for tasks such as high-throughput phenotyping and automated harvesting. This paper addresses the challenge of reconstructing accurate 3D shapes of fruits from partial views, which is common in agricultural settings. We introduce CF-PRNet, a coarse-to-fine prototype refining network that leverages high-resolution 3D data during t…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: Technical Report of the 1st place solution to CVPPA@ECCV2024: Shape Completion and Reconstruction of Sweet Peppers Challenge

  15. arXiv:2409.04777  [pdf, other]

    cs.LG math.OC

    Optimization Hyper-parameter Laws for Large Language Models

    Authors: Xingyu Xie, Kuangyu Ding, Shuicheng Yan, Kim-Chuan Toh, Tianwen Wei

    Abstract: Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimizatio…

    Submitted 7 September, 2024; originally announced September 2024.

  16. arXiv:2409.04038  [pdf, other]

    cs.CV

    PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation

    Authors: Tianqi Wei, Zhi Chen, Xin Yu, Scott Chapman, Paul Melloy, Zi Huang

    Abstract: Plant diseases pose significant threats to agriculture. It necessitates proper diagnosis and effective treatment to safeguard crop yields. To automate the diagnosis process, image segmentation is usually adopted for precisely identifying diseased regions, thereby advancing precision agriculture. Developing robust image segmentation models for plant diseases demands high-quality annotations across…

    Submitted 6 September, 2024; originally announced September 2024.

  17. arXiv:2409.04003  [pdf, other]

    cs.CV

    DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Tiantian Wei, Min Dou, Botian Shi, Yong Liu

    Abstract: Recent advances in diffusion models have significantly enhanced the controllable generation of streetscapes and facilitated downstream perception and planning tasks. However, challenges such as maintaining temporal coherence, generating long videos, and accurately modeling driving scenes persist. Accordingly, we propose DreamForge, an advanced diffusion-based autoregressive video generation mod…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Second place solution for W-CODA-Track2

  18. arXiv:2409.03594  [pdf, other]

    cs.GT

    A Complete Landscape of EFX Allocations of Mixed Manna on Graphs

    Authors: Yu Zhou, Tianze Wei, Minming Li, Bo Li

    Abstract: We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. [EC, 2023] first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item m…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted in IJCAI 2024

  19. arXiv:2409.03312  [pdf, other]

    quant-ph

    Quantum Algorithm For Testing Convexity of Function

    Authors: Nhat A. Nghiem, Tzu-Chieh Wei

    Abstract: Functions are a fundamental object in mathematics, with countless applications to different fields, and are usually classified based on certain properties, given their domains and images. An important property of a real-valued function is its convexity, which plays a very crucial role in many areas, such as thermodynamics and geometry. Motivated by recent advances in quantum computation as well as…

    Submitted 5 September, 2024; originally announced September 2024.

  20. arXiv:2409.02848  [pdf, other]

    quant-ph cond-mat.dis-nn

    Subspace-thermal discrete time crystals from phase transitions between different n-tuple discrete time crystals

    Authors: Hongye Yu, Tzu-Chieh Wei

    Abstract: We propose a new Floquet time crystal model that responds in arbitrary multiples of the driving period. Such an $n$-tuple discrete time crystal is theoretically constructed by permuting spins in a disordered chain and is well suited for experiment implementations. Transitions between these time crystals with different periods give rise to a novel phase of matter that we call subspace-thermal discr…

    Submitted 3 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 30 pages, 7 figures

  21. arXiv:2409.02797  [pdf, ps, other]

    eess.SP

    Joint Beamforming for Backscatter Integrated Sensing and Communication

    Authors: Zongyao Zhao, Tiankuo Wei, Zhenyu Liu, Xinke Tang, Xiao-Ping Zhang, Yuhan Dong

    Abstract: Integrated sensing and communication (ISAC) is a key technology of next generation wireless communication. Backscatter communication (BackCom) plays an important role for internet of things (IoT). Then the integration of ISAC with BackCom technology enables low-power data transmission while enhancing the system sensing ability, which is expected to provide a potentially revolutionary solution for…

    Submitted 4 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 6 pages, 4 figures, IEEE Global Communications Conference (Globecom) 2024. This paper is the conference version of the following work: arXiv:2407.19235

  22. arXiv:2409.02074  [pdf, other]

    cs.CR cs.HC cs.LG cs.SE

    RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer

    Authors: Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu

    Abstract: Malicious shell commands are linchpins to many cyber-attacks, but may not be easy to understand by security analysts due to complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs suffer from a lack of expert knowledge and a tendency t…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu. RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."

  23. arXiv:2408.15251  [pdf, other]

    cs.CV cs.LG

    TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability

    Authors: Yan Lin, Tonglong Wei, Zeyu Zhou, Haomin Wen, Jilin Hu, Shengnan Guo, Youfang Lin, Huaiyu Wan

    Abstract: Vehicle trajectories provide valuable movement information that supports various downstream tasks and powers real-world applications. A desirable trajectory learning model should transfer between different regions and tasks without retraining, thus improving computational efficiency and effectiveness with limited training data. However, a model's ability to transfer across regions is limited by th…

    Submitted 9 August, 2024; originally announced August 2024.

  24. arXiv:2408.14723  [pdf, other]

    cs.CV cs.IR

    Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

    Authors: Tianqi Wei, Zhi Chen, Xin Yu

    Abstract: Plant disease recognition is a critical task that ensures crop health and mitigates the damage caused by diseases. A handy tool that enables farmers to receive a diagnosis based on query pictures or the text description of suspicious plants is in high demand for initiating treatment before potential diseases spread further. In this paper, we develop a multimodal plant disease image retrieval syste…

    Submitted 26 August, 2024; originally announced August 2024.

  25. arXiv:2408.08788  [pdf]

    cs.LG

    Neighbor Overlay-Induced Graph Attention Network

    Authors: Tiqiao Wei, Ye Yuan

    Abstract: Graph neural networks (GNNs) have garnered significant attention due to their ability to represent graph data. Among various GNN variants, graph attention network (GAT) stands out since it is able to dynamically learn the importance of different nodes. However, present GATs heavily rely on the smoothed node features to obtain the attention coefficients rather than graph structural information, whi…

    Submitted 16 August, 2024; originally announced August 2024.

  26. arXiv:2408.05586  [pdf, other]

    cs.LG cs.IR

    Meta Clustering of Neural Bandits

    Authors: Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He

    Abstract: The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance betwee…

    Submitted 26 September, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024

  27. arXiv:2408.03120  [pdf, other]

    cs.CV

    Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

    Authors: Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

    Abstract: Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look…

    Submitted 6 August, 2024; originally announced August 2024.

  28. arXiv:2408.02128  [pdf, other]

    cs.CL

    Table Transformers for Imputing Textual Attributes

    Authors: Ting-Ruen Wei, Yuan Wang, Yoshitaka Inoue, Hsin-Tai Wu, Yi Fang

    Abstract: Missing data in tabular datasets is a common issue as the performance of downstream tasks usually depends on the completeness of the training dataset. Previous missing data imputation methods focus on numeric and categorical columns, but we propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA) based on the transformer to impute unstructured textual co…

    Submitted 4 August, 2024; originally announced August 2024.

  29. arXiv:2408.01566  [pdf, other]

    cs.CV

    Full-range Head Pose Geometric Data Augmentations

    Authors: Huei-Chung Hu, Xuyang Wu, Haowei Liu, Ting-Ruen Wei, Hsin-Tai Wu

    Abstract: Many head pose estimation (HPE) methods promise the ability to create full-range datasets, theoretically allowing the estimation of the rotation and positioning of the head from various angles. However, these methods are only accurate within a range of head angles; exceeding this specific range leads to significant inaccuracies. This is dominantly explained by unclear specificity of the coordinate s…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.18104

  30. arXiv:2408.00415  [pdf, other]

    cs.RO cs.AI cs.CV

    DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

    Authors: Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core components: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any worldwide street map, and World Dreamer, a high-fi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 19 pages, 9 figures

  31. arXiv:2408.00117  [pdf, other]

    cs.CV cs.LG cs.RO eess.SY

    Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods

    Authors: Xusheng Luo, Tianhao Wei, Simin Liu, Ziwei Wang, Luis Mattei-Mendez, Taylor Loper, Joshua Neighbor, Casidhe Hutchison, Changliu Liu

    Abstract: This work addresses the certification of the local robustness of vision-based two-stage 6D object pose estimation. The two-stage method for object pose estimation achieves superior accuracy by first employing deep neural network-driven keypoint regression and then applying a Perspective-n-Point (PnP) technique. Despite advancements, the certification of these methods' robustness remains scarce. Th…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: 25 pages, 10 figures, 5 tables

  32. arXiv:2407.20532  [pdf, other]

    eess.SY

    Scalable Synthesis of Formally Verified Neural Value Function for Hamilton-Jacobi Reachability Analysis

    Authors: Yujie Yang, Hanjiang Hu, Tianhao Wei, Shengbo Eben Li, Changliu Liu

    Abstract: Hamilton-Jacobi (HJ) reachability analysis provides a formal method for guaranteeing safety in constrained control problems. It synthesizes a value function to represent a long-term safe set called the feasible region. Early synthesis methods based on state space discretization cannot scale to high-dimensional problems, while recent methods that use neural networks to approximate value functions resul…

    Submitted 31 July, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  33. arXiv:2407.20147  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG cs.NE

    Quantum Machine Learning Architecture Search via Deep Reinforcement Learning

    Authors: Xin Dai, Tzu-Chieh Wei, Shinjae Yoo, Samuel Yen-Chi Chen

    Abstract: The rapid advancement of quantum computing (QC) and machine learning (ML) has given rise to the burgeoning field of quantum machine learning (QML), aiming to capitalize on the strengths of quantum computing to propel ML forward. Despite its promise, crafting effective QML models necessitates profound expertise to strike a delicate balance between model intricacy and feasibility on Noisy Intermedia…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE International Conference on Quantum Computing and Engineering - QCE 2024

  34. arXiv:2407.19667  [pdf, other]

    cs.AI

    Smart Language Agents in Real-World Planning

    Authors: Annabelle Miin, Timothy Wei

    Abstract: Comprehensive planning agents have been a long term goal in the field of artificial intelligence. Recent innovations in Natural Language Processing have yielded success through the advent of Large Language Models (LLMs). We seek to improve the travel-planning capability of such LLMs by extending upon the work of the previous paper TravelPlanner. Our objective is to explore a new method of using LL…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure

  35. arXiv:2407.19235  [pdf, ps, other]

    eess.SP eess.SY

    B-ISAC: Backscatter Integrated Sensing and Communication for 6G IoE Applications

    Authors: Zongyao Zhao, Yuhan Dong, Tiankuo Wei, Xiao-Ping Zhang, Xinke Tang, Zhenyu Liu

    Abstract: The integration of backscatter communication (BackCom) technology with integrated sensing and communication (ISAC) technology not only enhances the system sensing performance, but also enables low-power information transmission. This is expected to provide a new paradigm for communication and sensing in internet of everything (IoE) applications. Existing works only consider sensing rate and detect…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 15 pages, 11 figures, submitted to IEEE Internet of Things Journal (IoTJ) on April 1st 2024

  36. arXiv:2407.19079  [pdf, other]

    cs.CV

    UniForensics: Face Forgery Detection via General Facial Representation

    Authors: Ziyuan Fang, Hanqing Zhao, Tianyi Wei, Wenbo Zhou, Ming Wan, Zhanyi Wang, Weiming Zhang, Nenghai Yu

    Abstract: Previous deepfake detection methods mostly depend on low-level textural features vulnerable to perturbations and fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. Motivated by this, we propose a detection method that utilizes high-level s…

    Submitted 26 July, 2024; originally announced July 2024.

  37. arXiv:2407.15815  [pdf, other]

    cs.RO cs.AI cs.CV

    Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

    Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu

    Abstract: Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning app…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Webpage: https://gemcollector.github.io/maniwhere/

  38. arXiv:2407.11744  [pdf, other]

    quant-ph

    Improved Quantum Power Method and Numerical Integration Using Quantum Singular Value Transformation

    Authors: Nhat A. Nghiem, Hiroki Sukeno, Shuyu Zhang, Tzu-Chieh Wei

    Abstract: Quantum singular value transformation (QSVT) is a framework that has been shown to unify many primitives in quantum algorithms. In this work, we leverage the QSVT framework in two directions. We first show that the QSVT framework can accelerate one recently introduced quantum power method, which substantially improves its running time. Additionally, we incorporate several elementary numerical inte…

    Submitted 16 July, 2024; originally announced July 2024.

  39. arXiv:2407.08554  [pdf, other]

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl…

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  40. arXiv:2407.08348  [pdf, other]

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  41. arXiv:2407.01639  [pdf, other]

    cs.LG cs.SE

    ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

    Authors: Tianhao Wei, Luca Marzari, Kai S. Yun, Hanjiang Hu, Peizhi Niu, Xusheng Luo, Changliu Liu

    Abstract: Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present ModelVerification.jl (MV), the fir…

    Submitted 30 June, 2024; originally announced July 2024.

  42. arXiv:2406.16204  [pdf, other]

    cs.CV

    Breaking the Frame: Visual Place Recognition by Overlap Prediction

    Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

    Abstract: Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose a novel visual place recognition approach based on overlap prediction, called VOP, shifting from traditional reliance on global image similarities and local features to image overlap prediction. VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backb…

    Submitted 7 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  43. arXiv:2406.15863  [pdf, other]

    cs.CV

    EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation

    Authors: Tianyu Wei, Shanmin Pang, Qi Guo, Yizhuo Ma, Qing Guo

    Abstract: Text-to-image diffusion models can create realistic images based on input texts. Users can describe an object to convey their opinions visually. In this work, we unveil a previously unrecognized and latent risk of using diffusion models to generate images; we utilize emotion in the input texts to introduce negative contents, potentially eliciting unfavorable emotions in users. Emotions play a cruc…

    Submitted 22 June, 2024; originally announced June 2024.

  44. arXiv:2406.14056  [pdf, other]

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04 68-04 ACM Class: I.2.7; I.2.10

  45. arXiv:2406.13187  [pdf, other]

    cs.LG

    Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning

    Authors: Kai Gan, Tong Wei, Min-Ling Zhang

    Abstract: While long-tailed semi-supervised learning (LTSSL) has received tremendous attention in many real-world classification problems, existing LTSSL algorithms typically assume that the class distributions of labeled and unlabeled data are almost identical. Those LTSSL algorithms built upon the assumption can severely suffer when the class distributions of labeled and unlabeled data are mismatched sinc…

    Submitted 18 June, 2024; originally announced June 2024.

  46. arXiv:2406.12638  [pdf, other]

    cs.CV cs.LG

    Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model

    Authors: Jiang-Xin Shi, Chi Zhang, Tong Wei, Yu-Feng Li

    Abstract: Pre-trained vision-language models like CLIP have shown powerful zero-shot inference ability via image-text matching and prove to be strong few-shot learners in various downstream tasks. However, in real-world scenarios, adapting CLIP to downstream tasks may encounter the following challenges: 1) data may exhibit long-tailed data distributions and might not have abundant samples for all the classe…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  47. arXiv:2406.10801  [pdf, other]

    cs.CV

    Saliency-guided and Patch-based Mixup for Long-tailed Skin Cancer Image Classification

    Authors: Tianyunxi Wei, Yijin Huang, Li Lin, Pujin Cheng, Sirui Li, Xiaoying Tang

    Abstract: Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, some common disease categories account for most of the data, while only a few samples are available in the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have em…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE ISBI2024

  48. arXiv:2406.08964  [pdf]

    physics.bio-ph

    Kinetics-Optimized Enhanced Sampling Using Mean First Passage Times

    Authors: Tiejun Wei, Balint Dudas, Edina Rosta

    Abstract: Molecular dynamics simulations have become essential in many areas of atomistic modelling from drug discovery to materials science. They provide critical atomic-level insights into key dynamical events experiments cannot easily capture. However, their impact often falls short as the timescales of the important processes are inaccessible using standard molecular dynamics. Enhanced sampling methods…

    Submitted 13 June, 2024; originally announced June 2024.

  49. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Ming Fang, Tao Wei, Zijian Li, Ning Cheng, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. It uses an Index Mappin…

    Submitted 28 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Submitted to ICASSP 2025

  50. arXiv:2406.07362  [pdf, other]

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…

    Submitted 28 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages