
Showing 1–50 of 363 results for author: Wu, G

  1. arXiv:2410.15268  [pdf, other]

    cs.LG cs.CL

    TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models

    Authors: Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao

    Abstract: Representation learning of Text-Attributed Graphs (TAGs) has garnered significant attention due to its applications in various domains, including recommendation systems and social networks. Despite advancements in TAG learning methodologies, challenges remain in explainability due to the black-box nature of existing TAG representation learning models. This paper presents TAGExplainer, the first me…

    Submitted 19 October, 2024; originally announced October 2024.

  2. arXiv:2410.15067  [pdf, other]

    cs.CV eess.IV

    A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends

    Authors: Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, Xianming Liu

    Abstract: Image restoration (IR) refers to the process of improving visual quality of images while removing degradation, such as noise, blur, weather effects, and so on. Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions. In response to this challenge, the all-in-one image restoration (AiOIR) paradigm has e…

    Submitted 19 October, 2024; originally announced October 2024.

  3. arXiv:2410.13211  [pdf, other]

    cs.LG cs.AI stat.ML

    Estimating the Probabilities of Rare Outputs in Language Models

    Authors: Gabriel Wu, Jacob Hilton

    Abstract: We consider the problem of low probability estimation: given a machine learning model and a formally-specified input distribution, how can we estimate the probability of a binary property of the model's output, even when that probability is too small to estimate by random sampling? This problem is motivated by the need to improve worst-case performance, which distribution shift can make much more…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 27 pages, 9 figures

  4. arXiv:2410.05740  [pdf, other]

    cs.RO cs.AI eess.SY

    Learning to Race in Extreme Turning Scene with Active Exploration and Gaussian Process Regression-based MPC

    Authors: Guoqiang Wu, Cheng Hu, Wangjia Weng, Zhouheng Li, Yonghao Fu, Lei Xie, Hongye Su

    Abstract: Extreme cornering in racing often induces large side-slip angles, presenting a formidable challenge in vehicle control. To tackle this issue, this paper introduces an Active Exploration with Double GPR (AEDGPR) system. The system begins by planning a minimum-time trajectory with a Gaussian Process Regression (GPR) compensated model. The planning results show that in the cornering section, the ya…

    Submitted 8 October, 2024; originally announced October 2024.

  5. arXiv:2410.03043  [pdf, other]

    cs.LG

    Towards Understanding the Feasibility of Machine Unlearning

    Authors: Mahtab Sarvmaili, Hassan Sajjad, Ga Wu

    Abstract: In light of recent privacy regulations, machine unlearning has attracted significant attention in the research community. However, current studies predominantly assess the overall success of unlearning approaches, overlooking the varying difficulty of unlearning individual training samples. As a result, the broader feasibility of machine unlearning remains under-explored. This paper presents a set…

    Submitted 3 October, 2024; originally announced October 2024.

  6. arXiv:2409.19214  [pdf, other]

    stat.ML cs.LG

    Group Distributionally Robust Optimization can Suppress Class Imbalance Effect in Network Traffic Classification

    Authors: Wumei Du, Qi Wang, Yiqin Lv, Dong Liang, Guanlin Wu, Xingxing Liang, Zheng Xie

    Abstract: Internet services have led to the eruption of traffic, and machine learning on these Internet data has become an indispensable tool, especially when the application is risk-sensitive. This paper focuses on network traffic classification in the presence of class imbalance, which fundamentally and ubiquitously exists in Internet data analysis. This existence of class imbalance mostly drifts the opti…

    Submitted 27 September, 2024; originally announced September 2024.

  7. arXiv:2409.17805  [pdf, other]

    cs.CV

    Cascade Prompt Learning for Vision-Language Model Adaptation

    Authors: Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

    Abstract: Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks. However, current learnable prompt tokens are primarily used for the single phase of adapting to tasks (i.e., adapting prompt), easily leading to overfitting risks. In this work, we propose a novel Cascade Prompt Learning (CasPL) framework to en…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: ECCV2024

  8. arXiv:2409.17510  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG

    NeuroPath: A Neural Pathway Transformer for Joining the Dots of Human Connectomes

    Authors: Ziquan Wei, Tingting Dan, Jiaqi Ding, Guorong Wu

    Abstract: Although modern imaging technologies allow us to study connectivity between two distinct brain regions in-vivo, an in-depth understanding of how anatomical structure supports brain function and how spontaneous functional fluctuations emerge remarkable cognition is still elusive. Meanwhile, tremendous efforts have been made in the realm of machine learning to establish the nonlinear mapping between…

    Submitted 1 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted by NeurIPS 2024

  9. arXiv:2409.14072  [pdf, other]

    cs.CV

    Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects

    Authors: Shuai Zhang, Guanjun Wu, Xinggang Wang, Bin Feng, Wenyu Liu

    Abstract: Reconstructing objects and extracting high-quality surfaces play a vital role in the real world. Current 4D representations show the ability to render high-quality novel views for dynamic objects but cannot reconstruct high-quality meshes due to their implicit or geometrically inaccurate representations. In this paper, we propose a novel representation that can reconstruct accurate meshes from spa…

    Submitted 21 September, 2024; originally announced September 2024.

  10. arXiv:2409.11377  [pdf, other]

    cs.LG

    Machine Learning on Dynamic Functional Connectivity: Promise, Pitfalls, and Interpretations

    Authors: Jiaqi Ding, Tingting Dan, Ziquan Wei, Hyuna Cho, Paul J. Laurienti, Won Hwa Kim, Guorong Wu

    Abstract: An unprecedented amount of existing functional Magnetic Resonance Imaging (fMRI) data provides a new opportunity to understand the relationship between functional fluctuation and human cognition/behavior using a data-driven approach. To that end, tremendous efforts have been made in machine learning to predict cognitive states from evolving volumetric images of blood-oxygen-level-dependent (BOLD)…

    Submitted 17 September, 2024; originally announced September 2024.

  11. arXiv:2409.06863  [pdf, other]

    cs.LG cs.HC

    Towards Understanding Human Emotional Fluctuations with Sparse Check-In Data

    Authors: Sagar Paresh Shah, Ga Wu, Sean W. Kortschot, Samuel Daviau

    Abstract: Data sparsity is a key challenge limiting the power of AI tools across various domains. The problem is especially pronounced in domains that require active user input rather than measurements derived from automated sensors. It is a critical barrier to harnessing the full potential of AI in domains requiring active user engagement, such as self-reported mood check-ins, where capturing a continuous…

    Submitted 10 September, 2024; originally announced September 2024.

  12. arXiv:2409.04243  [pdf, other]

    cs.CV

    Hybrid Cost Volume for Memory-Efficient Optical Flow

    Authors: Yang Zhao, Gangwei Xu, Gang Wu

    Abstract: Current state-of-the-art flow methods are mostly based on dense all-pairs cost volumes. However, as image resolution increases, the computational and spatial complexity of constructing these cost volumes grows at a quartic rate, making these methods impractical for high-resolution images. In this paper, we propose a novel Hybrid Cost Volume for memory-efficient optical flow, named HCV. To construc…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 10 pages, 6 figures

  13. arXiv:2408.10527  [pdf, other]

    cs.CV cs.AI

    EdgeNAT: Transformer for Efficient Edge Detection

    Authors: Jinghuai Jie, Yan Guo, Guixing Wu, Junmin Wu, Baojian Hua

    Abstract: Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. In particular, recent advancements present transformers with hierarchical structures, such as the Dilated Neighborhood Attention Transformer (DiNAT), demonstrating outstanding ability to efficiently capture both global and local features. However, transformers' appl…

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.10411  [pdf, other]

    cs.CL

    Resolving Lexical Bias in Edit Scoping with Projector Editor Networks

    Authors: Hammad Rizwan, Domenic Rosati, Ga Wu, Hassan Sajjad

    Abstract: Weight-preserving model editing techniques heavily rely on the scoping mechanism that decides when to apply an edit to the base model. These scoping mechanisms utilize distance functions in the representation space to ascertain the scope of the edit. In this work, we show that distance-based scoping functions grapple with lexical biases leading to issues such as misfires with irrelevant prompts th…

    Submitted 19 August, 2024; originally announced August 2024.

  15. arXiv:2408.07930  [pdf, other]

    cs.CL cs.AI

    MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

    Authors: Wenxuan Xie, Gaochen Wu, Bowen Zhou

    Abstract: Recent In-Context Learning based methods have achieved remarkable success in the Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schemas and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposi…

    Submitted 7 October, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 22 pages, 14 figures

  16. arXiv:2408.05775  [pdf, other]

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  17. arXiv:2408.01950  [pdf, other]

    cs.SD cs.CL eess.AS

    Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model

    Authors: Shipei Liu, Xiaoya Fan, Guowei Wu

    Abstract: Existing music generation models are mostly language-based, neglecting the frequency continuity property of notes, resulting in inadequate fitting of rare or never-used notes and thus reducing the diversity of generated samples. We argue that the distribution of notes can be modeled by translational invariance and periodicity, especially using diffusion models to generalize notes by injecting freq…

    Submitted 4 August, 2024; originally announced August 2024.

  18. arXiv:2407.21045  [pdf]

    cs.CL cs.AI

    Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research

    Authors: Boyan Xu, Liang Wen, Zihao Li, Yuxing Yang, Guanlan Wu, Xiongpeng Tang, Yu Li, Zihao Wu, Qingxian Su, Xueqing Shi, Yue Yang, Rui Tong, How Yong Ng

    Abstract: Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a…

    Submitted 22 July, 2024; originally announced July 2024.

  19. arXiv:2407.20026  [pdf, other]

    cs.MS math.OC

    JAX-SSO: Differentiable Finite Element Analysis Solver for Structural Optimization and Seamless Integration with Neural Networks

    Authors: Gaoyuan Wu

    Abstract: Differentiable numerical simulations of physical systems have gained rising attention in the past few years with the development of automatic differentiation tools. This paper presents JAX-SSO, a differentiable finite element analysis solver built with JAX, Google's high-performance computing library, to assist efficient structural design in the built environment. With the adjoint method and autom…

    Submitted 29 July, 2024; originally announced July 2024.

  20. arXiv:2407.19224  [pdf, other]

    cs.SD cs.MM eess.AS

    RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

    Authors: Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu

    Abstract: While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts ac…

    Submitted 29 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by MM 2024

  21. arXiv:2407.14105  [pdf, other]

    cs.FL cs.LO

    Quasi-Isometric Reductions Between Infinite Strings

    Authors: Karen Frilya Celine, Ziyuan Gao, Sanjay Jain, Ryan Lou, Frank Stephan, Guohua Wu

    Abstract: This paper studies the recursion-theoretic aspects of large-scale geometries of infinite strings, a subject initiated by Khoussainov and Takisaka (2017). We investigate several notions of quasi-isometric reductions between recursive infinite strings and prove various results on the equivalence classes of such reductions. The main result is the construction of two infinite recursive strings $α$ and…

    Submitted 19 July, 2024; originally announced July 2024.

  22. arXiv:2407.12260  [pdf, other]

    cs.HC

    HuBar: A Visual Analytics Tool to Explore Human Behaviour based on fNIRS in AR guidance systems

    Authors: Sonia Castelo, Joao Rulff, Parikshit Solunke, Erin McGowan, Guande Wu, Iran Roman, Roque Lopez, Bea Steers, Qi Sun, Juan Bello, Bradley Feest, Michael Middleton, Ryan Mckendrick, Claudio Silva

    Abstract: The concept of an intelligent augmented reality (AR) assistant has significant, wide-ranging applications, with potential uses in medicine, military, and mechanics domains. Such an assistant must be able to perceive the environment and actions, reason about the environment state in relation to a given task, and seamlessly interact with the task performer. These interactions typically involve an AR…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures. This is the author's version of the article that has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics (TVCG)

  23. arXiv:2407.10756  [pdf, other]

    cs.CV

    GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

    Authors: Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

    Abstract: In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches have limited applicability in industry due to their large parameter counts and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most c…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  24. arXiv:2407.07332  [pdf, ps, other]

    cs.IT

    Several new classes of optimal ternary cyclic codes with two or three zeros

    Authors: Gaofei Wu, Zhuohui You, Zhengbang Zha, Yuqing Zhang

    Abstract: Cyclic codes are a subclass of linear codes and have wide applications in data storage systems, communication systems and consumer electronics due to their efficient encoding and decoding algorithms. Let $α$ be a generator of $\mathbb{F}_{3^m}^*$, where $m$ is a positive integer. Denote by $\mathcal{C}_{(i_1,i_2,\cdots, i_t)}$ the cyclic code with generator polynomial…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 16 pages

  25. arXiv:2407.06516  [pdf, other]

    cs.CV

    VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

    Authors: Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot address this problem well because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world obser…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  26. arXiv:2407.04969  [pdf, other]

    cs.CL

    EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation

    Authors: Yuchen Fan, Xin Zhong, Yazhe Wan, Chengsi Wang, Haonan Cheng, Gaoche Wu, Ning Ding, Bowen Zhou

    Abstract: Since LLMs emerged, more attention has been paid to abstractive long-form summarization, where longer input sequences indicate more information contained. Nevertheless, the automatic evaluation of such summaries remains underexplored. The current evaluation metrics for long-form summarization either use similarity-based metrics like ROUGE and BERTScore or LLM-based metrics using appropriate prompt…

    Submitted 15 October, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: 20 pages

  27. arXiv:2407.04603  [pdf, other]

    cs.CV

    AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

    Authors: Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang

    Abstract: Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key compon…

    Submitted 6 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted by NeurIPS 2024

  28. arXiv:2407.04504  [pdf, other]

    cs.CV

    Segment Any 4D Gaussians

    Authors: Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations.…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 22 pages

  29. arXiv:2407.03277  [pdf, other]

    cs.CL

    Evaluating Automatic Metrics with Incremental Machine Translation Systems

    Authors: Guojun Wu, Shay B. Cohen, Rico Sennrich

    Abstract: We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions. Since human A/B testing is commonly used, we assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations. Our study not only confirms several prior findings, such as…

    Submitted 3 October, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  30. arXiv:2406.18462  [pdf, other]

    cs.CV cs.GR

    GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

    Authors: Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://taoranyi.com/gaussiandreamerpro/

  31. arXiv:2406.10857  [pdf, other]

    cs.SE

    An LLM-enhanced Multi-objective Evolutionary Search for Autonomous Driving Test Scenario Generation

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, Yuan Zhou, Shuo Li, Jun Wei, Dan Ye, Wei Wang, Tianwei Zhang

    Abstract: The safety of Autonomous Driving Systems (ADSs) is critically important for the implementation of autonomous vehicles (AVs). Therefore, ADSs must be evaluated thoroughly before their release and deployment to the public. Generating diverse safety-critical test scenarios is a key task for ADS testing. This paper proposes LEADE, an LLM-enhanced scenario generation approach for ADS testing, w…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages

  32. arXiv:2406.09016  [pdf, other]

    cs.CV

    Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

    Authors: Gaochang Wu, Yapeng Zhang, Lan Deng, Jingxin Zhang, Tianyou Chai

    Abstract: The Fused Magnesium Furnace (FMF) is a crucial piece of industrial equipment in the production of magnesia, and anomaly detection plays a pivotal role in ensuring its efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures, 5 tables. Submitted to IEEE

  33. arXiv:2406.06252  [pdf, other]

    eess.SP cs.CR

    Random Time-hopping Secure Ranging Strategy Against Distance-Reduction Attacks in UWB

    Authors: Wenlong Gou, Chuanhang Yu, Gang Wu

    Abstract: In order to mitigate the distance reduction attack in Ultra-Wide Band (UWB) ranging, this paper proposes a secure ranging scheme based on a random time-hopping mechanism without redundant signaling overhead. Additionally, a secure ranging strategy is designed for backward compatibility with existing standards such as IEEE 802.15.4a/z, combined with an attack detection scheme. The effectiveness and…

    Submitted 10 June, 2024; originally announced June 2024.

    ACM Class: H.1.1

  34. arXiv:2406.05343  [pdf, other]

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating task performance alone, such as the accuracy of identifying the attribute of an object. Combining well-developed…

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  35. arXiv:2406.03843  [pdf, other]

    cs.HC cs.AI

    POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

    Authors: Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu

    Abstract: Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modaliti…

    Submitted 30 September, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures

    MSC Class: 68

    ACM Class: H.5; I.2.1

  36. arXiv:2406.00833  [pdf, other]

    cs.CY cs.AI

    Harvard Undergraduate Survey on Generative AI

    Authors: Shikoh Hirabayashi, Rishab Jain, Nikola Jurković, Gabriel Wu

    Abstract: How has generative AI impacted the experiences of college students? We study the influence of AI on the study habits, class choices, and career prospects of Harvard undergraduates (n=326), finding that almost 90% of students use generative AI. For roughly 25% of these students, AI has begun to substitute for attending office hours and completing required readings. Half of students are concerned th…

    Submitted 7 August, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  37. arXiv:2405.18255  [pdf, other]

    cs.CR cs.SI eess.SP

    Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder

    Authors: Wenlong Gou, Chuanhang Yu, Juntao Ma, Gang Wu, Vladimir Mordachev

    Abstract: A variety of ranging threats represented by the Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low-complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder wit…

    Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    ACM Class: H.1.1

  38. arXiv:2405.16845  [pdf, other]

    cs.LG cs.CL stat.ML

    On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

    Authors: Chenyu Zheng, Wei Huang, Rongzhen Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li

    Abstract: Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objecti…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 37 pages

  39. arXiv:2405.12155  [pdf, other]

    cs.IT

    Embracing Radiance Field Rendering in 6G: Over-the-Air Training and Inference with 3D Contents

    Authors: Guanlin Wu, Zhonghao Lyu, Juyong Zhang, Jie Xu

    Abstract: The efficient representation, transmission, and reconstruction of three-dimensional (3D) contents are becoming increasingly important for sixth-generation (6G) networks that aim to merge virtual and physical worlds for offering immersive communication experiences. Neural radiance field (NeRF) and 3D Gaussian splatting (3D-GS) have recently emerged as two promising 3D representation techniques base…

    Submitted 18 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 16 pages, 7 figures

  40. arXiv:2405.10832  [pdf, other]

    cs.CV

    Open-Vocabulary Spatio-Temporal Action Detection

    Authors: Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, Limin Wang

    Abstract: Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model…

    Submitted 17 May, 2024; originally announced May 2024.

  41. arXiv:2405.10691  [pdf, other]

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infant brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit…

    Submitted 17 May, 2024; originally announced May 2024.

  42. arXiv:2405.08217  [pdf, other]

    cs.LG q-bio.GN q-bio.QM stat.ML

    Data Valuation with Gradient Similarity

    Authors: Nathaniel J. Evans, Gordon B. Mills, Guanming Wu, Xubo Song, Shannon McWeeney

    Abstract: High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring expert knowledge and considerable manual intervention. Data Valuation algorithms are a class of methods that seek to quantify the value of each sample in a dataset b…

    Submitted 13 May, 2024; originally announced May 2024.

  43. arXiv:2405.03873  [pdf, other]

    cs.AI cs.HC

    Investigating Personalized Driving Behaviors in Dilemma Zones: Analysis and Prediction of Stop-or-Go Decisions

    Authors: Ziye Qin, Siyan Li, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Dilemma zones at signalized intersections present a commonly occurring but unsolved challenge for both drivers and traffic operators. Onsets of the yellow lights prompt varied responses from different drivers: some may brake abruptly, compromising the ride comfort, while others may accelerate, increasing the risk of red-light violations and potential safety hazards. Such diversity in drivers' stop…

    Submitted 6 May, 2024; originally announced May 2024.

  44. arXiv:2405.03103  [pdf, other]

    cs.LG cs.CV

    Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

    Authors: Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang

    Abstract: The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost of increased chip area. In this work, we first conduct a large-scale analysis of LLM weights and activations across 30 networks and conclude that most…

    Submitted 10 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  45. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  46. arXiv:2404.11214  [pdf, other]

    cs.CV cs.AI

    Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth

    Abstract: A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in…

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR UG2+ Workshop

  47. arXiv:2404.11181  [pdf, other]

    cs.LG cs.AI cs.RO

    KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Reliable prediction of vehicle trajectories at signalized intersections is crucial to urban traffic management and autonomous driving systems. However, it presents unique challenges, due to the complex roadway layout at intersections, involvement of traffic signal controls, and interactions among different types of road users. To address these issues, we present in this paper a novel model called…

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR AICity Workshop

  48. arXiv:2404.09842  [pdf, other]

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  49. arXiv:2404.06692  [pdf, other]

    cs.CV

    Perception-Oriented Video Frame Interpolation via Asymmetric Blending

    Authors: Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng

    Abstract: Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates often prove to be error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to bring…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  50. arXiv:2404.04565  [pdf, other]

    cs.CV

    SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

    Authors: Tao Wu, Runyu He, Gangshan Wu, Limin Wang

    Abstract: Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress of research in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types of existin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024