Showing 1–50 of 901 results for author: Ma, S

  1. arXiv:2410.16144  [pdf, other]

    cs.CL

    1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

    Authors: Jinheng Wang, Hansong Zhou, Ting Song, Shaoguang Mao, Shuming Ma, Hongyu Wang, Yan Xia, Furu Wei

    Abstract: Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Sp… ▽ More

    Submitted 21 October, 2024; originally announced October 2024.

  2. arXiv:2410.15506  [pdf, ps, other]

    cs.IT cs.DS math.CO

    Improved Explicit Near-Optimal Codes in the High-Noise Regimes

    Authors: Xin Li, Songtao Mao

    Abstract: We study uniquely decodable codes and list decodable codes in the high-noise regime, specifically codes that are uniquely decodable from $\frac{1-\varepsilon}{2}$ fraction of errors and list decodable from $1-\varepsilon$ fraction of errors. We present several improved explicit constructions that achieve near-optimal rates, as well as efficient or even linear-time decoding algorithms. Our contribu… ▽ More

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 28 pages

  3. arXiv:2410.14979  [pdf, other]

    cs.AI cs.CL cs.LG

    Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration

    Authors: Wei Xie, Shuoyoucheng Ma, Zhenhua Wang, Enze Wang, Baosheng Wang, Jinshu Su

    Abstract: Despite their proficiency in math tasks, the mechanisms underlying LLMs' mathematical reasoning abilities remain a subject of debate. Recent studies suggest that chain-of-thought (CoT) prompts can bolster mathematical reasoning by encouraging LLMs to employ human-like logical reasoning (System 2), enabling them to excel on the Cognitive Reflection Test (CRT). To assess whether LLMs genuinely posse… ▽ More

    Submitted 19 October, 2024; originally announced October 2024.

  4. arXiv:2410.14697  [pdf, other]

    q-bio.NC cs.AI eess.SP

    Learning Cortico-Muscular Dependence through Orthonormal Decomposition of Density Ratios

    Authors: Shihan Ma, Bo Hu, Tianyu Jia, Alexander Kenneth Clarke, Blanka Zicher, Arnault H. Caillet, Dario Farina, Jose C. Principe

    Abstract: The cortico-spinal neural pathway is fundamental for motor control and movement execution, and in humans it is typically studied using concurrent electroencephalography (EEG) and electromyography (EMG) recordings. However, current approaches for capturing high-level and contextual connectivity between these recordings have important limitations. Here, we present a novel application of statistical… ▽ More

    Submitted 4 October, 2024; originally announced October 2024.

  5. arXiv:2410.13743  [pdf, other]

    cs.LG

    Single-Timescale Multi-Sequence Stochastic Approximation Without Fixed Point Smoothness: Theories and Applications

    Authors: Yue Huang, Zhaoxian Wu, Shiqian Ma, Qing Ling

    Abstract: Stochastic approximation (SA) that involves multiple coupled sequences, known as multiple-sequence SA (MSSA), finds diverse applications in the fields of signal processing and machine learning. However, existing theoretical understandings of MSSA are limited: the multi-timescale analysis implies a slow convergence rate, whereas the single-timescale analysis relies on a stringent fixed point smoo… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

  6. arXiv:2410.12265  [pdf, other]

    cs.CL

    An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

    Authors: Junjie Chen, Weihang Su, Zhumin Chu, Haitao Li, Qinyao Ai, Yiqun Liu, Min Zhang, Shaoping Ma

    Abstract: With the rapid development of large language models (LLMs), how to efficiently evaluate them has become an important research question. Existing evaluation methods often suffer from high costs, limited test formats, the need of human references, and systematic evaluation biases. To address these limitations, our study introduces the Auto-PRE, an automatic LLM evaluation framework based on peer rev… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

  7. arXiv:2410.12220  [pdf, other]

    cs.MM

    Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?

    Authors: Xinyu Hang, Shenpeng Song, Zhimeng Huang, Chuanmin Jia, Siwei Ma, Wen Gao

    Abstract: For decades, the Bjøntegaard Delta (BD) has been the metric for evaluating codec Rate-Distortion (R-D) performance. Yet, in most studies, BD is determined using just 4-5 R-D data points, could this be sufficient? As codecs and quality metrics advance, does the conventional BD estimation still hold up? Crucially, are the performance improvements of new codecs and tools genuine, or merely artifacts… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

  8. arXiv:2410.12010  [pdf, other]

    cs.LG cs.AI cs.CL

    Bias Similarity Across Large Language Models

    Authors: Hyejun Jeong, Shiqing Ma, Amir Houmansadr

    Abstract: Bias in machine learning models has been a chronic problem, especially as these models influence decision-making in human society. In generative AI, such as Large Language Models, the impact of bias is even more profound compared to the classification models. LLMs produce realistic and human-like content that users may unconsciously trust, which could perpetuate harmful stereotypes to the uncontro… ▽ More

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: under review

  9. arXiv:2410.11005  [pdf, other]

    cs.CL cs.LG

    One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks

    Authors: Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Jing Yao, Si-Qing Chen, Michael Wooldridge, Furu Wei

    Abstract: Language is not monolithic. While many benchmarks are used as proxies to systematically estimate Large Language Models' (LLM) performance in real-life tasks, they tend to ignore the nuances of within-language variation and thus fail to model the experience of speakers of minority dialects. Focusing on African American Vernacular English (AAVE), we present the first study on LLMs' fairness and robu… ▽ More

    Submitted 14 October, 2024; originally announced October 2024.

  10. arXiv:2410.10394  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

    Authors: Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang

    Abstract: Language-guided robotic manipulation is a challenging task that requires an embodied agent to follow abstract user instructions to accomplish various complex manipulation tasks. Previous work trivially fitting the data without revealing the relation between instruction and low-level executable actions, these models are prone to memorizing the surficial pattern of the data instead of acquiring the… ▽ More

    Submitted 16 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  11. arXiv:2410.07588  [pdf, other]

    cs.CR cs.CY

    Careful About What App Promotion Ads Recommend! Detecting and Explaining Malware Promotion via App Promotion Graph

    Authors: Shang Ma, Chaoran Chen, Shao Yang, Shifu Hou, Toby Jia-Jun Li, Xusheng Xiao, Tao Xie, Yanfang Ye

    Abstract: In Android apps, their developers frequently place app promotion ads, namely advertisements to promote other apps. Unfortunately, the inadequate vetting of ad content allows malicious developers to exploit app promotion ads as a new distribution channel for malware. To help detect malware distributed via app promotion ads, in this paper, we propose a novel approach, named ADGPE, that synergistical… ▽ More

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: NDSS Symposium 2025 Accepted Papers

  12. arXiv:2410.06535  [pdf, other]

    cs.CV

    Happy: A Debiased Learning Framework for Continual Generalized Category Discovery

    Authors: Shijie Ma, Fei Zhu, Zhun Zhong, Wenzhuo Liu, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Constantly discovering novel concepts is crucial in evolving environments. This paper explores the underexplored task of Continual Generalized Category Discovery (C-GCD), which aims to incrementally discover new classes from unlabeled data while maintaining the ability to recognize previously learned classes. Although several settings are proposed to study the C-GCD task, they have limitations tha… ▽ More

    Submitted 9 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted at NeurIPS 2024

  13. TouchInsight: Uncertainty-aware Rapid Touch and Text Input for Mixed Reality from Egocentric Vision

    Authors: Paul Streli, Mark Richardson, Fadi Botros, Shugao Ma, Robert Wang, Christian Holz

    Abstract: While passive surfaces offer numerous benefits for interaction in mixed reality, reliably detecting touch input solely from head-mounted cameras has been a long-standing challenge. Camera specifics, hand self-occlusion, and rapid movements of both head and fingers introduce considerable uncertainty about the exact location of touch events. Existing methods have thus not been capable of achieving t… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

    ACM Class: I.4; I.5; H.5

  14. arXiv:2410.05762  [pdf]

    cs.CV

    Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading

    Authors: Fang Gao, Xuetao Li, Jiabao Wang, Shengheng Ma, Jun Yu

    Abstract: With the development of steel materials, metallographic analysis has become increasingly important. Unfortunately, grain size analysis is a manual process that requires experts to evaluate metallographic photographs, which is unreliable and time-consuming. To resolve this problem, we propose a novel classification method based on deep learning, namely GSNets, a family of hybrid models which can e… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

  15. arXiv:2410.05249  [pdf, other]

    cs.CV

    LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

    Authors: Wei Wu, Kecheng Zheng, Shuailei Ma, Fan Lu, Yuxin Guo, Yifei Zhang, Wei Chen, Qingpei Guo, Yujun Shen, Zheng-Jun Zha

    Abstract: Understanding long text is of great demands in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually paired with short captions, leaving certain tokens easily overshadowed by salient tokens. Towards this problem, our initial attempt is to relabel the data… ▽ More

    Submitted 20 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  16. arXiv:2410.05140  [pdf, other]

    cs.LG stat.ML

    Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

    Authors: Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji

    Abstract: Bilevel optimization has recently attracted considerable attention due to its abundant applications in machine learning problems. However, existing methods rely on prior knowledge of problem parameters to determine stepsizes, resulting in significant effort in tuning stepsizes when these parameters are unknown. In this paper, we propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO e… ▽ More

    Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  17. arXiv:2410.01920  [pdf, other]

    cs.LG

    Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

    Authors: Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang

    Abstract: Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effe… ▽ More

    Submitted 9 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  18. arXiv:2410.01296  [pdf, other]

    cs.LG cs.AI

    Speculative Coreset Selection for Task-Specific Fine-tuning

    Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen, Tianlin Li, Weipeng Jiang, Yang Liu

    Abstract: Task-specific fine-tuning is essential for the deployment of large language models (LLMs), but it requires significant computational resources and time. Existing solutions have proposed coreset selection methods to improve data efficiency and reduce model training overhead, but they still have limitations: 1) Overlooking valuable samples at high pruning rates, which degrades the coreset's performa… ▽ More

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 20 pages, 4 figures, 14 tables

  19. arXiv:2410.00289  [pdf, other]

    cs.CV cs.MM cs.SI

    Delving Deep into Engagement Prediction of Short Videos

    Authors: Dasong Li, Wenjie Li, Baili Lu, Hongsheng Li, Sizhuo Ma, Gurunandan Krishnan, Jian Wang

    Abstract: Understanding and modeling the popularity of User Generated Content (UGC) short videos on social media platforms presents a critical challenge with broad implications for content creators and recommendation systems. This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions. Surprisingly, our findings reveal that Mean Opinion Scor… ▽ More

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Accepted to ECCV 2024. Project page: https://github.com/dasongli1/SnapUGC_Engagement

    Journal ref: European conference on computer vision 2024

  20. arXiv:2409.19691  [pdf, other]

    cs.CL

    CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays

    Authors: Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia

    Abstract: Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories including… ▽ More

    Submitted 29 September, 2024; originally announced September 2024.

  21. arXiv:2409.17870  [pdf, other]

    cs.LG cs.AI cs.AR

    Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

    Authors: Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang

    Abstract: Large language models (LLMs) have been widely applied but face challenges in efficient inference. While quantization methods reduce computational demands, ultra-low bit quantization with arbitrary precision is hindered by limited GPU Tensor Core support and inefficient memory management, leading to suboptimal acceleration. To address these challenges, we propose a comprehensive acceleration scheme… ▽ More

    Submitted 17 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: This paper is accepted by ASP-DAC 2025

  22. arXiv:2409.17610  [pdf, other]

    cs.CL cs.CV

    ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue

    Authors: Zhangpu Li, Changhong Zou, Suxue Ma, Zhicheng Yang, Chen Du, Youbao Tang, Zhenjie Cao, Ning Zhang, Jui-Hsin Lai, Ruei-Sung Lin, Yuan Ni, Xingzhi Sun, Jing Xiao, Kai Zhang, Mei Han

    Abstract: The rocketing prosperity of large language models (LLMs) in recent years has boosted the prevalence of vision-language models (VLMs) in the medical sector. In our online medical consultation scenario, a doctor responds to the texts and images provided by a patient in multiple rounds to diagnose her/his health condition, forming a multi-turn multimodal medical dialogue format. Unlike high-quality i… ▽ More

    Submitted 26 September, 2024; originally announced September 2024.

  23. arXiv:2409.17228  [pdf, other]

    astro-ph.EP cs.AI cs.LG

    Disk2Planet: A Robust and Automated Machine Learning Tool for Parameter Inference in Disk-Planet Systems

    Authors: Shunyuan Mao, Ruobing Dong, Kwang Moo Yi, Lu Lu, Sifan Wang, Paris Perdikaris

    Abstract: We introduce Disk2Planet, a machine learning-based tool to infer key parameters in disk-planet systems from observed protoplanetary disk structures. Disk2Planet takes as input the disk structures in the form of two-dimensional density and velocity maps, and outputs disk and planet properties, that is, the Shakura--Sunyaev viscosity, the disk aspect ratio, the planet--star mass ratio, and the plane… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted to ApJ

  24. arXiv:2409.16914  [pdf, other]

    cs.CL

    Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness

    Authors: Shixuan Ma, Quan Wang

    Abstract: The increasing capability and widespread usage of large language models (LLMs) highlight the desirability of automatic detection of LLM-generated text. Zero-shot detectors, due to their training-free nature, have received considerable attention and notable success. In this paper, we identify a new feature, token cohesiveness, that is useful for zero-shot detection, and we demonstrate that LLM-gene… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: To appear at the main conference of EMNLP 2024

  25. arXiv:2409.16385  [pdf, other]

    cs.RO

    Embedded IPC: Fast and Intersection-free Simulation in Reduced Subspace for Robot Manipulation

    Authors: Wenxin Du, Chang Yu, Siyu Ma, Ying Jiang, Zeshun Zong, Yin Yang, Joe Masterjohn, Alejandro Castro, Xuchen Han, Chenfanfu Jiang

    Abstract: Physics-based simulation is essential for developing and evaluating robot manipulation policies, particularly in scenarios involving deformable objects and complex contact interactions. However, existing simulators often struggle to balance computational efficiency with numerical accuracy, especially when modeling deformable materials with frictional contact constraints. We introduce an efficient… ▽ More

    Submitted 24 September, 2024; originally announced September 2024.

  26. arXiv:2409.14200  [pdf, other]

    cs.CL cs.CR cs.LG

    Data-centric NLP Backdoor Defense from the Lens of Memorization

    Authors: Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma

    Abstract: Backdoor attack is a severe threat to the trustworthiness of DNN-based language models. In this paper, we first extend the definition of memorization of language models from sample-wise to more fine-grained sentence element-wise (e.g., word, phrase, structure, and style), and then point out that language model backdoors are a type of element-wise memorization. Through further analysis, we find tha… ▽ More

    Submitted 21 September, 2024; originally announced September 2024.

  27. arXiv:2409.13398  [pdf]

    cs.IT eess.SP

    Unsourced Sparse Multiple Access for 6G Massive Communication

    Authors: Yifei Yuan, Yuhong Huang, Chunlin Yan, Sen Wang, Shuai Ma, Xiaodong Shen

    Abstract: Massive communication is one of key scenarios of 6G where two magnitude higher connection density would be required to serve diverse services. As a promising direction, unsourced multiple access has been proved to outperform significantly over orthogonal multiple access (OMA) or slotted-ALOHA in massive connections. In this paper we describe a design framework of unsourced sparse multiple access (… ▽ More

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 7 pages, 5 figures and 1 table

  28. arXiv:2409.12741  [pdf]

    cs.CL cs.AI

    Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization

    Authors: Thomas Savage, Stephen Ma, Abdessalem Boukil, Vishwesh Patel, Ekanath Rangan, Ivan Rodriguez, Jonathan H Chen

    Abstract: Large Language Model (LLM) fine tuning is underutilized in the field of medicine. Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), but there is little guidance informing users when to use either technique. In this investigation, we compare the performance of SFT and DPO for five common natural language tasks in medicine: Class… ▽ More

    Submitted 20 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  29. arXiv:2409.11682  [pdf, other]

    cs.CV

    SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation

    Authors: Mingze Sun, Chen Guo, Puhua Jiang, Shiwei Mao, Yurun Chen, Ruqi Huang

    Abstract: In this paper, we propose SRIF, a novel Semantic shape Registration framework based on diffusion-based Image morphing and Flow estimation. More concretely, given a pair of extrinsically aligned shapes, we first render them from multi-views, and then utilize an image interpolation framework based on diffusion models to generate sequences of intermediate images between them. The images are later fed… ▽ More

    Submitted 3 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted as a conference paper of SIGGRAPH Asia 2024

  30. arXiv:2409.10579  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Recent advances in deep learning and language models for studying the microbiome

    Authors: Binghao Yan, Yunbi Nam, Lingyao Li, Rebecca A. Deek, Hongzhe Li, Siyuan Ma

    Abstract: Recent advancements in deep learning, particularly large language models (LLMs), made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learnin… ▽ More

    Submitted 15 September, 2024; originally announced September 2024.

  31. arXiv:2409.10127  [pdf, ps, other]

    cs.IT eess.SP

    Joint Beamforming and Illumination Pattern Design for Beam-Hopping LEO Satellite Communications

    Authors: Jing Wang, Chenhao Qi, Shui Yu, Shiwen Mao

    Abstract: Since hybrid beamforming (HBF) can approach the performance of fully-digital beamforming (FDBF) with much lower hardware complexity, we investigate the HBF design for beam-hopping (BH) low earth orbit (LEO) satellite communications (SatComs). Aiming at maximizing the sum-rate of totally illuminated beam positions during the whole BH period, we consider joint beamforming and illumination pattern de… ▽ More

    Submitted 16 September, 2024; originally announced September 2024.

  32. arXiv:2409.08459  [pdf, other]

    cs.SI

    Toward satisfactory public accessibility: A crowdsourcing approach through online reviews to inclusive urban design

    Authors: Lingyao Li, Songhua Hu, Yinpei Dai, Min Deng, Parisa Momeni, Gabriel Laverghetta, Lizhou Fan, Zihui Ma, Xi Wang, Siyuan Ma, Jay Ligatti, Libby Hemphill

    Abstract: As urban populations grow, the need for accessible urban design has become urgent. Traditional survey methods for assessing public perceptions of accessibility are often limited in scope. Crowdsourcing via online reviews offers a valuable alternative to understanding public perceptions, and advancements in large language models can facilitate their use. This study uses Google Maps reviews across t… ▽ More

    Submitted 12 September, 2024; originally announced September 2024.

  33. arXiv:2409.06946  [pdf, other]

    cs.IT eess.SP

    Refracting Reconfigurable Intelligent Surface Assisted URLLC for Millimeter Wave High-Speed Train Communication Coverage Enhancement

    Authors: Changzhu Liu, Ruisi He, Yong Niu, Shiwen Mao, Bo Ai, Ruifeng Chen

    Abstract: High-speed train (HST) has garnered significant attention from both academia and industry due to the rapid development of railways worldwide. Millimeter wave (mmWave) communication, known for its large bandwidth is an effective way to address performance bottlenecks in cellular network based HST wireless communication systems. However, mmWave signals suffer from significant path loss when traversi… ▽ More

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 figures, accepted by IEEE Transactions on Vehicular Technology

  34. arXiv:2409.00956  [pdf]

    eess.IV cs.CV

    Physics-Informed Neural Network Based Digital Image Correlation Method

    Authors: Boda Li, Shichao Zhou, Qinwei Ma, Shaopeng Ma

    Abstract: Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised,… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  35. arXiv:2408.15245  [pdf, other]

    cs.CV cs.AI

    An Edge AI System Based on FPGA Platform for Railway Fault Detection

    Authors: Jiale Li, Yulin Fu, Dongwei Yan, Sean Longyu Ma, Chiu-Wing Sham

    Abstract: As the demands for railway transportation safety increase, traditional methods of rail track inspection no longer meet the needs of modern railway systems. To address the issues of automation and efficiency in rail fault detection, this study introduces a railway inspection system based on Field Programmable Gate Array (FPGA). This edge AI system collects track images via cameras and uses Convolut… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at the 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE 2024)

  36. arXiv:2408.14478  [pdf, other]

    q-bio.NC cs.AI cs.CY cs.IT

    Uncertainty Quantification in Alzheimer's Disease Progression Modeling

    Authors: Wael Mobeirek, Shirley Mao

    Abstract: With the increasing number of patients diagnosed with Alzheimer's Disease, prognosis models have the potential to aid in early disease detection. However, current approaches raise dependability concerns as they do not account for uncertainty. In this work, we compare the performance of Monte Carlo Dropout, Variational Inference, Markov Chain Monte Carlo, and Ensemble Learning trained on 512 patien… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: This work was done as part of degree requirements for the authors in 2021-2022

  37. arXiv:2408.13960  [pdf, other]

    cs.LG cs.AI cs.CY

    Time Series Analysis for Education: Methods, Applications, and Future Directions

    Authors: Shengzhong Mao, Chaoli Zhang, Yichi Song, Jindong Wang, Xiao-Jun Zeng, Zenglin Xu, Qingsong Wen

    Abstract: Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comp… ▽ More

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 24 pages, 3 figures, 6 tables, project page: see https://github.com/ai-for-edu/time-series-analysis-for-education

  38. arXiv:2408.13759  [pdf, other]

    cs.RO

    MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion

    Authors: Qi Liu, Jingxiang Guo, Sixu Lin, Shuaikang Ma, Jinxuan Zhu, Yanjie Li

    Abstract: This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped ro… ▽ More

    Submitted 17 October, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  39. arXiv:2408.11313  [pdf, other]

    cs.AI

    Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer

    Authors: Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen

    Abstract: Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall into two main categories: template-based and optimization-based methods. The former requires significant manual effort and domain knowledge, while the latter, exemplified by Greedy Coordinate Gradient (GCG), which seeks… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  40. arXiv:2408.08977  [pdf, other]

    cs.DC

    FedFQ: Federated Learning with Fine-Grained Quantization

    Authors: Haowei Li, Weiying Xie, Hangyu Ye, Jitao Ma, Shuran Ma, Yunsong Li

    Abstract: Federated learning (FL) is a decentralized approach, enabling multiple participants to collaboratively train a model while ensuring the protection of data privacy. The transmission of updates from numerous edge clusters to the server creates a significant communication bottleneck in FL. Quantization is an effective compression technology, showcasing immense potential in addressing this bottleneck… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  41. arXiv:2408.08862  [pdf, other]

    cs.LG

    Visual Agents as Fast and Slow Thinkers

    Authors: Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu

    Abstract: Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident respon… ▽ More

    Submitted 6 September, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  42. arXiv:2408.08765  [pdf, other]

    cs.NI

    Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM

    Authors: Wanting Yang, Zehui Xiong, Shiwen Mao, Tony Q. S. Quek, Ping Zhang, Merouane Debbah, Rahim Tafazolli

    Abstract: The surge in connected devices in 6G with typical complex tasks requiring multi-user cooperation, such as smart agriculture and smart cities, poses significant challenges to unsustainable traditional communication. Fortunately, the booming artificial intelligence technology and the growing computational power of devices offer a promising 6G enabler: semantic communication (SemCom). However, existi… ▽ More

    Submitted 18 October, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  43. arXiv:2408.06648  [pdf, other]

    cs.RO

    A Miniature Vision-Based Localization System for Indoor Blimps

    Authors: Shicong Ma

    Abstract: With increasing attention paid to blimp research, I hope to build an indoor blimp to interact with humans. To begin with, I propose developing a visual localization system to enable blimps to localize themselves in an indoor environment autonomously. This system initially reconstructs an indoor environment by employing Structure from Motion with Superpoint visual features. Next, with the previousl…

    Submitted 13 August, 2024; originally announced August 2024.

  44. Towards Effective and Interpretable Semantic Communications

    Authors: Youlong Wu, Yuanmin Shi, Shuai Ma, Chunxiao Jiang, Wei Zhang, Khaled B. Letaief

    Abstract: With the exponential surge in traffic data and the pressing need for ultra-low latency in emerging intelligence applications, it is envisioned that 6G networks will demand disruptive communication technologies to foster ubiquitous intelligence and succinctness within the human society. Semantic communication, a novel paradigm, holds the promise of significantly curtailing communication overhead an…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by IEEE Network Magazine

  45. arXiv:2408.04682  [pdf, other]

    cs.CL cs.AI cs.LG

    ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

    Authors: Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang

    Abstract: Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful…

    Submitted 8 August, 2024; originally announced August 2024.

  46. arXiv:2408.02103  [pdf, other]

    cs.CL

    Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process

    Authors: Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong Jiang

    Abstract: In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances. Despite the remarkable ICL capabilities demonstrated by Large Language Models (LLMs), existing works are highly dependent on large-scale labeled support sets, not always feasible in practical scenarios. To refine this approach, we fo…

    Submitted 4 August, 2024; originally announced August 2024.

  47. arXiv:2408.01946  [pdf, other]

    cs.CV

    Masked Angle-Aware Autoencoder for Remote Sensing Images

    Authors: Zhihao Li, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao

    Abstract: To overcome the inherent domain gap between remote sensing (RS) images and natural images, some self-supervised representation learning methods have made promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a \textit{scaling center crop} o…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ECCV 2024

  48. arXiv:2408.01173  [pdf, other]

    cs.NI cs.LG

    Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems

    Authors: Jinbo Wen, Jiawen Kang, Dusit Niyato, Yang Zhang, Shiwen Mao

    Abstract: Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. Thanks to data process capability, Generative Artificial Intelligence (GAI) can drive the construction and up…

    Submitted 2 August, 2024; originally announced August 2024.

  49. arXiv:2408.01090  [pdf, other]

    cs.CL cs.AR cs.NE

    General-purpose Dataflow Model with Neuromorphic Primitives

    Authors: Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao

    Abstract: Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces hig…

    Submitted 2 August, 2024; originally announced August 2024.

  50. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.