Showing 1–50 of 519 results for author: He, S

  1. arXiv:2410.14101  [pdf, other]

    cs.SD cs.AI eess.AS

    Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

    Authors: Shuwei He, Rui Liu, Haizhou Li

    Abstract: Visual Text-to-Speech (VTTS) aims to take the spatial environmental image as the prompt to synthesize the reverberation speech for the spoken content. Previous research focused on the RGB modality for global environmental modeling, overlooking the potential of multi-source spatial knowledge like depth, speaker position, and environmental semantics. To address the issues, we propose a novel multi-s…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 5 pages, 1 figure

  2. arXiv:2410.13618  [pdf, other]

    cs.CV

    LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning

    Authors: Yiming Shi, Jiwei Wei, Yujia Wu, Ran Ran, Chengwei Sun, Shiyuan He, Yang Yang

    Abstract: The rapid growth of model scale has necessitated substantial computational resources for fine-tuning. Existing approaches such as Low-Rank Adaptation (LoRA) have sought to address the problem of handling the large updated parameters in full fine-tuning. However, LoRA utilizes random initialization and optimization of low-rank matrices to approximate updated weights, which can result in suboptimal conv…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 13 pages, 7 figures

  3. arXiv:2410.13184  [pdf, other]

    cs.CL

    Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

    Authors: Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu

    Abstract: Traditional transformer models often allocate a fixed amount of computational resources to every input token, leading to inefficient and unnecessary computation. To address this, the Mixture of Depths (MoD) was introduced to dynamically adjust the computational depth by skipping less important layers. Despite its promise, current MoD approaches remain under-explored and face two main challenges: (…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2410.12053  [pdf, other]

    cs.CV

    SOE: SO(3)-Equivariant 3D MRI Encoding

    Authors: Shizhe He, Magdalini Paschali, Jiahong Ouyang, Adnan Masood, Akshay Chaudhari, Ehsan Adeli

    Abstract: Representation learning has become increasingly important, especially as powerful models have shifted towards learning latent representations before fine-tuning for downstream tasks. This approach is particularly valuable in leveraging the structural information within brain anatomy. However, a common limitation of recent models developed for MRIs is their tendency to ignore or remove geometric in…

    Submitted 15 October, 2024; originally announced October 2024.

    Journal ref: International Workshop on Machine Learning in Clinical Neuroimaging (MLCN) 2024

  5. arXiv:2410.10696  [pdf, other]

    cs.CV cs.GR

    TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

    Authors: Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, Youjian Zhao, Ziwei Liu

    Abstract: Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the M…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: Accepted to SIGGRAPH Asia 2024 (conference track). Project page: https://guanjz20.github.io/projects/TALK-Act

  6. arXiv:2410.07331  [pdf, other]

    cs.CL cs.AI

    DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

    Authors: Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu

    Abstract: We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real a…

    Submitted 10 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024

  7. arXiv:2410.06802  [pdf, other]

    cs.CL

    Seg2Act: Global Context-aware Action Generation for Document Logical Structuring

    Authors: Zichao Li, Shaojie He, Meng Liao, Xuanang Chen, Yaojie Lu, Hongyu Lin, Yanxiong Lu, Xianpei Han, Le Sun

    Abstract: Document logical structuring aims to extract the underlying hierarchical structure of documents, which is crucial for document intelligence. Traditional approaches often fall short in handling the complexity and the variability of lengthy documents. To address these issues, we introduce Seg2Act, an end-to-end, generation-based method for document logical structuring, revisiting logical structure e…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted by EMNLP 2024 Main Conference

  8. arXiv:2410.05292  [pdf, other]

    cs.LG cs.AI q-bio.QM

    CaLMFlow: Volterra Flow Matching using Causal Language Models

    Authors: Sizhuang He, Daniel Levine, Ivan Vrkic, Marco Francesco Bressana, David Zhang, Syed Asad Rizvi, Yangtian Zhang, Emanuele Zappala, David van Dijk

    Abstract: We introduce CaLMFlow (Causal Language Models for Flow Matching), a novel framework that casts flow matching as a Volterra integral equation (VIE), leveraging the power of large language models (LLMs) for continuous data generation. CaLMFlow enables the direct application of LLMs to learn complex flows by formulating flow matching as a sequence modeling task, bridging discrete language modeling an…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 10 pages, 9 figures, 7 tables

  9. arXiv:2410.02536  [pdf, other]

    cs.AI cs.NE

    Intelligence at the Edge of Chaos

    Authors: Shiyang Zhang, Aakash Patel, Syed A Rizvi, Nianchen Liu, Sizhuang He, Amin Karbasi, Emanuele Zappala, David van Dijk

    Abstract: We explore the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict these rules. Our study focuses on elementary cellular automata (ECA), simple yet powerful one-dimensional systems that generate behaviors ranging from trivial to highly complex. By training distinct Large Language…

    Submitted 8 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: 15 pages, 8 figures

  10. arXiv:2410.00461  [pdf, other]

    cs.LG

    Enhancing Solution Efficiency in Reinforcement Learning: Leveraging Sub-GFlowNet and Entropy Integration

    Authors: Siyi He

    Abstract: Traditional reinforcement learning often struggles to generate diverse, high-reward solutions, especially in domains like drug design and black-box function optimization. Markov Chain Monte Carlo (MCMC) methods provide an alternative to RL for candidate selection but suffer from high computational costs and limited candidate diversity exploration capabilities. In response, GFlowNet, a novel…

    Submitted 1 October, 2024; originally announced October 2024.

  11. arXiv:2410.00320  [pdf, other]

    cs.CV cs.CL

    PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection

    Authors: Qihang Zhou, Jiangtao Yan, Shibo He, Wenchao Meng, Jiming Chen

    Abstract: Zero-shot (ZS) 3D anomaly detection is a crucial yet unexplored field that addresses scenarios where target 3D training samples are unavailable due to practical concerns like privacy protection. This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects. PointAD provides a unified framework to compreh…

    Submitted 11 October, 2024; v1 submitted 30 September, 2024; originally announced October 2024.

    Comments: NeurIPS 2024

  12. arXiv:2409.18386  [pdf, other]

    cs.DB

    ChARLES: Change-Aware Recovery of Latent Evolution Semantics in Relational Data

    Authors: Shiyi He, Alexandra Meliou, Anna Fariha

    Abstract: Data-driven decision-making is at the core of many modern applications, and understanding the data is critical in supporting trust in these decisions. However, data is dynamic and evolving, just like the real-world entities it represents. Thus, an important component of understanding data is analyzing and drawing insights from the changes it undergoes. Existing methods for exploring data change li…

    Submitted 26 September, 2024; originally announced September 2024.

  13. arXiv:2409.17517  [pdf, other]

    cs.LG cs.AI

    Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

    Authors: Xiufang Shi, Wei Zhang, Mincheng Wu, Guangyi Liu, Zhenyu Wen, Shibo He, Tejal Shah, Rajiv Ranjan

    Abstract: In federated learning, the heterogeneity of client data has a great impact on the performance of model training. Many heterogeneity issues in this process are raised by non-independently and identically distributed (Non-IID) data. This study focuses on the issue of label distribution skew. To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distil…

    Submitted 25 September, 2024; originally announced September 2024.

  14. Fast Extrinsic Calibration for Multiple Inertial Measurement Units in Visual-Inertial System

    Authors: Youwei Yu, Yanqing Liu, Fengjie Fu, Sihan He, Dongchen Zhu, Lei Wang, Xiaolin Zhang, Jiamao Li

    Abstract: In this paper, we propose a fast extrinsic calibration method for fusing multiple inertial measurement units (MIMU) to improve visual-inertial odometry (VIO) localization accuracy. Currently, data fusion algorithms for MIMU highly depend on the number of inertial sensors. Based on the assumption that extrinsic parameters between inertial sensors are perfectly calibrated, the fusion algorithm provi…

    Submitted 24 September, 2024; originally announced September 2024.

  15. arXiv:2409.13203  [pdf, other]

    cs.CL

    Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

    Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Jun Zhao

    Abstract: In this paper, we propose Neural-Symbolic Collaborative Distillation (NesyCD), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., > 13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., ≤ 7B), as these tasks demand n…

    Submitted 20 September, 2024; originally announced September 2024.

  16. arXiv:2409.13183  [pdf, other]

    cs.CL

    SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models

    Authors: Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Kang Liu, Jun Zhao

    Abstract: Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chains of Thought (CoT) data distilled from LLMs, aiming to enhance their reasoning ability. Furthermore, some CoT distillation methods introduce external symbolic knowledge into the generation process to improve the lim…

    Submitted 19 September, 2024; originally announced September 2024.

  17. arXiv:2409.11585  [pdf, other]

    cs.LG cs.CR cs.DC

    Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework

    Authors: Zilinghan Li, Shilan He, Ze Yang, Minseok Ryu, Kibaek Kim, Ravi Madduri

    Abstract: Federated learning (FL) is a distributed machine learning paradigm enabling collaborative model training while preserving data privacy. In today's landscape, where most data is proprietary, confidential, and distributed, FL has become a promising approach to leverage such data effectively, particularly in sensitive domains such as medicine and the electric grid. Heterogeneity and security are the…

    Submitted 17 September, 2024; originally announced September 2024.

  18. arXiv:2409.05926  [pdf, other]

    cs.LG cs.CL

    SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

    Authors: Chengwei Sun, Jiwei Wei, Yujia Wu, Yiming Shi, Shiyuan He, Zeyu Ma, Ning Xie, Yang Yang

    Abstract: Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue by adjusting only a small subset of parameters. Ne…

    Submitted 9 September, 2024; originally announced September 2024.

  19. arXiv:2409.01522  [pdf, other]

    cs.CV

    Lagrangian Motion Fields for Long-term Motion Generation

    Authors: Yifei Yang, Zikai Huang, Chenshu Xu, Shengfeng He

    Abstract: Long-term motion generation is a challenging task that requires producing coherent and realistic sequences over extended durations. Current methods primarily rely on framewise motion representations, which capture only static spatial details and overlook temporal dynamics. This approach leads to significant redundancy across the temporal dimension, complicating the generation of effective long-ter…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 9 figures

  20. arXiv:2409.00657  [pdf, other]

    cs.DC

    HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

    Authors: Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang

    Abstract: Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a featu…

    Submitted 8 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

  21. arXiv:2408.13395  [pdf, other]

    cs.CV

    Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing

    Authors: Yangyang Xu, Wenqi Shao, Yong Du, Haiming Zhu, Yang Zhou, Ping Luo, Shengfeng He

    Abstract: Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities, yet balancing reconstruction fidelity and editability for real images remains a significant challenge. In this work, we introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific…

    Submitted 23 August, 2024; originally announced August 2024.

  22. arXiv:2408.13006  [pdf, other]

    cs.CL

    Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates

    Authors: Hui Wei, Shenghua He, Tian Xia, Andy Wong, Jingyang Lin, Mei Han

    Abstract: Alignment approaches such as RLHF and DPO are actively investigated to align large language models (LLMs) with human preferences. Commercial LLMs like GPT-4 have recently been employed to evaluate and compare different LLM alignment approaches. These models act as surrogates for human evaluators due to their promising abilities to approximate human preferences with remarkab…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Preprint, under review. 17 pages, 7 figures, 16 tables

  23. arXiv:2408.11429  [pdf, other]

    cs.RO cs.AI

    Long-Range Vision-Based UAV-assisted Localization for Unmanned Surface Vehicles

    Authors: Waseem Akram, Siyuan Yang, Hailiang Kuang, Xiaoyu He, Muhayy Ud Din, Yihao Dong, Defu Lin, Lakmal Seneviratne, Shaoming He, Irfan Hussain

    Abstract: The global positioning system (GPS) has become an indispensable navigation method for field operations with unmanned surface vehicles (USVs) in marine environments. However, GPS may not always be available outdoors because it is vulnerable to natural interference and malicious jamming attacks. Thus, an alternative navigation system is required when the use of GPS is restricted or prohibited. To th…

    Submitted 21 August, 2024; originally announced August 2024.

  24. arXiv:2408.10327  [pdf, other]

    cs.SE

    An Empirical Study on Package-Level Deprecation in Python Ecosystem

    Authors: Zhiqing Zhong, Shilin He, Haoxuan Wang, Boxi Yu, Haowen Yang, Pinjia He

    Abstract: Open-source software (OSS) plays a crucial role in modern software development. Utilizing OSS code can greatly accelerate software development, reduce redundancy, and enhance reliability. Python, a widely adopted programming language, is renowned for its extensive and diverse third-party package ecosystem. However, a significant number of OSS packages within the Python ecosystem are in poor mainte…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)

  25. arXiv:2408.09458  [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  26. VrdONE: One-stage Video Visual Relation Detection

    Authors: Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He

    Abstract: Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos, a key step for gaining deeper insights into video scenes beyond basic visual tasks. Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying what relation categories are present and another for determining their tem…

    Submitted 16 October, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, accepted by ACM Multimedia 2024

  27. arXiv:2408.06665  [pdf, ps, other]

    cs.LG cs.AI

    RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Jun Song

    Abstract: Node classification using Graph Neural Networks (GNNs) has been widely applied in various practical scenarios, such as predicting user interests and detecting communities in social networks. However, recent studies have shown that graph-structured networks often contain potential noise and attacks, in the form of topological perturbations and weight disturbances, which can lead to decreased classi…

    Submitted 13 August, 2024; originally announced August 2024.

  28. arXiv:2408.05358  [pdf, other]

    eess.SP cs.CV cs.HC cs.LG

    GesturePrint: Enabling User Identification for mmWave-based Gesture Recognition Systems

    Authors: Lilin Xu, Keyi Wang, Chaojie Gu, Xiuzhen Guo, Shibo He, Jiming Chen

    Abstract: The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave…

    Submitted 25 July, 2024; originally announced August 2024.

    Comments: Accepted to the 44th IEEE International Conference on Distributed Computing Systems (ICDCS 2024)

  29. arXiv:2408.03284  [pdf, other]

    cs.CV cs.GR cs.MM

    ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

    Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu

    Abstract: Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://guanjz20.github.io/projects/ReSyncer

  30. arXiv:2408.02268  [pdf, other]

    cs.HC

    CHORDination: Evaluating Visual Design Choices in Chord Diagrams for Network Data

    Authors: Kai Wang, Shuqi He, Wenlu Wang, Jinbei Yu, Yu Liu, Lingyun Yu

    Abstract: Chord diagrams are widely used for visualizing data connectivity and flow between nodes in a network. They are effective for representing complex structures through an intuitive and visually appealing circular layout. While previous work has focused on improving aesthetics and interactivity, the influence of fundamental design elements on user perception and information retrieval remains under-exp…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 4 pages of appendix, 8 figures, VINCI 2024

  31. arXiv:2408.02074  [pdf]

    eess.IV cs.AI cs.CV

    Applying Conditional Generative Adversarial Networks for Imaging Diagnosis

    Authors: Haowei Yang, Yuxiang Hu, Shuyao He, Ting Xu, Jiajie Yuan, Xingxin Gu

    Abstract: This study introduces an innovative application of Conditional Generative Adversarial Networks (C-GAN) integrated with Stacked Hourglass Networks (SHGN) aimed at enhancing image segmentation, particularly in the challenging environment of medical imaging. We address the problem of overfitting, common in deep learning models applied to complex imaging datasets, by augmenting data through rotation a…

    Submitted 17 July, 2024; originally announced August 2024.

  32. arXiv:2408.01429  [pdf, ps, other]

    cs.NI cs.AI cs.LG

    An Agile Adaptation Method for Multi-mode Vehicle Communication Networks

    Authors: Shiwen He, Kanghong Chen, Shiyue Huang, Wei Huang, Zhenyu An

    Abstract: This paper focuses on discovering the impact of communication mode allocation on communication efficiency in vehicle communication networks. Specifically, a Markov decision process and reinforcement learning are applied to establish an agile adaptation mechanism for multi-mode communication devices according to the driving scenarios and business requirements. Then, Q-learning is used to train…

    Submitted 18 July, 2024; originally announced August 2024.

  33. arXiv:2408.00123  [pdf, other]

    cs.IR cs.AI cs.MM cs.SI

    Semantic Codebook Learning for Dynamic Recommendation Models

    Authors: Zheqi Lv, Shaoxuan He, Tianyu Zhan, Shengyu Zhang, Wenqiao Zhang, Jingyuan Chen, Zhou Zhao, Fei Wu

    Abstract: Dynamic sequential recommendation (DSR) can generate model parameters based on user behavior to improve the personalization of sequential recommendation under various user preferences. However, it faces the challenges of large parameter search space and sparse and noisy user-item interactions, which reduces the applicability of the generated model parameters. The Semantic Codebook Learning for Dyn…

    Submitted 31 July, 2024; originally announced August 2024.

  34. arXiv:2407.20947  [pdf, other]

    cs.NE

    An Asynchronous Multi-core Accelerator for SNN inference

    Authors: Zhuo Chen, De Ma, Xiaofei Jin, Qinghui Xing, Ouwen Jin, Xin Du, Shuibing He, Gang Pan

    Abstract: Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the accuracy of SNNs often necessitates frequent explicit synchronization among all cores, which presents a challenge to overall efficiency. In this paper, we propo…

    Submitted 30 July, 2024; originally announced July 2024.

  35. arXiv:2407.18244  [pdf, other]

    cs.CV

    RefMask3D: Language-Guided Transformer for 3D Referring Segmentation

    Authors: Shuting He, Henghui Ding

    Abstract: 3D referring segmentation is an emerging and challenging vision-language task that aims to segment the object described by a natural language expression in a point cloud scene. The key challenge behind this task is vision-language feature fusion and alignment. In this work, we propose RefMask3D to explore the comprehensive multi-modal feature interaction and understanding. First, we propose a Geom…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024, Code: https://github.com/heshuting555/RefMask3D

  36. arXiv:2407.17272  [pdf, other]

    cs.CV

    DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

    Authors: Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He

    Abstract: Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely due to their small size and close proximity to each other, which complicates both localization and tracking. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack capitalizes on crowd counting to precisely determine objec…

    Submitted 26 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  37. arXiv:2407.15004  [pdf, other]

    cs.CE

    Incorporating lane-change prediction into energy-efficient speed control of connected autonomous vehicles at intersections

    Authors: Maziar Zamanpour, Suiyi He, Michael W. Levin, Zongxuan Sun

    Abstract: Connected and autonomous vehicles (CAVs) possess the capability of perception and information broadcasting with other CAVs and connected intersections. Additionally, they exhibit computational abilities and can be controlled strategically, offering energy benefits. One potential control strategy is real-time speed control, which adjusts the vehicle speed by taking advantage of broadcasted traffic…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Under review for Transportation Research Part C

  38. arXiv:2407.13761  [pdf, other]

    cs.CV

    SegPoint: Segment Any Point Cloud via Large Language Model

    Authors: Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen

    Abstract: Despite significant progress in 3D point cloud segmentation, existing methods primarily address specific tasks and depend on explicit instructions to identify targets, lacking the capability to infer and understand implicit user intentions in a unified framework. In this work, we propose a model, called SegPoint, that leverages the reasoning capabilities of a multi-modal Large Language Model (LLM)…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://heshuting555.github.io/SegPoint

  39. arXiv:2407.08440  [pdf, other]

    cs.CL cs.AI

    Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

    Authors: Wangtao Sun, Chenxiang Zhang, XueYou Zhang, Xuanqing Yu, Ziyang Huang, Pei Chen, Haotian Xu, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated strong ability, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This demands that LLMs possess inferential rule-following capability. However, no prior work has made a clear evaluation of the inferential rule-following capability of LLMs. Previous studies that try to evaluate the inferentia…

    Submitted 17 October, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  40. arXiv:2407.07554  [pdf, other]

    cs.GR cs.SD eess.AS

    Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

    Authors: Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

    Abstract: Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  41. arXiv:2407.05092  [pdf, other]

    cs.CL

    Exploring Sound Change Over Time: A Review of Computational and Human Perception

    Authors: Siqi He, Wei Zhao

    Abstract: Computational and human perception are often considered separate approaches for studying sound changes over time; few works have touched on the intersection of both. To fill this research gap, we provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks. Overall, computational approaches rely on computer-driven models to perceive histori…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: LChange24 Camera Ready

  42. arXiv:2407.04997  [pdf, other]

    cs.SE cs.AI cs.HC

    Achieving Tool Calling Functionality in LLMs Using Only Prompt Engineering Without Fine-Tuning

    Authors: Shengtao He

    Abstract: Currently, the vast majority of locally deployed open-source large language models (LLMs) and some commercial model interfaces do not support stable tool calling functionality. The existing solution involves fine-tuning LLMs, which results in significant time and computational resource consumption. This paper proposes a method that enables LLMs to achieve stable tool calling capabilities using onl…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, review comments welcome

    ACM Class: I.2.7

  43. arXiv:2407.04948  [pdf, other]

    cs.CV

    Zero-shot Object Counting with Good Exemplars

    Authors: Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

    Abstract: Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual asso…

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  44. arXiv:2407.04621  [pdf, other]

    cs.CV

    OneRestore: A Universal Restoration Framework for Composite Degradation

    Authors: Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, Shengfeng He

    Abstract: In real-world scenarios, image impairments often manifest as composite degradations, presenting a complex interplay of elements such as low light, haze, rain, and snow. Despite this reality, existing restoration methods typically target isolated degradation types, thereby falling short in environments where multiple degrading factors coexist. To bridge this gap, our study proposes a versatile imag…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.04152  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation

    Authors: I-Chun Arthur Liu, Sicheng He, Daniel Seita, Gaurav Sukhatme

    Abstract: Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks. To this end, we propose VoxAct-…

    Submitted 5 October, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to the Conference on Robot Learning (CoRL) 2024

  46. arXiv:2407.01688  [pdf, other]

    cs.SE

    How We Built Cedar: A Verification-Guided Approach

    Authors: Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, John Kastner, Anwar Mamat, Matt McCutchen, Neha Rungta, Bhakti Shah, Emina Torlak, Andrew Wells

    Abstract: This paper presents verification-guided development (VGD), a software engineering process we used to build Cedar, a new policy language for expressive, fast, safe, and analyzable authorization. Developing a system with VGD involves writing an executable model of the system and mechanically proving properties about the model; writing production code for the system and using differential random test…

    Submitted 1 July, 2024; originally announced July 2024.

  47. arXiv:2407.00987  [pdf, other]

    cs.NI eess.SY

    Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling

    Authors: Miao Guo, Yifei Sun, Chaojie Gu, Shibo He, Zhiguo Shi

    Abstract: Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IWQoS'24

  48. arXiv:2406.19708  [pdf, other]

    cs.NE cs.AI cs.CE q-bio.NC

    A Differentiable Approach to Multi-scale Brain Modeling

    Authors: Chaoming Wang, Muyang Lyu, Tianqiu Zhang, Sichao He, Si Wu

    Abstract: We present a multi-scale differentiable brain modeling workflow utilizing BrainPy, a unique differentiable brain simulator that combines accurate brain simulation with powerful gradient-based optimization. We leverage this capability of BrainPy across different brain scales. At the single-neuron level, we implement differentiable neuron models and employ gradient methods to optimize their fit to e…

    Submitted 25 September, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 2nd Differentiable Almost Everything Workshop at ICML 2024. https://github.com/chaoming0625/differentiable-brain-modeling-workflow

  49. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a…

    Submitted 23 May, 2024; originally announced June 2024.

  50. arXiv:2406.18085  [pdf, other]

    cs.CL

    Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

    Authors: Ran Song, Shizhu He, Shengxiang Gao, Li Cai, Kang Liu, Zhengtao Yu, Jun Zhao

    Abstract: Multilingual Knowledge Graph Completion (mKGC) aims at solving queries like (h, r, ?) in different languages by reasoning a tail entity t, thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, ACL 2023