Showing 1–50 of 328 results for author: Jiang, B

  1. arXiv:2410.15358  [pdf, ps, other]

    eess.SP cs.IT math.OC

    A New Adaptive Balanced Augmented Lagrangian Method with Application to ISAC Beamforming Design

    Authors: Jiageng Wu, Bo Jiang, Xinxin Li, Ya-Feng Liu, Jianhua Yuan

    Abstract: In this paper, we consider a class of convex programming problems with linear equality constraints, which finds broad applications in machine learning and signal processing. We propose a new adaptive balanced augmented Lagrangian (ABAL) method for solving these problems. The proposed ABAL method adaptively selects the stepsize parameter and enjoys a low per-iteration complexity, involving only the…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 7 pages, 1 table

  2. arXiv:2410.12329  [pdf, other]

    cs.CL cs.AI

    Understanding the Role of LLMs in Multimodal Evaluation Benchmarks

    Authors: Botian Jiang, Lei Li, Xiaonan Li, Zhaowei Li, Xiachong Feng, Lingpeng Kong, Qi Liu, Xipeng Qiu

    Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has been accompanied by the development of various benchmarks to evaluate their capabilities. However, the true nature of these evaluations and the extent to which they assess multimodal reasoning versus merely leveraging the underlying Large Language Model (LLM) backbone remain unclear. This paper presents a comprehensive investiga…

    Submitted 16 October, 2024; originally announced October 2024.

  3. arXiv:2410.12309  [pdf, other]

    cs.CR

    Correction to Local Information Privacy and Its Applications to Data Aggregation

    Authors: Bo Jiang, Ming Li, Ravi Tandon

    Abstract: In our previous works, we defined Local Information Privacy (LIP) as a context-aware privacy notion and presented the corresponding privacy-preserving mechanism. Then we claim that the mechanism satisfies epsilon-LIP for any epsilon>0 for arbitrary Px. However, this claim is not completely correct. In this document, we provide a correction to the valid range of privacy parameters of our previously…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2410.11182  [pdf, other]

    cs.LG cs.AI cs.CR

    Archilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks

    Authors: Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Ruoyu Sun, Zhuotao Liu, Shiyu Liang

    Abstract: Closed-source large language models deliver strong performance but have limited downstream customizability. Semi-open models, combining both closed-source and public layers, were introduced to improve customizability. However, parameters in the closed-source layers are found vulnerable to recovery attacks. In this paper, we explore the design of semi-open models with fewer closed-source layers, ai…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 10 pages for main content of the paper

  5. arXiv:2410.08260  [pdf, other]

    cs.CV cs.AI

    Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

    Authors: Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang

    Abstract: As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models. We argue that temporal splitting, detailed captions, and video quality filtering are three key factors that determine dataset quality. However, existing datasets exhibit various limitations in these are…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Project page: https://koala36m.github.io/

  6. arXiv:2410.07854  [pdf, other]

    cs.CV cs.MM

    HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter

    Authors: Yumiao Zhao, Bo Jiang, Xiao Wang, Qin Xu, Jin Tang

    Abstract: Adapter-based tuning methods have shown significant potential in transferring knowledge from pre-trained Vision-Language Models to the downstream tasks. However, after reviewing existing adapters, we find they generally fail to fully explore the interactions between different modalities in constructing task-specific knowledge. Also, existing works usually only focus on similarity matching between…

    Submitted 10 October, 2024; originally announced October 2024.

  7. arXiv:2410.04616  [pdf, other]

    cs.CL

    LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking

    Authors: Alimohammad Beigi, Bohan Jiang, Dawei Li, Tharindu Kumarage, Zhen Tan, Pouya Shaeri, Huan Liu

    Abstract: Human fact-checkers have specialized domain knowledge that allows them to formulate precise questions to verify information accuracy. However, this expert-driven approach is labor-intensive and is not scalable, especially when dealing with complex multimodal misinformation. In this paper, we propose a fully-automated framework, LRQ-Fact, for multimodal fact-checking. Firstly, the framework leverag…

    Submitted 6 October, 2024; originally announced October 2024.

  8. arXiv:2410.03026  [pdf, other]

    cs.CL cs.LG

    Characterizing Context Influence and Hallucination in Summarization

    Authors: James Flemings, Wanrong Zhang, Bo Jiang, Zafar Takhirov, Murali Annavaram

    Abstract: Although Large Language Models (LLMs) have achieved remarkable performance in numerous downstream tasks, their ubiquity has raised two significant concerns. One is that LLMs can hallucinate by generating content that contradicts relevant contextual information; the other is that LLMs can inadvertently leak private information due to input regurgitation. Many prior works have extensively studied ea…

    Submitted 3 October, 2024; originally announced October 2024.

  9. arXiv:2410.00982  [pdf, other]

    cs.CV

    ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding

    Authors: Liang Shi, Boyu Jiang, Feng Guo

    Abstract: Accurately identifying, understanding, and describing driving safety-critical events (SCEs), including crashes and near-crashes, is crucial for traffic safety, automated driving systems, and advanced driver assistance systems research and application. As SCEs are rare events, most general Vision-Language Models (VLMs) have not been trained sufficiently to link SCE videos and narratives, which coul…

    Submitted 1 October, 2024; originally announced October 2024.

  10. arXiv:2410.00379  [pdf, other]

    cs.CV cs.AI cs.LG

    CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

    Authors: Xiao Wang, Fuling Wang, Yuehang Li, Qingchuan Ma, Shiao Wang, Bo Jiang, Chuanfu Li, Jin Tang

    Abstract: X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence which can significantly reduce diagnostic burdens and patient wait times. Despite significant progress, we believe that the task has reached a bottleneck due to the limited benchmark datasets and the existing large models' insufficient capability enhancements in this specialized domain. Specifically, the…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: In Peer Review

  11. arXiv:2409.18486  [pdf, other]

    cs.CL

    Evaluation of OpenAI o1: Opportunities and Challenges of AGI

    Authors: Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu, Chao Cao, Hanqi Jiang, Hanxu Chen, Yiwei Li, Junhao Chen, Huawen Hu, Yihen Liu, Huaqin Zhao, Shaochen Xu, Haixing Dai, Lin Zhao, Ruidong Zhang, Wei Zhao, Zhenyuan Yang, Jingyuan Chen, et al. (53 additional authors not shown)

    Abstract: This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performan…

    Submitted 27 September, 2024; originally announced September 2024.

  12. arXiv:2409.17728  [pdf, other]

    cs.CV cs.AI

    AlterMOMA: Fusion Redundancy Pruning for Camera-LiDAR Fusion Models with Alternative Modality Masking

    Authors: Shiqi Sun, Yantao Lu, Ning Liu, Bo Jiang, JinChao Chen, Ying Zhang

    Abstract: Camera-LiDAR fusion models significantly enhance perception performance in autonomous driving. The fusion mechanism leverages the strengths of each modality while minimizing their weaknesses. Moreover, in practice, camera-LiDAR fusion models utilize pre-trained backbones for efficient training. However, we argue that directly loading single-modal pre-trained camera and LiDAR backbones into camera-…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 17 pages, 3 figures, Accepted by NeurIPS 2024

  13. arXiv:2409.14115  [pdf, other]

    cs.RO

    Aerial Grasping with Soft Aerial Vehicle Using Disturbance Observer-Based Model Predictive Control

    Authors: Hiu Ching Cheung, Bailun Jiang, Yang Hu, Henry K. Chu, Chih-Yung Wen, Ching-Wei Chang

    Abstract: Aerial grasping, particularly soft aerial grasping, holds significant promise for drone delivery and harvesting tasks. However, controlling UAV dynamics during aerial grasping presents considerable challenges. The increased mass during payload grasping adversely affects thrust prediction, while unpredictable environmental disturbances further complicate control efforts. In this study, our objectiv…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 8 pages, 10 figures, submitted to IEEE Robotics Automation Letters

  14. arXiv:2409.06741  [pdf, other]

    cs.SE cs.AI

    Generative AI for Requirements Engineering: A Systematic Literature Review

    Authors: Haowei Cheng, Jati H. Husen, Sien Reeve Peralta, Bowen Jiang, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki

    Abstract: Context: Generative AI (GenAI) has emerged as a transformative tool in software engineering, with requirements engineering (RE) actively exploring its potential to revolutionize processes and outcomes. The integration of GenAI into RE presents both promising opportunities and significant challenges that necessitate systematic analysis and evaluation. Objective: This paper presents a comprehensive…

    Submitted 9 September, 2024; originally announced September 2024.

  15. arXiv:2409.06299  [pdf, other]

    cs.CV cs.AI

    Enhancing Long Video Understanding via Hierarchical Event-Based Memory

    Authors: Dingxin Cheng, Mingda Li, Jingyu Liu, Yongxin Guo, Bin Jiang, Qingbin Liu, Xi Chen, Bo Zhao

    Abstract: Recently, integrating visual foundation models into large language models (LLMs) to form video understanding systems has attracted widespread attention. Most of the existing models compress diverse semantic information within the whole video and feed it into LLMs for content comprehension. While this method excels in short video understanding, it may result in a blend of multiple event information…

    Submitted 10 September, 2024; originally announced September 2024.

  16. arXiv:2409.04768  [pdf, other]

    cs.CV

    Medical Image Segmentation via Single-Source Domain Generalization with Random Amplitude Spectrum Synthesis

    Authors: Qiang Qiao, Wenyu Wang, Meixia Qu, Kun Su, Bin Jiang, Qiang Guo

    Abstract: The field of medical image segmentation is challenged by domain generalization (DG) due to domain shifts in clinical datasets. The DG challenge is exacerbated by the scarcity of medical data and privacy concerns. Traditional single-source domain generalization (SSDG) methods primarily rely on stacking data augmentation techniques to minimize domain discrepancies. In this paper, we propose Random A…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures, Medical Image Computing and Computer Assisted Intervention 2024

  17. arXiv:2409.02834  [pdf, other]

    cs.CL

    CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

    Authors: Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate t…

    Submitted 6 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  18. arXiv:2409.00968  [pdf, other]

    math.OC cs.AI cs.LG

    Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning

    Authors: Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge

    Abstract: The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms cannot effectively balance solution quality and speed when solving IPPS…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 24 pages, 13 figures

  19. arXiv:2408.15018  [pdf, other]

    cs.HC cs.AI

    Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation

    Authors: Jun Chen, Anqi Chen, Bingkun Jiang, Mohammad S. Obaidat, Ni Li, Xinyu Zhang

    Abstract: Cognition refers to the function of information perception and processing, which is the fundamental psychological essence of human beings. It is responsible for reasoning and decision-making, while its evaluation is significant for the aviation domain in mitigating potential safety risks. Existing studies tend to use varied methods for cognitive state evaluation yet have limitations in timeliness,…

    Submitted 27 August, 2024; originally announced August 2024.

  20. arXiv:2408.14122  [pdf, other]

    cs.CR

    FG-SAT: Efficient Flow Graph for Encrypted Traffic Classification under Environment Shifts

    Authors: Susu Cui, Xueying Han, Dongqi Han, Zhiliang Wang, Weihang Wang, Yun Li, Bo Jiang, Baoxu Liu, Zhigang Lu

    Abstract: Encrypted traffic classification plays a critical role in network security and management. Currently, mining deep patterns from side-channel contents and plaintext fields through neural networks is a major solution. However, existing methods have two major limitations: (1) They fail to recognize the critical link between transport layer mechanisms and applications, missing the opportunity to learn…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

  21. arXiv:2408.12340  [pdf, other]

    cs.CV

    VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

    Authors: Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Kai WU, Wenhui Han, Taisong Jin, Chengjie Wang

    Abstract: Although diffusion-based image virtual try-on has made considerable progress, emerging approaches still struggle to effectively address the issue of hand occlusion (i.e., clothing regions occluded by the hand part), leading to a notable degradation of the try-on performance. To tackle this issue widely existing in real-world scenarios, we propose VTON-HandFit, leveraging the power of hand priors t…

    Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: The project page is https://vton-handfit.github.io

  22. arXiv:2408.10488  [pdf, other]

    cs.CV cs.AI cs.CL cs.NE

    Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

    Authors: Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, Yaowei Wang

    Abstract: Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Unlike traditional SLT based on visible light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams h…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: First Large-scale and High-Definition Benchmark Dataset for Event-based Sign Language Translation

  23. arXiv:2408.10487  [pdf, other]

    cs.CV cs.AI

    MambaEVT: Event Stream based Visual Object Tracking using State Space Model

    Authors: Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang

    Abstract: Event camera-based visual tracking has drawn more and more attention in recent years due to the unique imaging principle and advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks, due to the utilization of vision Transformer and the static template for target object locali…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  24. arXiv:2408.09764  [pdf, other]

    cs.CV cs.AI cs.NE

    Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms

    Authors: Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian

    Abstract: Human Action Recognition (HAR) stands as a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. However, in real-world applications, RGB cameras encounter numerous challenges, including light conditions, fast motion, and privacy concerns. Consequently, bio-inspired event camera…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  25. arXiv:2408.09743  [pdf, other]

    cs.CV cs.AI cs.CL

    R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

    Authors: Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang

    Abstract: Inspired by the tremendous success of Large Language Models (LLMs), existing X-ray medical report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image, and then, feed them into the LLM for text generation. How to extract more effective information for the LLMs to help them improve f…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  26. arXiv:2408.08078  [pdf, other]

    cs.CV cs.AI

    Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

    Authors: Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang

    Abstract: Current works focus on addressing the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, few of them consider the motion cues, which may also be vital. In this work, we revisit the widely adopted bi-temporal images-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  27. arXiv:2408.03723  [pdf, other]

    cs.RO

    MS-Mapping: An Uncertainty-Aware Large-Scale Multi-Session LiDAR Mapping System

    Authors: Xiangcheng Hu, Jin Wu, Jianhao Jiao, Binqian Jiang, Wei Zhang, Wenshuo Wang, Ping Tan

    Abstract: Large-scale multi-session LiDAR mapping is essential for a wide range of applications, including surveying, autonomous driving, crowdsourced mapping, and multi-agent navigation. However, existing approaches often struggle with data redundancy, robustness, and accuracy in complex environments. To address these challenges, we present MS-Mapping, a novel multi-session LiDAR mapping system that emplo…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 18 pages, 22 figures

  28. arXiv:2408.03519  [pdf, other]

    cs.SE cs.AI

    RepoMasterEval: Evaluating Code Completion via Real-World Repositories

    Authors: Qinyun Wu, Chao Peng, Pengfei Gao, Ruida Hu, Haoyu Gan, Bo Jiang, Jinhe Tang, Zhiwen Deng, Zhanming Guan, Cuiyun Gao, Xia Liu, Ping Yang

    Abstract: With the growing reliance on automated code completion tools in software development, the need for robust evaluation benchmarks has become critical. However, existing benchmarks focus more on code generation tasks at the function and class level and provide rich text descriptions to prompt the model. By contrast, such descriptive prompts are commonly unavailable in real development and code completion ca…

    Submitted 6 August, 2024; originally announced August 2024.

  29. arXiv:2408.02503  [pdf, other]

    cs.CL

    UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

    Authors: Zhaowei Li, Wei Wang, YiQing Cai, Xu Qi, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang

    Abstract: Significant advancements have recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental q…

    Submitted 5 August, 2024; originally announced August 2024.

  30. CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature

    Authors: Chenyan Liu, Yufan Cai, Yun Lin, Yuhuan Huang, Yunrui Pei, Bo Jiang, Ping Yang, Jin Song Dong, Hong Mei

    Abstract: Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 figures

  31. arXiv:2407.17535  [pdf, other]

    cs.AI cs.LG cs.SE

    LAMBDA: A Large Model Based Data Agent

    Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

    Abstract: We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent rol…

    Submitted 14 September, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 51 pages, 23 figures and 6 tables

    MSC Class: 62-04; 62-08; 68T01; 68T09

  32. arXiv:2407.17451  [pdf, other]

    cs.SI cs.CY cs.IR

    BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social

    Authors: Ujun Jeong, Bohan Jiang, Zhen Tan, H. Russell Bernard, Huan Liu

    Abstract: Decentralized social media platforms like Bluesky Social (Bluesky) have made it possible to publicly disclose some user behaviors with millisecond-level precision. Embracing Bluesky's principles of open-source and open-data, we present the first collection of the temporal dynamics of user-driven social interactions. BlueTempNet integrates multiple types of networks into a single multi-network, inc…

    Submitted 2 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: accepted to IEEE Data Descriptions 24

  33. arXiv:2407.17349  [pdf, other]

    cs.CL

    Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

    Authors: Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (SocraticLLM), which guides l…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted By CIKM 2024

  34. arXiv:2407.08585  [pdf, other]

    cs.RO cs.AI cs.LG

    HACMan++: Spatially-Grounded Motion Primitives for Manipulation

    Authors: Bowen Jiang, Yilin Wu, Wenxuan Zhou, Chris Paxton, David Held

    Abstract: Although end-to-end robot learning has shown some success for robot manipulation, the learned policies are often not sufficiently robust to variations in object pose or geometry. To improve the policy generalization, we introduce spatially-grounded parameterized motion primitives in our method HACMan++. Specifically, we propose an action representation consisting of three components: what primitiv…

    Submitted 11 July, 2024; originally announced July 2024.

  35. arXiv:2407.05550  [pdf, other]

    cs.HC cs.AI

    MEEG and AT-DGNN: Improving EEG Emotion Recognition with Music Introducing and Graph-based Learning

    Authors: Minghao Xiao, Zhengxi Zhu, Bin Jiang, Meixia Qu, Wenyu Wang

    Abstract: We present the MEEG dataset, a multi-modal collection of music-induced electroencephalogram (EEG) recordings designed to capture emotional responses to various musical stimuli across different valence and arousal levels. This public dataset facilitates an in-depth examination of brainwave patterns within musical contexts, providing a robust foundation for studying brain network topology during emo…

    Submitted 14 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  36. arXiv:2407.03900  [pdf, other]

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging…

    Submitted 4 July, 2024; originally announced July 2024.

  37. arXiv:2407.03876  [pdf, other]

    cs.CR cs.CL

    Automated Progressive Red Teaming

    Authors: Bojian Jiang, Yi Jing, Tianhao Shen, Tong Wu, Qing Yang, Deyi Xiong

    Abstract: Ensuring the safety of large language models (LLMs) is paramount, yet identifying potential vulnerabilities is challenging. While manual red teaming is effective, it is time-consuming, costly and lacks scalability. Automated red teaming (ART) offers a more cost-effective alternative, automatically generating adversarial prompts to expose LLM vulnerabilities. However, in current ART efforts, a robu…

    Submitted 5 October, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  38. SUPER: Seated Upper Body Pose Estimation using mmWave Radars

    Authors: Bo Zhang, Zimeng Zhou, Boyu Jiang, Rong Zheng

    Abstract: In industrial countries, adults spend a considerable amount of time sedentary each day at work, driving and during activities of daily living. Characterizing the seated upper body human poses using mmWave radars is an important, yet under-studied topic with many applications in human-machine interaction, transportation and road safety. In this work, we devise SUPER, a framework for seated upper bo…

    Submitted 2 July, 2024; originally announced July 2024.

  39. arXiv:2406.18572  [pdf, other]

    cs.CV cs.LG

    GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

    Authors: Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

    Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise…

    Submitted 16 October, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  40. arXiv:2406.17992  [pdf, other]

    cs.CL cs.AI

    Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models

    Authors: Bohan Jiang, Chengshuai Zhao, Zhen Tan, Huan Liu

    Abstract: Despite recent advancements in detecting disinformation generated by large language models (LLMs), current efforts overlook the ever-evolving nature of this disinformation. In this work, we investigate a challenging yet practical research problem of detecting evolving LLM-generated disinformation. Disinformation evolves constantly through the rapid development of LLMs and their variants. As a cons…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  41. arXiv:2406.17518  [pdf, other]

    cs.AI cs.SI

    Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks

    Authors: Yuang Wei, Yizhou Zhou, Yuan-Hao Jiang, Bo Jiang

    Abstract: A reliable knowledge structure is a prerequisite for building effective adaptive learning systems and intelligent tutoring systems. Pursuing an explainable and trustworthy knowledge structure, we propose a method for constructing causal knowledge networks. This approach leverages Bayesian networks as a foundation and incorporates causal relationship analysis to derive a causal network. Additionall…

    Submitted 25 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, Educational Data Mining 2024, Human-Centric eXplainable AI in Education

  42. arXiv:2406.17238  [pdf, other]

    cs.LG cs.CV eess.IV

    Generative Expansion of Small Datasets: An Expansive Graph Approach

    Authors: Vahid Jebraeeli, Bo Jiang, Hamid Krim, Derya Cansever

    Abstract: Limited data availability in machine learning significantly impacts performance and generalization. Traditional augmentation methods enhance moderately sufficient datasets. GANs struggle with convergence when generating diverse samples. Diffusion models, while effective, have high computational costs. We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from…

    Submitted 1 October, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures and 2 tables. Under review in ICASSP 2025

  43. arXiv:2406.14846  [pdf, other]

    cs.LG

    Graph Edge Representation via Tensor Product Graph Convolutional Representation

    Authors: Bo Jiang, Sheng Ge, Ziyan Zhang, Beibei Wang, Jin Tang, Bin Luo

    Abstract: Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on adjacency matrix and node features and generally focus on obtaining effective node embeddings which cannot be utilized to address the graphs with (high-dimensional) edge features. To address thi…

    Submitted 20 June, 2024; originally announced June 2024.

  44. arXiv:2406.12896  [pdf, other]

    cs.AI cs.CY cs.LG

    Leveraging Pedagogical Theories to Understand Student Learning Process with Graph-based Reasonable Knowledge Tracing

    Authors: Jiajun Cui, Hong Qian, Bo Jiang, Wei Zhang

    Abstract: Knowledge tracing (KT) is a crucial task in intelligent education, focusing on predicting students' performance on given questions to trace their evolving knowledge. The advancement of deep learning in this field has led to deep-learning knowledge tracing (DLKT) models that prioritize high predictive accuracy. However, many existing DLKT methods overlook the fundamental goal of tracking students'…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint, accepted to appear in SIGKDD 2024, 12 pages. The source code is available at https://github.com/JJCui96/GRKT. Keywords: interpretable knowledge tracing, student behavior modeling, intelligence education

  45. arXiv:2406.12373  [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50 ACM Class: I.2.7

  46. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 4 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted at EMNLP 2024

  47. arXiv:2406.10498  [pdf, other]

    cs.LG cs.SI

    A Unified Graph Selective Prompt Learning for Graph Neural Networks

    Authors: Bo Jiang, Hao Wu, Ziyan Zhang, Beibei Wang, Jin Tang

    Abstract: In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to mod…

    Submitted 15 June, 2024; originally announced June 2024.

  48. arXiv:2406.07698  [pdf, other]

    cs.LG

    Label Smoothing Improves Machine Unlearning

    Authors: Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

    Abstract: The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of…

    Submitted 11 June, 2024; originally announced June 2024.

  49. arXiv:2406.07411  [pdf, other]

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development, marked by frequent library updates. This gap significantly limits LLMs' deployment in realistic settings. In this paper, we propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware c…

    Submitted 16 October, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  50. arXiv:2406.01414  [pdf, other]

    cs.LG eess.SP

    CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

    Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

    Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-le…

    Submitted 17 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.04131