
Showing 1–50 of 82 results for author: Cai, P

  1. arXiv:2410.10352  [pdf, other]

    eess.IV cs.CV

    Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution

    Authors: Pengzhou Cai, Lu Jiang, Yanxin Li, Xiaojuan Liu, Libin Lan

    Abstract: Pubic symphysis-fetal head segmentation in transperineal ultrasound images plays a critical role in the assessment of fetal head descent and progression. Existing transformer segmentation methods based on sparse attention mechanisms use handcrafted static patterns, which leads to large differences in segmentation performance on specific datasets. To address this issue, we introduce a dyna…

    Submitted 14 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: MMM2025; Camera-ready Version; The code is available at https://github.com/Caipengzhou/BRAU-Net

  2. arXiv:2410.05646  [pdf, other]

    cs.LG cs.AI cs.IT

    Score-Based Variational Inference for Inverse Problems

    Authors: Zhipeng Xue, Penghao Cai, Xiaojun Yuan, Xiqi Gao

    Abstract: Existing diffusion-based methods for inverse problems sample from the posterior using score functions and accept the generated random samples as solutions. In applications where the posterior mean is preferred, we have to generate multiple samples from the posterior, which is time-consuming. In this work, by analyzing the probability density evolution of the conditional reverse diffusion process, we pro…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 10 pages, 7 figures, conference

  3. arXiv:2409.18411  [pdf, other]

    cs.RO cs.AI

    BoT-Drive: Hierarchical Behavior and Trajectory Planning for Autonomous Driving using POMDPs

    Authors: Xuanjin Jin, Chendong Zeng, Shengfa Zhu, Chunxiao Liu, Panpan Cai

    Abstract: Uncertainties in dynamic road environments pose significant challenges for behavior and trajectory planning in autonomous driving. This paper introduces BoT-Drive, a planning algorithm that addresses uncertainties at both behavior and trajectory levels within a Partially Observable Markov Decision Process (POMDP) framework. BoT-Drive employs driver models to characterize unknown behavioral intenti…

    Submitted 26 September, 2024; originally announced September 2024.

  4. arXiv:2409.17656  [pdf, other]

    cs.SD cs.AI eess.AS

    Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection

    Authors: Pengfei Cai, Yan Song, Nan Jiang, Qing Gu, Ian McLoughlin

    Abstract: A significant challenge in sound event detection (SED) is the effective utilization of unlabeled data, given the limited availability of labeled data due to high annotation costs. Semi-supervised algorithms rely on labeled data to learn from unlabeled data, and the performance is constrained by the quality and size of the former. In this paper, we introduce the Prototype based Masked Audio Model (…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP2025; The code for this paper will be available at https://github.com/cai525/Transformer4SED after the paper is accepted

  5. arXiv:2409.11752  [pdf, other]

    eess.IV cs.CV

    Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models

    Authors: Pengzhou Cai, Xueyuan Zhang, Libin Lan, Ze Zhao

    Abstract: In recent years, significant progress has been made in tumor segmentation within the field of digital pathology. However, variations in organs, tissue preparation methods, and image acquisition processes can lead to domain discrepancies among digital pathology images. To address this problem, in this paper, we use Rein, a fine-tuning method, to parametrically and efficiently fine-tune various visi…

    Submitted 29 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  6. arXiv:2409.11694  [pdf, other]

    cs.RO

    From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving

    Authors: Xu Han, Xianda Chen, Zhenghan Cai, Pinlong Cai, Meixin Zhu, Xiaowen Chu

    Abstract: Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learn…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 6 pages, 7 figures

  7. arXiv:2409.10980  [pdf]

    eess.IV cs.CV

    PSFHS Challenge Report: Pubic Symphysis and Fetal Head Segmentation from Intrapartum Ultrasound Images

    Authors: Jieyun Bai, Zihao Zhou, Zhanhong Ou, Gregor Koehler, Raphael Stock, Klaus Maier-Hein, Marawan Elbatel, Robert Martí, Xiaomeng Li, Yaoyang Qiu, Panjie Gou, Gongping Chen, Lei Zhao, Jianxun Zhang, Yu Dai, Fangyijie Wang, Guénolé Silvestre, Kathleen Curran, Hongkun Sun, Jing Xu, Pengzhou Cai, Lu Jiang, Libin Lan, Dong Ni, Mei Zhong, et al. (4 additional authors not shown)

    Abstract: Segmentation of the fetal and maternal structures, particularly in intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time-…

    Submitted 17 September, 2024; originally announced September 2024.

  8. arXiv:2409.01695  [pdf, other]

    cs.SD cs.AI eess.AS

    USTC-KXDIGIT System Description for ASVspoof5 Challenge

    Authors: Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

    Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend f…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: ASVspoof5 workshop paper

  9. arXiv:2409.00353  [pdf, other]

    cs.CV

    RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

    Authors: Kunming Su, Qiuxia Wu, Panpan Cai, Xiaogang Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

    Abstract: Masked point modeling methods have recently achieved great success in self-supervised learning for point cloud data. However, these methods are sensitive to rotations and often exhibit sharp performance drops when encountering rotational variations. In this paper, we propose a novel Rotation-Invariant Masked AutoEncoders (RI-MAE) to address two major challenges: 1) achieving rotation-invariant lat…

    Submitted 31 August, 2024; originally announced September 2024.

  10. arXiv:2408.13498  [pdf, other]

    cs.LG

    Rethinking State Disentanglement in Causal Reinforcement Learning

    Authors: Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of al…

    Submitted 24 August, 2024; originally announced August 2024.

  11. arXiv:2408.09675  [pdf, other]

    cs.AI cs.MA cs.RO

    Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

    Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

    Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi-agent RL (MARL) not only needs to learn the control policy but also requires consideration of interactions with all other agents in the environment, mutua…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 23 pages, 6 figures and 2 tables. Submitted to IEEE Journal

  12. arXiv:2408.08673  [pdf, other]

    cs.SD cs.AI eess.AS

    MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

    Authors: Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin

    Abstract: Sound event detection (SED) methods that leverage a large pre-trained Transformer encoder network have shown promising performance in recent DCASE challenges. However, they still rely on an RNN-based context network to model temporal dependencies, largely due to the scarcity of labeled data. In this work, we propose a pure Transformer-based SED model with masked-reconstruction based pre-training,…

    Submitted 19 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Received by Interspeech 2024

  13. arXiv:2408.00415  [pdf, other]

    cs.RO cs.AI cs.CV

    DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

    Authors: Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core components: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any worldwide street map, and World Dreamer, a high-fi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 19 pages, 9 figures

  14. arXiv:2407.21256  [pdf, other]

    cs.CV

    Leveraging Adaptive Implicit Representation Mapping for Ultra High-Resolution Image Segmentation

    Authors: Ziyu Zhao, Xiaoguang Li, Pingping Cai, Canyu Zhang, Song Wang

    Abstract: Implicit representation mapping (IRM) can translate image features to any continuous resolution, showcasing its potent capability for ultra-high-resolution image segmentation refinement. Current IRM-based methods for refining ultra-high-resolution image segmentation often rely on CNN-based encoders to extract image features and apply a Shared Implicit Representation Mapping Function (SIRMF) to con…

    Submitted 30 July, 2024; originally announced July 2024.

  15. arXiv:2407.18656  [pdf, other]

    cs.CV

    Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner

    Authors: Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang

    Abstract: Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed. They either fail to achieve pixel-level fine-grained control, or their inference speed requires optimization. To address this, this paper for the first time employs a regression-based network to learn the variation patterns of StyleGAN…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted as a poster paper for ACM Multimedia 2024

  16. arXiv:2407.14239  [pdf, other]

    cs.AI

    KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

    Authors: Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

    Abstract: Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges in a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in coope…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 18 figures

  17. arXiv:2406.16026  [pdf]

    physics.med-ph cs.LG eess.IV

    CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

    Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

    Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  18. arXiv:2406.11633  [pdf, other]

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract…

    Submitted 11 September, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  19. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to improvements in sensors, machine learning, and artificial intelligence. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  21. arXiv:2405.11317  [pdf, other]

    cs.RO

    Neural Randomized Planning for Whole Body Robot Motion

    Authors: Yunfan Lu, Yuchen Ma, David Hsu, Panpan Cai

    Abstract: Robot motion planning has made vast advances over the past decades, but the challenge remains: robot mobile manipulators struggle to plan long-range whole-body motion in common household environments in real time, because of high-dimensional robot configuration space and complex environment geometry. To tackle the challenge, this paper proposes Neural Randomized Planner (NRP), which combines a glo…

    Submitted 12 August, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  22. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  23. arXiv:2404.01359  [pdf]

    quant-ph cs.AI cs.NE

    Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification

    Authors: Zuyu Xu, Kang Shen, Pengnian Cai, Tao Yang, Yuanming Hu, Shixian Chen, Yunlai Zhu, Zuheng Wu, Yuehua Dai, Jun Wang, Fei Yang

    Abstract: The recent emergence of the hybrid quantum-classical neural network (HQCNN) architecture has garnered considerable attention due to the potential advantages associated with integrating quantum principles to enhance various facets of machine learning algorithms and computations. However, the currently investigated serial structure of HQCNN, wherein information sequentially passes from one network to…

    Submitted 1 April, 2024; originally announced April 2024.

  24. arXiv:2403.10101  [pdf, other]

    cs.RO

    Agile and Safe Trajectory Planning for Quadruped Navigation with Motion Anisotropy Awareness

    Authors: Wentao Zhang, Shaohang Xu, Peiyuan Cai, Lijun Zhu

    Abstract: Quadruped robots demonstrate robust and agile movements in various terrains; however, their navigation autonomy is still insufficient. One of the challenges is that the motion capabilities of the quadruped robot are anisotropic along different directions, which significantly affects the safety of quadruped robot navigation. This paper proposes a navigation framework that takes into account the mot…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  25. arXiv:2402.03830  [pdf, other]

    cs.CV

    OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

    Authors: Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

    Abstract: With the development of deep learning and computer vision technology, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is ex…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 10 pages, 9 figures

  26. arXiv:2402.03047  [pdf, other]

    cs.CV cs.LG

    PFDM: Parser-Free Virtual Try-on via Diffusion Model

    Authors: Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang

    Abstract: Virtual try-on can significantly improve the garment shopping experiences in both online and in-store scenarios, attracting broad interest in computer vision. However, to achieve high-fidelity try-on performance, most state-of-the-art methods still rely on accurate segmentation masks, which are often produced by near-perfect parsers or manual labeling. To overcome the bottleneck, we propose a pars…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE ICASSP 2024

  27. arXiv:2402.01246  [pdf, other]

    cs.RO eess.SY

    LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

    Authors: Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao

    Abstract: The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platform…

    Submitted 12 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by 35th IEEE Intelligent Vehicles Symposium (IV 2024)

  28. arXiv:2401.00722  [pdf, other]

    cs.CV

    BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation

    Authors: Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang

    Abstract: Accurate medical image segmentation is essential for clinical quantification, disease diagnosis, treatment planning and many other applications. Both convolution-based and transformer-based u-shaped architectures have achieved significant success in various medical image segmentation tasks. The former can efficiently learn local information of images while requiring much more image-specific inductive…

    Submitted 30 September, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: 13 pages, 7 figures, 9 tables. This work has been submitted to the IEEE TETCI for possible publication. Code: https://github.com/Caipengzhou/BRAU-Netplusplus

  29. arXiv:2312.13156  [pdf, other]

    cs.CE cs.AI

    AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model

    Authors: Lening Wang, Yilong Ren, Han Jiang, Pinlong Cai, Daocheng Fu, Tianqi Wang, Zhiyong Cui, Haiyang Yu, Xuesong Wang, Hanchu Zhou, Helai Huang, Yinhai Wang

    Abstract: Traffic accidents, being a significant contributor to both human casualties and property damage, have long been a focal point of research for many scholars in the field of traffic safety. However, previous studies, whether focusing on static environmental assessments or dynamic driving analyses, as well as pre-accident predictions or post-accident rule analyses, have typically been conducted in is…

    Submitted 28 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 21 pages, 19 figures

  30. arXiv:2312.08177  [pdf]

    cs.CV

    Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression

    Authors: Peilin Cai

    Abstract: This paper investigates the application of advanced image segmentation techniques to analyze C-fos immediate early gene expression, a crucial marker for neural activity. Due to the complexity and high variability of neural circuits, accurate segmentation of C-fos images is paramount for the development of new insights into neural function. Amidst this backdrop, this research aims to improve accura…

    Submitted 13 December, 2023; originally announced December 2023.

  31. arXiv:2312.04316  [pdf, other]

    cs.RO cs.AI cs.CV

    Towards Knowledge-driven Autonomous Driving

    Authors: Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Yikang Li, Botian Shi, Yong Liu, Liang He, Yu Qiao

    Abstract: This paper explores the emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular their sensitivity to data bias, difficulty in handling long-tail scenarios, and lack of interpretability. Conversely, knowledge-driven methods with the abilities of cognition, generalization and life-long learning emerg…

    Submitted 27 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  32. arXiv:2312.03408  [pdf, other]

    cs.CV

    Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

    Authors: Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

    Abstract: With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. Current autonomous driving datasets can broadly be categorized into two generations. The first-generation autonomous driving datasets are characterized by relatively sim…

    Submitted 22 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: This article is a simplified English translation of the corresponding Chinese article. Please refer to the Chinese version for the complete content

  33. arXiv:2312.02519  [pdf, other]

    cs.AI cs.LG

    Creative Agents: Empowering Agents with Imagination for Creative Tasks

    Authors: Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu

    Abstract: We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them demonstrates creativity -- the ability to give novel and diverse task solutions implicit in the language instructions. This limitation comes from their inability to convert abstract language instructions into concrete tas…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: The first two authors contribute equally

  34. arXiv:2311.05332  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

    Authors: Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

    Abstract: The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of…

    Submitted 28 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

  35. arXiv:2310.00289  [pdf, other]

    eess.IV cs.CV

    Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention

    Authors: Pengzhou Cai, Jiang Lu, Yanxin Li, Libin Lan

    Abstract: In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task. The method adopts a U-Net-like pure Transformer architecture with bi-level routing attention and skip connections, which effectively learns local-global semantic information. The proposed BRAU-Net was evaluated on the transperineal ultrasound image dataset from the pubic symphysis-fetal head…

    Submitted 7 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  36. arXiv:2309.16292  [pdf, other]

    cs.RO cs.CL

    DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

    Authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao

    Abstract: Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an int…

    Submitted 21 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published as a conference paper at ICLR 2024

  37. arXiv:2309.06719  [pdf, other]

    cs.AI cs.HC

    TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models

    Authors: Siyao Zhang, Daocheng Fu, Zhao Zhang, Bin Yu, Pinlong Cai

    Abstract: With the promotion of ChatGPT to the public, large language models (LLMs) showcase remarkable common sense, reasoning, and planning skills, frequently providing insightful guidance. These capabilities hold significant promise for their application in urban traffic management and control. However, LLMs struggle with addressing traffic issues, especially processing numerical data and interacting wit…

    Submitted 13 September, 2023; originally announced September 2023.

  38. arXiv:2308.16008  [pdf, other]

    cs.RO cs.AI cs.LG

    EnsembleFollower: A Hybrid Car-Following Framework Based On Reinforcement Learning and Hierarchical Planning

    Authors: Xu Han, Xianda Chen, Meixin Zhu, Pinlong Cai, Jianshan Zhou, Xiaowen Chu

    Abstract: Car-following models have made significant contributions to our understanding of longitudinal driving behavior. However, they often exhibit limited accuracy and flexibility, as they cannot fully capture the complexity inherent in car-following processes, or may falter in unseen scenarios due to their reliance on confined driving skills present in training data. It is worth noting that each car-fol…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 12 pages, 10 figures

  39. arXiv:2308.12797  [pdf, other]

    cs.RO cs.MA eess.SY

    TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search

    Authors: Licheng Wen, Ze Fu, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi

    Abstract: Digital twins for intelligent transportation systems are currently attracting great interest, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adapt…

    Submitted 31 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  40. arXiv:2308.03253  [pdf, other]

    cs.CL cs.AI

    PaniniQA: Enhancing Patient Education Through Interactive Question Answering

    Authors: Pengshan Cai, Zonghai Yao, Fei Liu, Dakuo Wang, Meghan Reilly, Huixue Zhou, Lingxi Li, Yi Cao, Alok Kapoor, Adarsha Bajracharya, Dan Berlowitz, Hong Yu

    Abstract: Patient portals allow discharged patients to access their personalized discharge instructions in electronic health records (EHRs). However, many patients have difficulty understanding or memorizing their discharge instructions. In this paper, we present PaniniQA, a patient-centric interactive question answering system designed to help patients understand their discharge instructions. PaniniQA firs…

    Submitted 20 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted to TACL 2023. Equal contribution for the first two authors. This arXiv version is a pre-MIT Press publication version

  41. arXiv:2307.07162  [pdf, other]

    cs.RO cs.CL

    Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

    Authors: Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao

    Abstract: In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To…

    Submitted 14 July, 2023; originally announced July 2023.

  42. arXiv:2307.06648  [pdf, other]

    eess.SY cs.RO

    LimSim: A Long-term Interactive Multi-scenario Traffic Simulator

    Authors: Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao

    Abstract: With the growing popularity of digital twin and autonomous driving in transportation, the demand for simulation systems capable of generating high-fidelity and reliable scenarios is increasing. Existing simulation systems suffer from a lack of support for different types of scenarios, and the vehicle models used in these systems are too simplistic. Thus, such systems fail to represent driving styl…

    Submitted 26 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted by 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  43. arXiv:2306.17456  [pdf, other]

    cs.RO cs.HC

    Human-like Decision-making at Unsignalized Intersection using Social Value Orientation

    Authors: Yan Tong, Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

    Abstract: With the commercial application of automated vehicles (AVs), the sharing of roads between AVs and human-driven vehicles (HVs) will become a common occurrence. While research has focused on improving the safety and reliability of autonomous driving, it's also crucial to consider collaboration between AVs and HVs. Human-like interaction is a required capability for AVs, especially at commo…

    Submitted 30 June, 2023; originally announced June 2023.

  44. arXiv:2306.15136  [pdf, other]

    cs.RO cs.AI

    What Truly Matters in Trajectory Prediction for Autonomous Driving?

    Authors: Phong Tran, Haoran Wu, Cunjun Yu, Panpan Cai, Sifa Zheng, David Hsu

    Abstract: Trajectory prediction plays a vital role in the performance of autonomous driving systems, and prediction accuracy, such as average displacement error (ADE) or final displacement error (FDE), is widely used as a performance metric. However, a significant disparity exists between the accuracy of predictors on fixed datasets and driving performance when the predictors are used downstream for vehicle…

    Submitted 6 November, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  45. arXiv:2305.10640  [pdf, other]

    cs.CV

    Learning Restoration is Not Enough: Transfering Identical Mapping for Single-Image Shadow Removal

    Authors: Xiaoguang Li, Qing Guo, Pingping Cai, Wei Feng, Ivor Tsang, Song Wang

    Abstract: Shadow removal aims to restore shadow regions to their shadow-free counterparts while leaving non-shadow regions unchanged. State-of-the-art shadow removal methods train deep neural networks on collected shadow & shadow-free image pairs, which are expected to complete two distinct tasks via shared weights, i.e., data restoration for shadow regions and identical mapping for non-shadow regions. We find…

    Submitted 17 May, 2023; originally announced May 2023.

  46. arXiv:2305.03308  [pdf]

    eess.SP cs.LG

    Tiny-PPG: A Lightweight Deep Neural Network for Real-Time Detection of Motion Artifacts in Photoplethysmogram Signals on Edge Devices

    Authors: Yali Zheng, Chen Wu, Peizheng Cai, Zhiqiang Zhong, Hongda Huang, Yuqi Jiang

    Abstract: Photoplethysmogram (PPG) signals are easily contaminated by motion artifacts in real-world settings, despite their widespread use in Internet-of-Things (IoT) based wearable and smart health devices for cardiovascular health monitoring. This study proposed a lightweight deep neural network, called Tiny-PPG, for accurate and real-time PPG artifact segmentation on IoT edge devices. The model was trai…

    Submitted 10 October, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  47. arXiv:2303.16563  [pdf, other]

    cs.LG cs.AI

    Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

    Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu

    Abstract: We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. Using the popular open-world game Minecraft a…

    Submitted 4 December, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 24 pages, presented at the Foundation Models for Decision Making Workshop at NeurIPS 2023

  48. Parametric Surface Constrained Upsampler Network for Point Cloud

    Authors: Pingping Cai, Zhenyao Wu, Xinyi Wu, Song Wang

    Abstract: Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of attempts achieves this goal by establishing a point-to-point mapping function via deep neural networks. However, these approaches are prone to produce outlier points due to the lack of explicit surface-le…

    Submitted 3 December, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Update Supplementary Files

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 37, 1 (Jun. 2023), 250-258

  49. arXiv:2303.06768  [pdf, other]

    cs.AI cs.RO

    The Planner Optimization Problem: Formulations and Frameworks

    Authors: Yiyuan Lee, Katie Lee, Panpan Cai, David Hsu, Lydia E. Kavraki

    Abstract: Identifying internal parameters for planning is crucial to maximizing the performance of a planner. However, automatically tuning internal parameters which are conditioned on the problem instance is especially challenging. A recent line of work focuses on learning planning parameter generators, but lacks a consistent problem definition and software framework. This work proposes the unified planner…

    Submitted 14 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: 4 pages (+2 pages references, +6 pages appendix)

  50. arXiv:2302.06803  [pdf, other]

    cs.RO cs.MA

    Bringing Diversity to Autonomous Vehicles: An Interpretable Multi-vehicle Decision-making and Planning Framework

    Authors: Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

    Abstract: With the development of autonomous driving, it is becoming increasingly common for autonomous vehicles (AVs) and human-driven vehicles (HVs) to travel on the same roads. Existing single-vehicle planning algorithms on board struggle to handle sophisticated social interactions in the real world. Decisions made by these methods are difficult to understand for humans, raising the risk of crashes and m…

    Submitted 13 February, 2023; originally announced February 2023.

    ACM Class: I.2.9