Showing 1–24 of 24 results for author: Chu, R

  1. arXiv:2410.09741  [pdf, other]

    cs.LG stat.ML

    Real-time Fuel Leakage Detection via Online Change Point Detection

    Authors: Ruimin Chu, Li Chik, Yiliao Song, Jeffrey Chan, Xiaodong Li

    Abstract: Early detection of fuel leakage at service stations with underground petroleum storage systems is a crucial task to prevent catastrophic hazards. Current data-driven fuel leakage detection methods employ offline statistical inventory reconciliation, leading to significant detection delays. Consequently, this can result in substantial financial loss and environmental impact on the surrounding commu…

    Submitted 13 October, 2024; originally announced October 2024.

  2. arXiv:2407.13803  [pdf, other]

    cs.CR cs.AI cs.CL

    Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality

    Authors: Duy C. Hoang, Hung T. Q. Le, Rui Chu, Ping Li, Weijie Zhao, Yingjie Lao, Khoa D. Doan

    Abstract: With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLM, enabling a simple and effective way to detect and monitor generated text. However, while the existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2405.16200  [pdf, other]

    cs.CV

    FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction

    Authors: Lan Wu, Xuebin Wang, Ruijuan Chu, Guangyi Liu, Yingchun Chen, Jing Zhang, Linyu Wang

    Abstract: Accurate multi-step flight trajectory prediction plays an important role in Air Traffic Control, which can ensure the safety of air transportation. Two main issues limit the flight trajectory prediction performance of existing works. The first issue is the negative impact on prediction accuracy caused by the significant differences in data range. The second issue is that real-world flight trajecto…

    Submitted 25 May, 2024; originally announced May 2024.

  4. arXiv:2403.18814  [pdf, other]

    cs.CV cs.AI cs.CL

    Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

    Authors: Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

    Abstract: In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from thr…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Code and models are available at https://github.com/dvlab-research/MiniGemini

  5. arXiv:2403.16996  [pdf, other]

    cs.CV cs.RO

    DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving

    Authors: Tianqi Wang, Enze Xie, Ruihang Chu, Zhenguo Li, Ping Luo

    Abstract: End-to-end driving has made significant progress in recent years, demonstrating benefits such as system simplicity and competitive driving performance under both open-loop and closed-loop settings. Nevertheless, the lack of interpretability and controllability in its driving decisions hinders real-world deployment for end-to-end driving systems. In this paper, we collect a comprehensive end-to-end…

    Submitted 25 March, 2024; originally announced March 2024.

  6. arXiv:2403.08857  [pdf, other]

    cs.CV

    DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

    Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

    Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language M…

    Submitted 3 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Project page: https://hunyuan-dialoggen.github.io/

  7. arXiv:2312.11562  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    A Survey of Reasoning with Foundation Models

    Authors: Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, et al. (9 additional authors not shown)

    Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring…

    Submitted 25 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models

  8. arXiv:2309.01692  [pdf, other]

    cs.CV

    Mask-Attention-Free Transformer for 3D Instance Segmentation

    Authors: Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia

    Abstract: Recently, transformer-based methods have dominated 3D instance segmentation, where mask attention is commonly involved. Specifically, object queries are guided by the initial instance masks in the first cross-attention, and then iteratively refine themselves in a similar manner. However, we observe that the mask-attention pipeline usually leads to slow convergence due to low-recall initial instanc…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023. Code and models are available at https://github.com/dvlab-research/Mask-Attention-Free-Transformer

  9. arXiv:2307.01831  [pdf, other]

    cs.CV cs.AI cs.LG

    DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

    Authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li

    Abstract: Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape genera…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Project Page: https://dit-3d.github.io/

  10. arXiv:2306.16329  [pdf, other]

    cs.CV

    DiffComplete: Diffusion-based Generative 3D Shape Completion

    Authors: Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nießner, Chi-Wing Fu, Jiaya Jia

    Abstract: We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, we strike a balance between realism, multi-modality, and high fidelity. We propose DiffComplete by casting shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature agg…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Project Page: https://ruihangchu.com/diffcomplete.html

  11. arXiv:2303.16485  [pdf, other]

    cs.CV

    TriVol: Point Cloud Rendering via Triple Volumes

    Authors: Tao Hu, Xiaogang Xu, Ruihang Chu, Jiaya Jia

    Abstract: Existing learning-based methods for point cloud rendering adopt various 3D representations and feature querying mechanisms to alleviate the sparsity problem of point clouds. However, artifacts still appear in rendered images, due to the challenges in extracting continuous and discriminative 3D features from point clouds. In this paper, we present a dense while lightweight 3D representation, named…

    Submitted 29 March, 2023; originally announced March 2023.

  12. arXiv:2211.00509  [pdf, other]

    cs.CV

    Self-Supervised Intensity-Event Stereo Matching

    Authors: Jinjin Gu, Jinan Zhou, Ringo Sai Wo Chu, Yan Chen, Jiawei Zhang, Xuanye Cheng, Song Zhang, Jimmy S. Ren

    Abstract: Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes in microsecond accuracy with a high dynamic range and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks due to the inability to obtain high-quality intensity and events simultaneously. This paper aims to connect a standalone event came…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: This paper has been accepted by the Journal of Imaging Science & Technology

  13. arXiv:2208.11630  [pdf, other]

    physics.comp-ph astro-ph.IM cs.MS

    Flash-X, a multiphysics simulation software instrument

    Authors: Anshu Dubey, Klaus Weide, Jared O'Neal, Akash Dhruv, Sean Couch, J. Austin Harris, Tom Klosterman, Rajeev Jain, Johann Rudi, Bronson Messer, Michael Pajkos, Jared Carlson, Ran Chu, Mohamed Wahib, Saurabh Chawdhary, Paul M. Ricker, Dongwook Lee, Katie Antypas, Katherine M. Riley, Christopher Daley, Murali Ganapathy, Francis X. Timmes, Dean M. Townsley, Marcos Vanella, John Bachan, et al. (6 additional authors not shown)

    Abstract: Flash-X is a highly composable multiphysics software system that can be used to simulate physical phenomena in several scientific domains. It derives some of its solvers from FLASH, which was first released in 2000. Flash-X has a new framework that relies on abstractions and asynchronous communications for performance portability across a range of increasingly heterogeneous hardware platforms. Fla…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: 16 pages, 5 Figures, published open access in SoftwareX

    Journal ref: SoftwareX, Volume 19, 2022, 101168, ISSN 2352-7110

  14. arXiv:2108.11771  [pdf, other]

    cs.CV cs.RO

    ICM-3D: Instantiated Category Modeling for 3D Instance Segmentation

    Authors: Ruihang Chu, Yukang Chen, Tao Kong, Lu Qi, Lei Li

    Abstract: Separating 3D point clouds into individual instances is an important task for 3D vision. It is challenging due to the unknown and varying number of instances in a scene. Existing deep learning based works focus on a two-step pipeline: first learn a feature embedding and then cluster the points. Such a two-step pipeline leads to disconnected intermediate objectives. In this paper, we propose an int…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted August, 2021

  15. arXiv:2108.02425  [pdf, other]

    cs.RO cs.CV

    Simultaneous Semantic and Collision Learning for 6-DoF Grasp Pose Estimation

    Authors: Yiming Li, Tao Kong, Ruihang Chu, Yifeng Li, Peng Wang, Lei Li

    Abstract: Grasping in cluttered scenes has always been a great challenge for robots, due to the requirement of the ability to well understand the scene and object information. Previous works usually assume that the geometry information of the objects is available, or utilize a step-wise, multi-stage strategy to predict the feasible 6-DoF grasp poses. In this work, we propose to formalize the 6-DoF grasp pos…

    Submitted 26 September, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

    Comments: International Conference on Intelligent Robots and Systems (IROS) 2021

  16. arXiv:2103.17220  [pdf, other]

    cs.CV

    Scale-aware Automatic Augmentation for Object Detection

    Authors: Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia

    Abstract: We propose Scale-aware AutoAug to learn data augmentation policies for object detection. We define a new scale-aware search space, where both image- and box-level augmentations are designed for maintaining scale invariance. Upon this search space, we propose a new search metric, termed Pareto Scale Balance, to facilitate search with high efficiency. In experiments, Scale-aware AutoAug yields signi…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR 2021

  17. arXiv:2011.05617  [pdf, other]

    cs.AI cs.LG

    Sim-To-Real Transfer for Miniature Autonomous Car Racing

    Authors: Yeong-Jia Roger Chu, Ting-Han Wei, Jin-Bo Huang, Yuan-Hao Chen, I-Chen Wu

    Abstract: Sim-to-real, a term that describes where a model is trained in a simulator then transferred to the real world, is a technique that enables faster deep reinforcement learning (DRL) training. However, differences between the simulator and the real world often cause the model to perform poorly in the real world. Domain randomization is a way to bridge the sim-to-real gap by exposing the model to a wi…

    Submitted 11 November, 2020; originally announced November 2020.

  18. arXiv:2003.14013  [pdf, other]

    eess.IV cs.CV cs.LG

    Supervised Raw Video Denoising with a Benchmark Dataset on Dynamic Scenes

    Authors: Huanjing Yue, Cong Cao, Lei Liao, Ronghe Chu, Jingyu Yang

    Abstract: In recent years, the supervised learning strategy for real noisy image denoising has been emerging and has achieved promising results. In contrast, realistic noise removal for raw noisy videos is rarely studied due to the lack of noisy-clean pairs for dynamic scenes. Clean video frames for dynamic scenes cannot be captured with a long-exposure shutter or averaging multi-shots as was done for stati…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: CVPR2020 accepted paper

  19. arXiv:1910.04104  [pdf, other]

    cs.CV

    Vehicle Re-identification with Viewpoint-aware Metric Learning

    Authors: Ruihang Chu, Yifan Sun, Yadong Li, Zheng Liu, Chi Zhang, Yichen Wei

    Abstract: This paper considers vehicle re-identification (re-ID) problem. The extreme viewpoint variation (up to 180 degrees) poses great challenges for existing approaches. Inspired by the behavior in human's recognition process, we propose a novel viewpoint-aware metric learning approach. It learns two metrics for similar viewpoints and different viewpoints in two feature spaces, respectively, giving rise…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: Accepted by ICCV 2019

  20. arXiv:1906.11981  [pdf, other]

    cs.CV cs.LG eess.IV

    Convolution Based Spectral Partitioning Architecture for Hyperspectral Image Classification

    Authors: Ringo S. W. Chu, Ho-Cheung Ng, Xiwei Wang, Wayne Luk

    Abstract: Hyperspectral images (HSIs) can distinguish materials with high number of spectral bands, which is widely adopted in remote sensing applications and benefits in high accuracy land cover classifications. However, HSIs processing are tangled with the problem of high dimensionality and limited amount of labelled data. To address these challenges, this paper proposes a deep learning architecture using…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: Accepted for publication in IGARSS'2019

  21. arXiv:1906.11834  [pdf, other]

    eess.IV cs.CV

    Optimizing CNN-based Hyperspectral Image Classification on FPGAs

    Authors: Shuanglong Liu, Ringo S. W. Chu, Xiwei Wang, Wayne Luk

    Abstract: Hyperspectral image (HSI) classification has been widely adopted in applications involving remote sensing imagery analysis which require high classification accuracy and real-time processing speed. Methods based on Convolutional neural networks (CNNs) have been proven to achieve state-of-the-art accuracy in classifying HSIs. However, CNN models are often too computationally intensive to achieve re…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: This article is accepted for publication at ARC'2019

  22. arXiv:1812.03264  [pdf, other]

    cs.CV

    Neural Abstract Style Transfer for Chinese Traditional Painting

    Authors: Bo Li, Caiming Xiong, Tianfu Wu, Yu Zhou, Lun Zhang, Rufeng Chu

    Abstract: Chinese traditional painting is one of the most historical artworks in the world. It is very popular in Eastern and Southeast Asia due to being aesthetically appealing. Compared with western artistic painting, it is usually more visually abstract and textureless. Recently, neural network based style transfer methods have shown promising and appealing results which are mainly focused on western pai…

    Submitted 12 December, 2018; v1 submitted 7 December, 2018; originally announced December 2018.

    Comments: Conference: ACCV 2018. Project Page: https://github.com/lbsswu/Chinese_style_transfer

  23. arXiv:1807.02842  [pdf, other]

    cs.CV

    Auto-Context R-CNN

    Authors: Bo Li, Tianfu Wu, Lun Zhang, Rufeng Chu

    Abstract: Region-based convolutional neural networks (R-CNN)~\cite{fast_rcnn,faster_rcnn,mask_rcnn} have largely dominated object detection. Operators defined on RoIs (Region of Interests) play an important role in R-CNNs such as RoIPooling~\cite{fast_rcnn} and RoIAlign~\cite{mask_rcnn}. They all only utilize information inside RoIs for RoI prediction, even with their recent deformable extensions~\cite{defo…

    Submitted 8 July, 2018; originally announced July 2018.

    Comments: Rejected by ECCV18

  24. arXiv:1612.00534  [pdf, other]

    cs.CV

    Object Detection via Aspect Ratio and Context Aware Region-based Convolutional Networks

    Authors: Bo Li, Tianfu Wu, Shuai Shao, Lun Zhang, Rufeng Chu

    Abstract: Jointly integrating aspect ratio and context has been extensively studied and shown performance improvement in traditional object detection systems such as the DPMs. It, however, has been largely ignored in deep neural network based detection systems. This paper presents a method of integrating a mixture of object models and region-based convolutional networks for accurate object detection. Each m…

    Submitted 22 March, 2017; v1 submitted 1 December, 2016; originally announced December 2016.