subscribe to arXiv mailings

Order-aware Interactive Segmentation

Authors: Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu

Abstract: Interactive segmentation aims to accurately segment target objects with minimal user interactions. However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, the relative depth between objects in a scene. To address this issue, we propose OIS: order-aware interactive segmentation, where we explicitly encode the relative d… ▽ More Interactive segmentation aims to accurately segment target objects with minimal user interactions. However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, the relative depth between objects in a scene. To address this issue, we propose OIS: order-aware interactive segmentation, where we explicitly encode the relative depth between objects into order maps. We introduce a novel order-aware attention, where the order maps seamlessly guide the user interactions (in the form of clicks) to attend to the image features. We further present an object-aware attention module to incorporate a strong object-level understanding to better differentiate objects with similar order. Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency as compared to prior works. Experimental results demonstrate that OIS achieves state-of-the-art performance, improving mIoU after one click by 7.61 on the HQSeg44K dataset and 1.32 on the DAVIS dataset as compared to the previous state-of-the-art SegNext, while also doubling inference speed compared to current leading methods. The project page is https://ukaukaaaa.github.io/projects/OIS/index.html △ Less

Submitted 17 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

Comments: Interactive demo can be found in project page: https://ukaukaaaa.github.io/projects/OIS/index.html

arXiv:2410.07577 [pdf, other]

3D Vision-Language Gaussian Splatting

Authors: Qucheng Peng, Benjamin Planche, Zhongpai Gao, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Chen Chen, Ziyan Wu

Abstract: Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which has vital applications in robotics, autonomous driving, and virtual/augmented reality. However, current multi-modal scene understanding approaches have naively embedded semantic representations into 3D reconstruction methods without striking a bala… ▽ More Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which has vital applications in robotics, autonomous driving, and virtual/augmented reality. However, current multi-modal scene understanding approaches have naively embedded semantic representations into 3D reconstruction methods without striking a balance between visual and language modalities, which leads to unsatisfying semantic rasterization of translucent or reflective objects, as well as over-fitting on color modality. To alleviate these limitations, we propose a solution that adequately handles the distinct visual and semantic modalities, i.e., a 3D vision-language Gaussian splatting model for scene understanding, to put emphasis on the representation learning of language modality. We propose a novel cross-modal rasterizer, using modality fusion along with a smoothed semantic indicator for enhancing semantic rasterization. We also employ a camera-view blending technique to improve semantic consistency between existing and synthesized views, thereby effectively mitigating over-fitting. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin. △ Less

Submitted 9 October, 2024; originally announced October 2024.

Comments: main paper + supplementary material

arXiv:2410.04974 [pdf, other]

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu

Abstract: Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically-based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to bett… ▽ More Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically-based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better incorporate view-dependent effects, but the Gaussian representation and control scheme are sub-optimal. In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR with a reduction of 66.5% Gaussian points compared to 3DGS. The project page is: https://gaozhongpai.github.io/6dgs/ △ Less

Submitted 10 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

Comments: Project: https://gaozhongpai.github.io/6dgs/ and fixed iteration typos

arXiv:2404.03657 [pdf, other]

OW-VISCap: Open-World Video Instance Segmentation and Captioning

Authors: Anwesa Choudhuri, Girish Chowdhary, Alexander G. Schwing

Abstract: Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer… ▽ More Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer from highly overlapping predictions. To address these issues, we propose Open-World Video Instance Segmentation and Captioning (OW-VISCap), an approach to jointly segment, track, and caption previously seen or unseen objects in a video. For this, we introduce open-world object queries to discover never before seen objects without additional user-input. We generate rich and descriptive object-centric captions for each detected object via a masked attention augmented LLM input. We introduce an inter-query contrastive loss to ensure that the object queries differ from one another. Our generalized approach matches or surpasses state-of-the-art on three tasks: open-world video instance segmentation on the BURST dataset, dense video object captioning on the VidSTG dataset, and closed-world video instance segmentation on the OVIS dataset. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: Project page: https://anwesachoudhuri.github.io/OpenWorldVISCap/

arXiv:2112.10764 [pdf, other]

Mask2Former for Video Instance Segmentation

Authors: Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

Abstract: We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouT… ▽ More We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures. △ Less

Submitted 20 December, 2021; originally announced December 2021.

Comments: Code and models: https://github.com/facebookresearch/Mask2Former

arXiv:1503.01314 [pdf]

An Incentivized Approach for Fair Participation in Wireless Ad hoc Networks

Authors: Arka Rai Choudhuri, Kalyanasundaram S, Shriyak Sridhar, Annappa B

Abstract: In Wireless Ad hoc networks (WANETs), nodes separated by considerable distance communicate with each other by relaying their messages through other nodes. However, it might not be in the best interests of a node to forward the message of another node due to power constraints. In addition, all nodes being rational, some nodes may be selfish, i.e. they might not relay data from other nodes so as to… ▽ More In Wireless Ad hoc networks (WANETs), nodes separated by considerable distance communicate with each other by relaying their messages through other nodes. However, it might not be in the best interests of a node to forward the message of another node due to power constraints. In addition, all nodes being rational, some nodes may be selfish, i.e. they might not relay data from other nodes so as to increase their lifetime. In this paper, we present a fair and incentivized approach for participation in Ad hoc networks. Given the power required for each transmission, we are able to determine the power saving contributed by each intermediate hop. We propose the FAir Share incenTivizEd Ad hoc paRticipation protocol (FASTER), which takes a selected route from a routing protocol as input, to calculate the worth of each node using the cooperative game theory concept of 'Shapley Value' applied on the power saved by each node. This value can be used for allocation of Virtual Currency to the nodes, which can be spent on subsequent message transmissions. △ Less

Submitted 4 March, 2015; originally announced March 2015.

Comments: 6 pages, 4 figures, published in the International Journal of Recent Development in Engineering and Technology

Journal ref: IJRDET 2, no. 3 (2014): 117-121

Showing 1–6 of 6 results for author: Choudhuri, A