Skip to main content

Showing 1–50 of 135 results for author: Stiefelhagen, R

  1. arXiv:2409.17555  [pdf, ps, other

    cs.LG cs.CV

    Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler

    Authors: Kunyu Peng, Di Wen, Kailun Yang, Ao Luo, Yufan Chen, Jia Fu, M. Saquib Sarfraz, Alina Roitberg, Rainer Stiefelhagen

    Abstract: In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Rece… ▽ More

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted to NeurIPS 2024. The source code will be available at https://github.com/KPeng9510/EBiL-HaDS

  2. arXiv:2409.16763  [pdf, other

    cs.CV

    Statewide Visual Geolocalization in the Wild

    Authors: Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

    Abstract: This work presents a method that is able to predict the geolocation of a street-view photo taken in the wild within a state-sized search region by matching against a database of aerial reference imagery. We partition the search region into geographical cells and train a model to map cells and corresponding photos into a joint embedding space that is used to perform retrieval at test time. The mode… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

  3. arXiv:2409.14215  [pdf, other

    cs.CV

    @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology

    Authors: Xin Jiang, Junwei Zheng, Ruiping Liu, Jiahang Li, Jiaming Zhang, Sven Matthiesen, Rainer Stiefelhagen

    Abstract: As Vision-Language Models (VLMs) advance, human-centered Assistive Technologies (ATs) for helping People with Visual Impairments (PVIs) are evolving into generalists, capable of performing multiple tasks simultaneously. However, benchmarking VLMs for ATs remains under-explored. To bridge this gap, we first create a novel AT benchmark (@Bench). Guided by a pre-design user study with PVIs, our bench… ▽ More

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: Accepted by WACV 2025, project page: https://junweizheng93.github.io/publications/ATBench/ATBench.html

  4. arXiv:2409.13912  [pdf, other

    cs.CV

    OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

    Authors: Jiale Wei, Junwei Zheng, Ruiping Liu, Jie Hu, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: In the field of autonomous driving, Bird's-Eye-View (BEV) perception has attracted increasing attention in the community since it provides more comprehensive information compared with pinhole front-view images and panoramas. Traditional BEV methods, which rely on multiple narrow-field cameras and complex pose estimations, often face calibration and synchronization issues. To break the wall of the… ▽ More

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted by ACCV 2024. Project code at: https://github.com/JialeWei/OneBEV

  5. arXiv:2409.13548  [pdf, other

    eess.IV cs.CV

    Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation?

    Authors: Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen

    Abstract: In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by produci… ▽ More

    Submitted 4 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  6. arXiv:2408.03046  [pdf, other

    cs.CV

    Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression

    Authors: Jonas Schmitt, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Lightweight and effective models are essential for devices with limited resources, such as intelligent vehicles. Structured pruning offers a promising approach to model compression and efficiency enhancement. However, existing methods often tie pruning techniques to specific model architectures or vision tasks. To address this limitation, we propose a novel unified pruning framework Comb, Prune, D… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ITSC 2024. Code is publicly available at: https://github.com/Cranken/CPD

  7. arXiv:2407.05844  [pdf, other

    cs.CV

    Anatomy-guided Pathology Segmentation

    Authors: Alexander Jaus, Constantin Seibold, Simon Reiß, Lukas Heine, Anton Schily, Moon Kim, Fin Hendrik Bahnsen, Ken Herrmann, Rainer Stiefelhagen, Jens Kleesiek

    Abstract: Pathological structures in medical images are typically deviations from the expected anatomy of a patient. While clinicians consider this interplay between anatomy and pathology, recent deep learning algorithms specialize in recognizing either one of the two, rarely considering the patient's body from such a joint perspective. In this paper, we develop a generalist segmentation model that combines… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.02685  [pdf, other

    cs.CV

    Open Panoramic Segmentation

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a close-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation… ▽ More

    Submitted 11 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://junweizheng93.github.io/publications/OPS/OPS.html

  9. arXiv:2407.02182  [pdf, other

    cs.CV cs.RO eess.IV

    Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of the scene, and domain adaptation can transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which simultaneously tackles all these three challenges. For benchmarking OASS, we establish a new human-annotated dataset for Ble… ▽ More

    Submitted 17 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The fresh dataset and source code are available at https://github.com/yihong-97/OASS

  10. arXiv:2407.01872  [pdf, other

    cs.CV cs.RO eess.IV

    Referring Atomic Video Action Recognition

    Authors: Kunyu Peng, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: We introduce a new task called Referring Atomic Video Action Recognition (RAVAR), aimed at identifying atomic actions of a particular person based on a textual description and the video data of this person. This task differs from traditional action recognition and localization, where predictions are delivered for all present individuals. In contrast, we focus on recognizing the correct atomic acti… ▽ More

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The dataset and code will be made publicly available at https://github.com/KPeng9510/RAVAR

  11. arXiv:2406.10421  [pdf, other

    cs.CL

    SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

    Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

    Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -… ▽ More

    Submitted 2 October, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

    ACM Class: I.2.7

  12. arXiv:2405.19124  [pdf, other

    cs.CV

    ACCSAMS: Automatic Conversion of Exam Documents to Accessible Learning Material for Blind and Visually Impaired

    Authors: David Wilkening, Omar Moured, Thorsten Schwarz, Karin Muller, Rainer Stiefelhagen

    Abstract: Exam documents are essential educational materials for exam preparation. However, they pose a significant academic barrier for blind and visually impaired students, as they are often created without accessibility considerations. Typically, these documents are incompatible with screen readers, contain excessive white space, and lack alternative text for visual elements. This situation frequently re… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024

  13. arXiv:2405.19117  [pdf, other

    cs.CV

    ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs

    Authors: Omar Moured, Sara Alzalabny, Anas Osman, Thorsten Schwarz, Karin Muller, Rainer Stiefelhagen

    Abstract: Visualizations, such as charts, are crucial for interpreting complex data. However, they are often provided as raster images, which are not compatible with assistive technologies for people with blindness and visual impairments, such as embossed papers or tactile displays. At the same time, creating accessible vector graphics requires a skilled sighted person and is time-intensive. In this work, w… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024. Codes will be available at https://github.com/nsothman/ChartFormer

  14. arXiv:2405.19111  [pdf, other

    cs.CV cs.HC

    Alt4Blind: A User Interface to Simplify Charts Alt-Text Creation

    Authors: Omar Moured, Shahid Ali Farooqui, Karin Muller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, Rainer Stiefelhagen

    Abstract: Alternative Texts (Alt-Text) for chart images are essential for making graphics accessible to people with blindness and visual impairments. Traditionally, Alt-Text is manually written by authors but often encounters issues such as oversimplification or complication. Recent trends have seen the use of AI for Alt-Text generation. However, existing models are susceptible to producing inaccurate or mi… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024. Codes will be available at https://moured.github.io/alt4blind/

  15. arXiv:2405.13580  [pdf, other

    cs.CV cs.HC

    AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

    Authors: Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

    Abstract: Chart summarization is a crucial task for blind and visually impaired individuals as it is their primary means of accessing and interpreting graphical data. Crafting high-quality descriptions is challenging because it requires precise communication of essential details within the chart without vision perception. Many chart analysis methods, however, produce brief, unstructured responses that may c… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024. Project page is at: https://github.com/moured/AltChart

  16. arXiv:2404.01816  [pdf, other

    eess.IV cs.CV cs.HC

    Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods

    Authors: Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Interactive segmentation plays a crucial role in accelerating the annotation, particularly in domains requiring specialized expertise such as nuclear medicine. For example, annotating lesions in whole-body Positron Emission Tomography (PET) images can require over an hour per volume. While previous works evaluate interactive segmentation models through either real user studies or simulated annotat… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, 1 table

  17. arXiv:2403.14442  [pdf, other

    cs.CV

    RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

    Authors: Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen

    Abstract: Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we pr… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project page: https://yufanchen96.github.io/projects/RoDLA

  18. arXiv:2403.09975  [pdf, other

    cs.CV cs.RO eess.IV

    Skeleton-Based Human Action Recognition with Noisy Labels

    Authors: Yi Xu, Kunyu Peng, Di Wen, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiaming Zhang, Alina Roitberg, Kailun Yang, Rainer Stiefelhagen

    Abstract: Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resul… ▽ More

    Submitted 5 August, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to IROS 2024. The source code for this study is accessible at https://github.com/xuyizdby/NoiseEraSAR

  19. Chart4Blind: An Intelligent Interface for Chart Accessibility Conversion

    Authors: Omar Moured, Morris Baumgarten-Egemole, Alina Roitberg, Karin Muller, Thorsten Schwarz, Rainer Stiefelhagen

    Abstract: In a world driven by data visualization, ensuring the inclusive accessibility of charts for Blind and Visually Impaired (BVI) individuals remains a significant challenge. Charts are usually presented as raster graphics without textual and visual metadata needed for an equivalent exploration experience for BVI people. Additionally, converting these charts into accessible formats requires considerab… ▽ More

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted to IUI 2024. 19 pages, 7 figures, 2 table. For a demo video, see this https://moured.github.io/chart4blind/ . The source code is available at https://github.com/moured/chart4blind_code/

  20. arXiv:2402.18302  [pdf, other

    cs.CV cs.RO eess.AS eess.IV

    EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

    Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

    Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos… ▽ More

    Submitted 5 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack

  21. arXiv:2401.16923  [pdf, other

    cs.CV cs.RO eess.IV

    Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

    Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level… ▽ More

    Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/RuipingL/MISS

  22. arXiv:2312.08060  [pdf, other

    cs.CV

    C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation

    Authors: Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

    Abstract: To find the geolocation of a street-view image, cross-view geolocalization (CVGL) methods typically perform image retrieval on a database of georeferenced aerial images and determine the location from the visually most similar match. Recent approaches focus mainly on settings where street-view and aerial images are preselected to align w.r.t. translation or orientation, but struggle in challenging… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

  23. arXiv:2312.06330  [pdf, other

    cs.CV cs.AI cs.RO eess.IV

    Navigating Open Set Scenarios for Skeleton-based Action Recognition

    Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Se… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR

  24. arXiv:2311.14482  [pdf, other

    eess.IV cs.AI cs.CV cs.HC

    Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body PET Images

    Authors: Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Deep learning has revolutionized the accurate segmentation of diseases in medical imaging. However, achieving such results requires training with numerous manual voxel annotations. This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body. To tackle this problem, we introduce SW-FastEdit - an interactive segment… ▽ More

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, 4 tables

  25. arXiv:2311.13964  [pdf, other

    eess.IV cs.AI cs.CV cs.HC cs.LG

    Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy

    Authors: Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have… ▽ More

    Submitted 9 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 26 pages, 8 figures, 10 tables; Zdravko Marinov and Paul F. Jäger and co-first authors; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2311.05970  [pdf, other

    cs.CV cs.RO

    Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

    Authors: Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, Alina Roitberg

    Abstract: Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a ne… ▽ More

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted at IROS 2023

  27. arXiv:2310.02815  [pdf, other

    cs.CV cs.RO eess.IV

    CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

    Authors: Hao Shi, Chengshan Pang, Jiaming Zhang, Kailun Yang, Yuhao Wu, Huajian Ni, Yining Lin, Rainer Stiefelhagen, Kaiwei Wang

    Abstract: Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses pre… ▽ More

    Submitted 15 September, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE Transactions on Image Processing (TIP). The source code will be made publicly available at https://github.com/MasterHow/CoBEV

  28. arXiv:2309.12114  [pdf, other

    eess.IV cs.CV

    AutoPET Challenge 2023: Sliding Window-based Optimization of U-Net

    Authors: Matthias Hadlich, Zdravko Marinov, Rainer Stiefelhagen

    Abstract: Tumor segmentation in medical imaging is crucial and relies on precise delineation. Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is widely used in clinical practice to detect metabolically active tumors. However, FDG-PET scans may misinterpret irregular glucose consumption in healthy or benign tissues as cancer. Combining PET with Computed Tomography (CT) can enhance tumor segmentatio… ▽ More

    Submitted 4 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 9 pages, 1 figure, MICCAI 2023 - AutoPET Challenge Submission Version 2: Added all results on the preliminary test set

  29. arXiv:2309.12029  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    Unveiling the Hidden Realm: Self-supervised Skeleton-based Action Recognition in Occluded Environments

    Authors: Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: To integrate action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower robots with the capacity to address occlusion, we propose a simple and effective method. We first pre… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: The source code will be made publicly available at https://github.com/cyfml/OPSTL

  30. arXiv:2309.12009  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

    Authors: Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities while only three fundamental modalities, i.e., joints, bone… ▽ More

    Submitted 10 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024. The source code will be made publicly available at https://github.com/desehuileng0o0/IKEM

  31. arXiv:2308.16139  [pdf, other

    cs.CV cs.DB cs.LG

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen , et al. (132 additional authors not shown)

    Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape… ▽ More

    Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 16 pages

    MSC Class: 68T01

  32. arXiv:2308.12049  [pdf, other

    cs.CV cs.AI

    Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

    Authors: Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg1, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

    Abstract: Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merel… ▽ More

    Submitted 23 August, 2023; originally announced August 2023.

  33. arXiv:2307.16543  [pdf, other

    cs.CV

    On Transferability of Driver Observation Models from Simulated to Real Environments in Autonomous Cars

    Authors: Walter Morales-Alvarez, Novel Certad, Alina Roitberg, Rainer Stiefelhagen, Cristina Olaverri-Monreal

    Abstract: For driver observation frameworks, clean datasets collected in controlled simulated environments often serve as the initial training ground. Yet, when deployed under real driving conditions, such simulator-trained models quickly face the problem of distributional shifts brought about by changing illumination, car model, variations in subject appearances, sensor discrepancies, and other environment… ▽ More

    Submitted 31 July, 2023; originally announced July 2023.

  34. arXiv:2307.15588  [pdf, other

    cs.CV cs.RO eess.IV

    OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation

    Authors: Fei Teng, Jiaming Zhang, Kunyu Peng, Yaonan Wang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Light field cameras are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation. However, two significant issues arise: (1) The extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelmin… ▽ More

    Submitted 9 September, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE Transactions on Artificial Intelligence (TAI). The source code is available at https://github.com/FeiBryantkit/OAFuser

  35. arXiv:2307.13375  [pdf, other

    eess.IV cs.CV

    Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines

    Authors: Alexander Jaus, Constantin Seibold, Kelsey Hermann, Alexandra Walter, Kristina Giske, Johannes Haubold, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage which exper… ▽ More

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 18 pages, 8 figures, 2 tables

  36. arXiv:2307.07763  [pdf, other

    cs.RO cs.CV eess.IV

    Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents

    Authors: Ke Cao, Ruiping Liu, Ze Wang, Kunyu Peng, Jiaming Zhang, Junwei Zheng, Zhifeng Teng, Kailun Yang, Rainer Stiefelhagen

    Abstract: The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based… ▽ More

    Submitted 25 December, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ROBIO 2023

  37. arXiv:2307.07757  [pdf, other

    cs.CV cs.HC cs.RO eess.IV

    Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to… ▽ More

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Code will be available at https://github.com/RuipingL/OpenSU

  38. arXiv:2307.02065  [pdf, other

    cs.CV cs.AI cs.LG

    Line Graphics Digitization: A Step Towards Full Automation

    Authors: Omar Moured, Jiaming Zhang, Alina Roitberg, Thorsten Schwarz, Rainer Stiefelhagen

    Abstract: The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present… ▽ More

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

  39. arXiv:2306.03934  [pdf, other

    eess.IV cs.CV cs.LG

    Accurate Fine-Grained Segmentation of Human Anatomy in Radiographs via Volumetric Pseudo-Labeling

    Authors: Constantin Seibold, Alexander Jaus, Matthias A. Fink, Moon Kim, Simon Reiß, Ken Herrmann, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Purpose: Interpreting chest radiographs (CXR) remains challenging due to the ambiguity of overlapping structures such as the lungs, heart, and bones. To address this issue, we propose a novel method for extracting fine-grained anatomical structures in CXR using pseudo-labeling of three-dimensional computed tomography (CT) scans. Methods: We created a large-scale dataset of 10,021 thoracic CTs wi… ▽ More

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 28 pages, 1 table, 10 figures

    ACM Class: I.4.6; I.4.7; I.4.8

  40. arXiv:2305.12823  [pdf, other

    cs.CV

    READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation

    Authors: Stéphane Vujasinović, Sebastian Bullinger, Stefan Becker, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen

    Abstract: We present READMem (Robust Embedding Association for a Diverse Memory), a modular framework for semi-automatic video object segmentation (sVOS) methods designed to handle unconstrained videos. Contemporary sVOS works typically aggregate video frames in an ever-expanding memory, demanding high hardware resources for long-term applications. To mitigate memory requirements and prevent near object dup… ▽ More

    Submitted 25 September, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to BMVC 2023. Code @ https://github.com/Vujas-Eteph/READMem

    Journal ref: 34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023

  41. arXiv:2305.08420  [pdf, other

    cs.CV cs.AI cs.RO eess.IV

    Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains

    Authors: Kunyu Peng, Di Wen, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverag… ▽ More

    Submitted 27 April, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: The benchmark and source code will be publicly available at https://github.com/KPeng9510/RelaMiX

  42. arXiv:2303.13842  [pdf, other

    cs.CV cs.RO eess.IV

    FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation

    Authors: Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have larger FoV than ordinary pinhole cameras, yet its unique special imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applic… ▽ More

    Submitted 20 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR OmniCV 2023. Code and datasets will be available at https://github.com/MasterHow/FishDreamer

  43. arXiv:2303.11910  [pdf, other

    cs.CV

    360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View

    Authors: Zhifeng Teng, Jiaming Zhang, Kailun Yang, Kunyu Peng, Hao Shi, Simon Reiß, Ke Cao, Rainer Stiefelhagen

    Abstract: Seeing only a tiny part of the whole is not knowing the full circumstance. Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from egocentric views, is restricted when using a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in… ▽ More

    Submitted 4 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Code and datasets are available at the project page: https://jamycheung.github.io/360BEV.html. Accepted to WACV 2024

  44. arXiv:2303.07126  [pdf, ps, other

    eess.IV cs.CV

    Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for Semantic Segmentation in Medical Imaging

    Authors: Zdravko Marinov, Simon Reiß, David Kersting, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Positron Emission Tomography (PET) and Computer Tomography (CT) are routinely used together to detect tumors. PET/CT segmentation models can automate tumor delineation, however, current multimodal models do not fully exploit the complementary information in each modality, as they either concatenate PET and CT data or fuse them at the decision level. To combat this, we propose Mirror U-Net, which r… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: 8 pages; 8 figures; 5 tables

  45. Guiding the Guidance: A Comparative Analysis of User Guidance Signals for Interactive Segmentation of Volumetric Images

    Authors: Zdravko Marinov, Rainer Stiefelhagen, Jens Kleesiek

    Abstract: Interactive segmentation reduces the annotation time of medical images and allows annotators to iteratively refine labels with corrective interactions, such as clicks. While existing interactive models transform clicks into user guidance signals, which are combined with images to form (image, guidance) pairs, the question of how to best represent the guidance has not been fully explored. To addres… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: 8 pages; 2 figures; 2 tables

  46. arXiv:2303.01480  [pdf, other

    cs.CV

    Delivering Arbitrary-Modal Semantic Segmentation

    Authors: Jiaming Zhang, Ruiping Liu, Hao Shi, Kailun Yang, Simon Reiß, Kunyu Peng, Haodong Fu, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. Aside from this, we provide this dataset in four severe weather conditions as well as five sensor failure cases to expl… ▽ More

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. Dataset and our code are at: https://jamycheung.github.io/DELIVER.html

  47. arXiv:2303.00952  [pdf, other

    cs.CV cs.RO eess.IV

    Towards Activated Muscle Group Estimation in the Wild

    Authors: Kunyu Peng, David Schneider, Alina Roitberg, Kailun Yang, Jiaming Zhang, Chen Deng, Kaiyu Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

    Abstract: In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE) aiming at identifying active muscle regions during physical activity in the wild. To this intent, we provide the MuscleMap dataset featuring >15K video clips with 135 different activities and 20 labeled muscle groups. This dataset opens the vistas to multiple video-based applications in sports and rehabil… ▽ More

    Submitted 5 August, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted to ACM MM 2024. The database and code can be found at https://github.com/KPeng9510/MuscleMap

  48. arXiv:2302.14595  [pdf, other

    cs.RO cs.CV

    MateRobot: Material Recognition in Wearable Robotics for People with Visual Impairments

    Authors: Junwei Zheng, Jiaming Zhang, Kailun Yang, Kunyu Peng, Rainer Stiefelhagen

    Abstract: People with Visual Impairments (PVI) typically recognize objects through haptic perception. Knowing objects and materials before touching is desired by the target users but under-explored in the field of human-centered robotics. To fill this gap, in this work, a wearable vision-based robotic system, MateRobot, is established for PVI to recognize materials and object categories beforehand. To addre… ▽ More

    Submitted 6 March, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted to ICRA 2024. The source code is publicly available at https://junweizheng93.github.io/publications/MATERobot/MATERobot.html

  49. Multimodal Interactive Lung Lesion Segmentation: A Framework for Annotating PET/CT Images based on Physiological and Anatomical Cues

    Authors: Verena Jasmin Hallitschke, Tobias Schlumberger, Philipp Kataliakos, Zdravko Marinov, Moon Kim, Lars Heiliger, Constantin Seibold, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Recently, deep learning enabled the accurate segmentation of various diseases in medical imaging. These performances, however, typically demand large amounts of manual voxel annotations. This tedious process for volumetric data becomes more complex when not all required information is available in a single imaging domain as is the case for PET/CT data. We propose a multimodal interactive segmentat… ▽ More

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted at ISBI 2023; 5 pages, 5 figures

  50. arXiv:2211.12145  [pdf, other

    cs.CV

    Uncertainty-aware Vision-based Metric Cross-view Geolocalization

    Authors: Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

    Abstract: This paper proposes a novel method for vision-based metric cross-view geolocalization (CVGL) that matches the camera images captured from a ground-based vehicle with an aerial image to determine the vehicle's geo-pose. Since aerial images are globally available at low cost, they represent a potential compromise between two established paradigms of autonomous driving, i.e. using expensive high-defi… ▽ More

    Submitted 17 May, 2023; v1 submitted 22 November, 2022; originally announced November 2022.