Version 1: Received: 19 October 2024 / Approved: 22 October 2024 / Online: 22 October 2024 (16:39:53 CEST)
How to cite:
Hassan, S.; Wang, L.; Mahmud, K. R. Integrating Vision and Olfaction via Multi-Modal LLM for Robotic Odor Source Localization. Preprints 2024, 2024101708. https://doi.org/10.20944/preprints202410.1708.v1
APA Style
Hassan, S., Wang, L., & Mahmud, K. R. (2024). Integrating Vision and Olfaction via Multi-Modal LLM for Robotic Odor Source Localization. Preprints. https://doi.org/10.20944/preprints202410.1708.v1
Chicago/Turabian Style
Hassan, S., Lingxiao Wang, and Khan Raqib Mahmud. 2024. "Integrating Vision and Olfaction via Multi-Modal LLM for Robotic Odor Source Localization." Preprints. https://doi.org/10.20944/preprints202410.1708.v1
Abstract
Odor Source Localization (OSL) technology allows autonomous agents such as mobile robots to find an unknown odor source in a given environment. An effective navigation algorithm that guides the robot toward the odor source is the key to successfully locating it. In contrast to traditional olfaction-only OSL methods, our proposed method integrates the vision and olfaction sensor modalities to localize odor sources even when olfaction sensing is disrupted by turbulent airflow or vision sensing is impaired by environmental complexity. The model leverages the zero-shot multi-modal reasoning capabilities of large language models (LLMs), eliminating the need for manual knowledge encoding or custom-trained supervised learning models. A key feature of the proposed algorithm is the 'High-level Reasoning' module, which encodes the olfaction and vision sensor data into a multi-modal prompt and instructs the LLM to employ a hierarchical reasoning process to select an appropriate high-level navigation behavior. The 'Low-level Action' module then translates the selected high-level navigation behavior into low-level action commands that the mobile robot can execute. To validate our method, we implemented the proposed algorithm on a mobile robot in a complex, real-world search environment that challenges both the olfaction and vision sensing modalities. We compared its performance against two single-modality navigation algorithms (olfaction-only and vision-only) and a supervised learning-based vision and olfaction fusion navigation algorithm. Experimental results demonstrate that the multi-sensory navigation algorithms are statistically superior to the single-modality algorithms, and the proposed algorithm outperformed the other algorithms in both laminar and turbulent airflow environments. The code for this work can be found at: https://github.com/SunzidHassan/24_Vision-Olfaction-LLM.
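The two-module pipeline described in the abstract could be sketched as follows. This is a minimal, hypothetical illustration: the function names, behavior set (surge, cast, approach a visual target), and velocity values are assumptions for illustration only, not the authors' actual implementation, and the LLM query itself is left as a stub.

```python
def build_prompt(odor_reading_ppm, image_caption):
    """High-level Reasoning (step 1): encode olfaction and vision
    sensor data into a single multi-modal text prompt for the LLM."""
    return (
        "You are a mobile robot searching for an odor source.\n"
        f"Olfaction: chemical concentration = {odor_reading_ppm:.2f} ppm.\n"
        f"Vision: {image_caption}\n"
        "Choose ONE behavior: surge, cast, or approach_visual_target."
    )

def select_behavior(llm_reply):
    """High-level Reasoning (step 2): parse the LLM's free-text reply
    into one of the allowed high-level navigation behaviors."""
    for behavior in ("approach_visual_target", "surge", "cast"):
        if behavior in llm_reply.lower():
            return behavior
    return "cast"  # fall back to exploratory casting if the reply is unclear

def low_level_action(behavior, wind_direction_deg):
    """Low-level Action: map the selected behavior to
    (linear_velocity, angular_velocity) commands for the robot."""
    if behavior == "surge":
        # move upwind, steering proportionally toward the wind direction
        return 0.3, wind_direction_deg * 0.01
    if behavior == "approach_visual_target":
        return 0.3, 0.0   # drive straight toward the detected object
    return 0.0, 0.5       # cast: rotate in place to re-acquire the plume
```

In a ROS deployment, the returned velocity pair would typically be published as a `geometry_msgs/Twist` message on the robot's command-velocity topic.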
Keywords
odor source localization; multi-modal robotics; Large Language Models (LLMs); robot operating system (ROS)
Subject
Computer Science and Mathematics, Robotics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.