Mar 23, 2024
We propose a method that first builds a semantic map of the scene based on depth information and on visual prompting of a VLM.
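To make the depth-based mapping step concrete, here is a minimal sketch of projecting a depth image into a top-down occupancy grid. This is an illustrative assumption, not the authors' implementation: the function name, camera intrinsics, and grid parameters are all hypothetical.

```python
import numpy as np

def depth_to_topdown_map(depth, fx, fy, cx, cy, cell_size=0.1, grid_dim=64):
    """Hypothetical sketch: bin depth pixels into a top-down occupancy grid.

    depth: (H, W) array of metric depths; fx, fy, cx, cy: pinhole intrinsics.
    The camera is assumed level; height filtering is omitted for brevity.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth                      # forward distance
    x = (us - cx) * z / fx         # rightward offset in metres
    valid = z > 0                  # ignore pixels with no depth reading
    # Bin ground-plane (x, z) coordinates into cells; camera at column center.
    gx = np.clip((x[valid] / cell_size + grid_dim // 2).astype(int), 0, grid_dim - 1)
    gz = np.clip((z[valid] / cell_size).astype(int), 0, grid_dim - 1)
    grid = np.zeros((grid_dim, grid_dim), dtype=bool)
    grid[gz, gx] = True            # mark cells observed as occupied/seen
    return grid

# Toy example: a single depth reading 1 m straight ahead of the camera.
depth = np.zeros((4, 4))
depth[2, 2] = 1.0
grid = depth_to_topdown_map(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

In the paper's pipeline, cells of such a map would additionally carry semantic values obtained by visually prompting the VLM; here only geometric occupancy is shown.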
In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions.
Explore until Confident: Efficient Exploration for Embodied Question Answering
We release the HM-EQA dataset, which includes 500 questions about 267 scenes from the HM-3D dataset. They are available in data/
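A loader for the released questions might look like the sketch below. The file name and record schema are assumptions for illustration only; consult data/ in the repository for the actual format.

```python
import json
import os
import tempfile

# Hypothetical HM-EQA record schema; the real field names in data/ may differ.
sample = [
    {"scene": "scene_0001",
     "question": "Is the TV in the living room on?",
     "choices": ["yes", "no"],
     "answer": "no"},
]

def load_questions(path):
    """Load multiple-choice EQA questions from a JSON file (assumed format)."""
    with open(path) as f:
        return json.load(f)

# Self-contained demo: write the sample to a temp file, then load it back.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "questions.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    questions = load_questions(path)
```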
Experimental results demonstrate that the proposed framework can answer more complicated and realistic questions in embodied environments.
Published 23 Mar 2024 by Allen Z. Ren, Jaden …
Jul 8, 2024
The paper introduces a new method called "Explore until Confident" (EuC) for efficient exploration in embodied question answering tasks.
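The "explore until confident" idea — stop exploring once the VLM's answer is sufficiently certain — can be sketched as a simple stopping loop. This is a hedged approximation: the paper calibrates confidence rigorously (e.g. with conformal prediction), whereas the sketch below just thresholds a softmax over per-step answer logits; all names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def explore_until_confident(vlm_logits_per_step, threshold=0.9):
    """Step through exploration; stop once the top answer probability exceeds
    the threshold. Returns (stopping step index, chosen answer index).

    vlm_logits_per_step: one list of answer logits per exploration step,
    as would be produced by querying the VLM after each new observation.
    """
    for t, logits in enumerate(vlm_logits_per_step):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return t, best  # confident enough: stop early
    return len(vlm_logits_per_step) - 1, best  # budget exhausted

# Toy run: uncertain at step 0, confident at step 1, so exploration stops there.
steps = [[0.1, 0.1], [3.0, 0.0]]
result = explore_until_confident(steps, threshold=0.9)
```

Stopping early in this way is what saves exploration steps relative to a fixed-budget baseline, which is the efficiency claim the title refers to.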