One Paper Accepted To IEEE Robotics and Automation Letters (RA-L) (25.09.23)

Congratulations to Dogyu, Chanyoung, Daeho, and Jaeho!

[Title]

GAIA: Generating Task Instruction Aware Simulation Grounded in Real Contexts using Vision-Language Models

[Journal]
IEEE Robotics and Automation Letters (RA-L) 2025

[Authors]
Dogyu Ko†, Chanyoung Yeo†, Daeho Kim, Jaeho Kim and Hyoseok Hwang*

[Summary]
Enabling robots to interact effectively with the real world requires extensive learning from physical interaction data, making simulation crucial for generating such data safely and cost-effectively. Despite the advantages of simulation, manual environment creation remains laborious, motivating automated generation approaches. However, current automatic virtual scene generation methods struggle to bridge the sim-to-real gap and to achieve task readiness, leaving a need for automatically generated, realistic, and task-ready virtual scenes. In this paper, we propose GAIA, a novel methodology that automatically generates interactive, task-ready simulation environments grounded in real contexts from only a single RGB image and a task instruction. GAIA leverages a pre-trained Vision-Language Model (VLM) without requiring explicit training and jointly understands the visual context and the user's instruction. Based on this understanding, it infers and places the necessary task-aware objects, including unseen ones, to construct an interactive virtual environment that maintains real-scene fidelity while reflecting task requirements without additional manual setup. Qualitative experiments show that GAIA generates scenes consistent with user instructions, and quantitative results show that policies learned in these GAIA-generated environments successfully transfer to target environments.


[Key Figure]