#6 IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

Detailed Summary (EN)

Problem definition

Indoor service robots are transitioning from single-agent demos to teams that must jointly carry out long-horizon tasks such as cleaning, cooking assistance, object fetching, and device operation [1]–[4].
In realistic homes and offices, however, multi-robot coordination is fundamentally constrained by partial observability: each robot only sees what lies in its current field of view and what it has already explored.
Under these constraints, teams frequently waste effort through redundant exploration, inconsistent beliefs about object locations or device states, and brittle task allocation when plans must be revised online.
At the same time, indoor environments are increasingly instrumented with ambient IoT sensors, which can provide persistent, wide-coverage observations unavailable to any single robot [5]–[8].

Core idea & method

IndoorR2X targets Large Language Model (LLM)-driven multi-robot task planning with Robot-to-Everything (R2X) perception and communication in indoor environments.
IndoorR2X integrates observations from mobile robots and static IoT devices to construct a global semantic state that supports scalable scene understanding, reduces redundant exploration, and enables high-level coordination through LLM-based planning.
IndoorR2X provides configurable simulation environments, sensor layouts, robot teams, and task suites to systematically evaluate semantic-level coordination strategies.
Extensive experiments across diverse settings demonstrate that IoT-augmented world modeling improves multi-robot efficiency and reliability, and we highlight key insights and failure modes for advancing LLM-based collaboration between robot teams and indoor IoT sensors.

Experimental setup & results

Results with GPT-4.1 as the LLM planner in a three-robot setting. Prior works are adapted to our setting in Sec. III (e.g., target positions are unknown before exploration); our method is reported with ablations that compare three communication configurations.

Method | (unlabeled) | (unlabeled) | (unlabeled) | LLM Tokens/Scene ↓
SMART-LLM (Adapted) [2] (IROS 2024) | 88% | 124 | 119 m | 43,397
EMOS (Adapted) [15] (ICLR 2025) | 88% | 135 | 122 m | 51,394
IR (w/o X & Inter-robot Comm.) | 66% | 186 | 137 m | 54,572
R2R (w/o X Comm.) | 92% | 116 | 99 m | 47,875
R2X | 92% | 108 | 88 m | 42,438

TABLE III: Performance comparison across different sizes of LLMs.

Model | (unlabeled) | (unlabeled) | (unlabeled) | LLM Tokens/Scene ↓
Llama-3.1-8b-instruct (8 billion parameters) | 6% | 47 | 55 m | 50,512
Gemma-3-27b-it (27 billion parameters) | 64% | 113 | 94 m | 26,753
GPT-4.1 (estimated ∼1.8 trillion parameters) | 92% | 108 | 88 m | 42,438
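
To make the ablation easier to read, the relative savings of R2X over the other two communication configurations can be worked out from the reported figures. The sketch below is only arithmetic on the numbers quoted above; the dictionary keys `meters` (the column reported with unit m) and `tokens` (LLM Tokens/Scene) and both function names are labels chosen for this illustration, not names from the paper.

```python
# Values copied from the GPT-4.1, three-robot results quoted above.
# "meters" is the column reported with unit m; "tokens" is LLM Tokens/Scene.
# Both key names are illustrative labels, not the paper's terminology.
REPORTED = {
    "IR": {"meters": 137, "tokens": 54_572},
    "R2R": {"meters": 99, "tokens": 47_875},
    "R2X": {"meters": 88, "tokens": 42_438},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Return the fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


def r2x_savings(metric: str) -> dict[str, float]:
    """Percent reduction of R2X relative to each other configuration."""
    r2x = REPORTED["R2X"][metric]
    return {
        name: round(100 * relative_reduction(row[metric], r2x), 1)
        for name, row in REPORTED.items()
        if name != "R2X"
    }


if __name__ == "__main__":
    for metric in ("meters", "tokens"):
        print(metric, r2x_savings(metric))
```

On these figures, R2X comes out about 22.2% lower than IR and 11.4% lower than R2R in tokens per scene, and about 35.8% and 11.1% lower in the meters column.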

Limitations & risks

Most LLM-based multi-robot frameworks assume that the “information bottleneck” is primarily robot-to-robot communication: the planner is fed only robots’ onboard observations (sometimes with simplified global state in simulation), and belief updates largely come from physical exploration and dialogue [28]–[30].
In contrast, IndoorR2X explicitly models an R2X information channel by fusing robot observations with ambient IoT sensing into a global semantic state maintained by a coordination hub, allowing the LLM planner to reason over shared, timestamped, cross-source state.
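
One way to picture a shared, timestamped, cross-source state is a small store that keeps the most recent report per fact, whether it came from a robot or from an IoT sensor. The sketch below is a minimal illustration under that assumption; `Observation`, `SemanticHub`, the "latest timestamp wins" rule, and the source ids are all hypothetical and are not taken from the IndoorR2X implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    source: str       # hypothetical id, e.g. "robot_1" or "iot_cam"
    entity: str       # object or device being described
    attribute: str    # e.g. "location" or "power"
    value: str
    timestamp: float  # seconds; larger means more recent


class SemanticHub:
    """Keeps the most recent observation per (entity, attribute)."""

    def __init__(self) -> None:
        self._state: dict[tuple[str, str], Observation] = {}

    def update(self, obs: Observation) -> bool:
        """Store `obs` unless an equally recent or newer report exists."""
        key = (obs.entity, obs.attribute)
        current = self._state.get(key)
        if current is not None and current.timestamp >= obs.timestamp:
            return False  # stale report: keep what we already have
        self._state[key] = obs
        return True

    def snapshot(self) -> list[str]:
        """Render the fused state as text lines a planner prompt could embed."""
        return [
            f"{o.entity}.{o.attribute} = {o.value} "
            f"(from {o.source}, t={o.timestamp:g})"
            for _, o in sorted(self._state.items(), key=lambda kv: kv[0])
        ]


if __name__ == "__main__":
    hub = SemanticHub()
    hub.update(Observation("robot_1", "mug", "location", "kitchen_table", 10.0))
    hub.update(Observation("iot_cam", "mug", "location", "living_room", 12.0))
    hub.update(Observation("robot_2", "mug", "location", "kitchen_table", 11.0))
    print("\n".join(hub.snapshot()))
```

Keeping the source and timestamp next to each value lets a planner see where a belief came from and how fresh it is. A real hub would also need a policy for conflicting or unreliable sources, which this sketch leaves out.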

Benchmarks Under Partial Observability and Multi-Robot Exploration

Partial observability is central to embodied decision making and is commonly formalized through POMDP-style formulations [31].
In multi-robot settings, limited fields of view and incomplete maps make coordination challenging and often lead to redundant exploration, motivating classical work on coordinated exploration and frontier-based search [32], [33].
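
For readers unfamiliar with frontier-based search, the idea can be sketched on a small occupancy grid: a frontier is a known-free cell next to an unexplored cell, and giving each robot a different frontier avoids redundant exploration. This is a generic textbook-style illustration, not the method of IndoorR2X or of the cited works; the grid encoding, the function names, and the greedy Manhattan-distance assignment are assumptions made for the example.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # illustrative cell encoding


def find_frontiers(grid: list[list[int]]) -> list[tuple[int, int]]:
    """Return free cells that touch at least one unknown cell (4-neighbourhood)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


def assign_frontiers(
    robots: dict[str, tuple[int, int]],
    frontiers: list[tuple[int, int]],
) -> dict[str, tuple[int, int]]:
    """Greedily give each robot its nearest still-unclaimed frontier."""
    remaining = list(frontiers)
    assignment = {}
    for name, (r, c) in robots.items():
        if not remaining:
            break  # more robots than frontiers
        target = min(remaining, key=lambda f: abs(f[0] - r) + abs(f[1] - c))
        assignment[name] = target
        remaining.remove(target)
    return assignment


if __name__ == "__main__":
    grid = [
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ]
    frontiers = find_frontiers(grid)
    print(frontiers)
    print(assign_frontiers({"a": (0, 0), "b": (0, 0)}, frontiers))
```

Two robots starting in the same cell still receive different targets, because a claimed frontier is removed before the next robot chooses.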

Read-like-fullpaper digest

This paper addresses the shift of indoor service robots from single-agent demos to teams that must jointly carry out long-horizon tasks such as cleaning, cooking assistance, object fetching, and device operation [1]–[4]. The core method is Large Language Model (LLM)-driven multi-robot task planning with Robot-to-Everything (R2X) perception and communication in indoor environments. Key empirical findings come from a three-robot setting with GPT-4.1 as the LLM planner, where the R2X configuration uses the fewest LLM tokens per scene (42,438) among the GPT-4.1 configurations compared.

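Detailed Summary (KO)

Problem definition

Indoor service robots are transitioning from single-agent demos to teams that must jointly carry out long-horizon tasks such as cleaning, cooking assistance, object fetching, and device operation [1]–[4].
In real homes and offices, however, multi-robot coordination is fundamentally constrained by partial observability: each robot sees only what lies in its current field of view and what it has already explored.
Under these constraints, teams often waste effort through redundant exploration, inconsistent beliefs about object locations or device states, and brittle task allocation when plans must be revised online.
At the same time, indoor environments are increasingly equipped with ambient IoT sensors, which can provide persistent, wide-coverage observations unavailable to any single robot [5]–[8].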

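Core idea & method

IndoorR2X targets Large Language Model (LLM)-driven multi-robot task planning with Robot-to-Everything (R2X) perception and communication in indoor environments.
IndoorR2X integrates observations from mobile robots and static IoT devices to construct a global semantic state that supports scalable scene understanding, reduces redundant exploration, and enables high-level coordination through LLM-based planning.
IndoorR2X provides configurable simulation environments, sensor layouts, robot teams, and task suites to systematically evaluate semantic-level coordination strategies.
Extensive experiments across diverse settings show that IoT-augmented world modeling improves multi-robot efficiency and reliability, and the paper highlights key insights and failure modes for advancing LLM-based collaboration between robot teams and indoor IoT sensors.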

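Experimental setup & results

Results with GPT-4.1 as the LLM planner in a three-robot setting. Prior works are adapted to our setting in Sec. III (e.g., target positions are unknown before exploration); our method is reported with ablations that compare three communication configurations.

Method | (unlabeled) | (unlabeled) | (unlabeled) | LLM Tokens/Scene ↓
SMART-LLM (Adapted) [2] (IROS 2024) | 88% | 124 | 119 m | 43,397
EMOS (Adapted) [15] (ICLR 2025) | 88% | 135 | 122 m | 51,394
IR (w/o X & Inter-robot Comm.) | 66% | 186 | 137 m | 54,572
R2R (w/o X Comm.) | 92% | 116 | 99 m | 47,875
R2X | 92% | 108 | 88 m | 42,438

TABLE III: Performance comparison across different sizes of LLMs.

Model | (unlabeled) | (unlabeled) | (unlabeled) | LLM Tokens/Scene ↓
Llama-3.1-8b-instruct (8 billion parameters) | 6% | 47 | 55 m | 50,512
Gemma-3-27b-it (27 billion parameters) | 64% | 113 | 94 m | 26,753
GPT-4.1 (estimated ~1.8 trillion parameters) | 92% | 108 | 88 m | 42,438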

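Limitations & risks

Most LLM-based multi-robot frameworks assume that the "information bottleneck" is primarily robot-to-robot communication: the planner is given only the robots' onboard observations (sometimes with simplified global state in simulation), and belief updates come largely from physical exploration and dialogue [28]–[30].
In contrast, IndoorR2X explicitly models an R2X information channel by fusing robot observations with ambient IoT sensing into a global semantic state maintained by a coordination hub, allowing the LLM planner to reason over shared, timestamped, cross-source state.

Benchmarks Under Partial Observability and Multi-Robot Exploration

Partial observability is central to embodied decision making and is commonly formalized through POMDP-style formulations [31].
In multi-robot settings, limited fields of view and incomplete maps make coordination difficult and often lead to redundant exploration, motivating classical work on coordinated exploration and frontier-based search [32], [33].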

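Read-like-fullpaper digest

This paper addresses the shift of indoor service robots from single-agent demos to teams that must jointly carry out long-horizon tasks such as cleaning, cooking assistance, object fetching, and device operation [1]–[4]. The core method is Large Language Model (LLM)-driven multi-robot task planning with Robot-to-Everything (R2X) perception and communication in indoor environments. Key empirical findings include the results obtained with GPT-4.1 as the LLM planner in a three-robot setting.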