How to Steer Your Multi-Agent System: Human-LLM Collaborative Planning

Score: 32.0 | Matched keywords: agent, ai, benchmark, llm, multi-agent, reasoning

Abstract Snapshot

Compressed abstract

Main idea

In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency.

Method signal

Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a design space for human-LLM co-planning interactions along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low vs. high-level edits).

Contribution signal

We realize it in AMBIPOM, a prototype supporting process-level supervision through both semantic and structural interactions.

Original Abstract

In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a design space for human-LLM co-planning interactions along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low vs. high-level edits). We realize it in AMBIPOM, a prototype supporting process-level supervision through both semantic and structural interactions. Through a user study, we characterize how users navigate this space, revealing hybrid workflows and effort-control-risk trade-offs; through a controlled benchmark, we analyze how LLMs revise plans under varying scope and revision strategies. Our findings yield design insights for more transparent, controllable, and effective human-AI co-planning. We release code and data at https://github.com/megagonlabs/ambipom.
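
Illustrative sketch

To make the three-axis design space concrete, the following minimal Python sketch models each axis as an enumeration and an interaction as one point in the space. This is an illustration only, not AMBIPOM's actual data model or API: the names Mode, Scope, Level, Interaction, and design_space are hypothetical, and treating the axes as freely combinable (a full cross product) is an assumption that the abstract does not confirm.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical names: the abstract names the axes and their values,
# but not how AMBIPOM represents them internally.
class Mode(Enum):
    SEMANTIC = "semantic"      # e.g. revising a plan via natural language
    STRUCTURAL = "structural"  # e.g. editing the plan structure directly


class Scope(Enum):
    GLOBAL = "global"      # the revision applies to the whole plan
    TARGETED = "targeted"  # the revision applies to a selected part


class Level(Enum):
    LOW = "low"    # low-level edits
    HIGH = "high"  # high-level edits


@dataclass(frozen=True)
class Interaction:
    """One point in the co-planning design space."""
    mode: Mode
    scope: Scope
    level: Level

    def describe(self) -> str:
        return f"{self.mode.value} / {self.scope.value} / {self.level.value}-level"


def design_space() -> list[Interaction]:
    """Enumerate every combination of the three axes.

    Assumption: the axes are independent, so the space is their cross product.
    """
    return [Interaction(m, s, l) for m, s, l in product(Mode, Scope, Level)]


if __name__ == "__main__":
    for interaction in design_space():
        print(interaction.describe())
```

Under the cross-product assumption, the sketch enumerates eight interaction types. Whether all eight are meaningful or supported in practice is a question for the paper and its released code, not something this sketch establishes.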