#9 "Who Am I, and Who Else Is Here?" Behavioral Differentiation Without Role Assignment in Multi-Agent LLM Systems

Score: 23.4 | Matched keywords: agent, large language models, llm, prompt

Detailed Summary (EN)

Full-paper-style digest

The paper addresses its research questions through a controlled experimental platform, the War Room, that orchestrates group conversations among 7 LLMs hosted on a unified inference backend (Groq), enabling precise control over model composition, system prompts, naming conventions, and natural agent failure. It poses five research questions and reports principal findings for them, including RQ1 (Role Differentiation): All five behavioral trait flags show significant inter-agent variation (Kruskal–Wallis, Bonferroni-corrected p < 0.05 for 5/5 flags).

This is primarily a method paper. Critically, the differentiated behaviors are absent when agents operate in isolation, confirming that behavioral diversity is a structured, reproducible phenomenon driven by the interaction of architectural heterogeneity, group context, and prompt-level scaffolding.

RQ4 (Heterogeneity): Heterogeneous groups exhibit significantly richer behavioral differentiation than homogeneous groups, reflected in significantly lower profile similarity (cosine similarity 0.56 vs. …).

The paper’s conclusions should be interpreted within the scope of the reported evaluation and evidence.

Final takeaway

Main takeaway: Heterogeneous groups exhibit significantly richer behavioral differentiation than homogeneous groups (cosine similarity 0.56 vs. …).
Important caution: The paper’s conclusions should be interpreted within the scope of the reported evaluation and evidence.

Problem definition

The paper studies behavioral differentiation without role assignment in multi-agent LLM systems, organized around five research questions. The questions are examined in the War Room, a controlled experimental platform that orchestrates group conversations among 7 LLMs hosted on a unified inference backend (Groq).
RQ1 (Role Differentiation): All five behavioral trait flags show significant inter-agent variation (Kruskal–Wallis, Bonferroni-corrected p < 0.05 for 5/5 flags).
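The RQ1 analysis (a Kruskal–Wallis test per trait flag, followed by Bonferroni correction across the five flags) can be sketched as below. This is a minimal, standard-library-only illustration, not the paper's code: the function names, the chi-square approximation for the p-value, and the idea of feeding per-conversation flag rates per agent are all assumptions made for the example.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = max(x, 0.0) / 2
    if df % 2:  # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:       # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2 - 1e-9:  # recurrence Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, one sample per agent."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:  # every observation identical: no variation to test
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In use, one would call `kruskal_wallis` once per trait flag, passing one list of observations per agent, and then pass the five resulting p-values to `bonferroni` before comparing against 0.05. The chi-square approximation is the usual large-sample choice; for very small samples an exact or permutation test may be more appropriate.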

Core idea & method

The War Room is a controlled experimental platform that orchestrates group conversations among 7 LLMs hosted on a unified inference backend (Groq), enabling precise control over model composition, system prompts, naming conventions, and natural agent failure.
Critically, the differentiated behaviors are absent when agents operate in isolation, confirming that behavioral diversity is a structured, reproducible phenomenon driven by the interaction of architectural heterogeneity, group context, and prompt-level scaffolding.
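To make the orchestration idea concrete, here is a minimal sketch of a group-conversation loop with named agents, per-agent system prompts, and tolerance for an agent failing mid-conversation. Everything in it is hypothetical: the digest does not describe the War Room's turn-taking policy or API, so the round-robin order, the `Agent` class, the `"moderator"` speaker, and the stub generators are assumptions, and no real inference backend is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]  # (speaker name, message)


@dataclass
class Agent:
    name: str                                   # naming convention under test
    system_prompt: str                          # prompt-level scaffolding
    generate: Callable[[str, Transcript], str]  # hypothetical model call
    alive: bool = True


def run_group_conversation(agents, opening_message, rounds):
    """Round-robin loop; an agent whose call raises is treated as crashed."""
    transcript: Transcript = [("moderator", opening_message)]
    for _ in range(rounds):
        for agent in agents:
            if not agent.alive:
                continue  # crashed agents stay silent for the rest of the run
            try:
                reply = agent.generate(agent.system_prompt, list(transcript))
            except Exception:
                agent.alive = False  # record the failure, keep the group going
                continue
            transcript.append((agent.name, reply))
    return transcript


def make_stub(name):
    """Stand-in for a model call: reports how many messages it has seen."""
    def generate(system_prompt, transcript):
        return f"{name} has read {len(transcript)} messages"
    return generate


def make_crashing_stub(name, fail_on_call):
    """Stand-in that raises on its n-th call to imitate an agent failure."""
    calls = {"n": 0}
    def generate(system_prompt, transcript):
        calls["n"] += 1
        if calls["n"] >= fail_on_call:
            raise RuntimeError(f"{name} backend error")
        return f"{name} reply {calls['n']}"
    return generate
```

A run with two stub agents, one of which fails on its second call, yields a transcript in which the surviving agent keeps speaking. That post-crash portion of a transcript is the kind of data in which compensatory responses (RQ2) would be sought.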

Actual findings

RQ4 (Heterogeneity): Heterogeneous groups exhibit significantly richer behavioral differentiation than homogeneous groups, reflected in significantly lower profile similarity (cosine similarity 0.56 vs. …).
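The profile-similarity comparison can be illustrated with a short sketch. It assumes each agent is summarized by a vector of trait-flag rates and that a group's similarity is the mean cosine similarity over all agent pairs. The digest does not state how the paper aggregates similarity, so both the profile representation and the pairwise averaging are assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(profiles):
    """Average cosine similarity over every unordered pair of agent profiles."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
```

Under this reading, a lower group-level value means the agents' behavioral profiles point in more different directions, which is how "richer differentiation" maps onto "lower profile similarity".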

How the conclusion was reached

Core contribution: The War Room, a controlled experimental platform that orchestrates group conversations among 7 LLMs hosted on a unified inference backend (Groq), with precise control over model composition, system prompts, naming conventions, and natural agent failure.
Evaluation setup: Group conversations are run in the War Room with model composition, system prompts, naming conventions, and agent failure under experimental control. RQ2 (Compensatory Responses): Groups spontaneously exhibit compensatory response patterns when an agent crashes, with a three-level hierarchy from absence-noting to task redistribution.
Main supported conclusion: Heterogeneous groups exhibit significantly richer behavioral differentiation than homogeneous groups (cosine similarity 0.56 vs. …).

Experimental setup & results

Experiments run in the War Room, which orchestrates group conversations among 7 LLMs hosted on a unified inference backend (Groq) with control over model composition, system prompts, naming conventions, and natural agent failure.
RQ2 (Compensatory Responses): Groups spontaneously exhibit compensatory response patterns when an agent crashes, with a three-level hierarchy from absence-noting to task redistribution.
RQ4 (Heterogeneity): Heterogeneous groups show significantly richer behavioral differentiation, that is, significantly lower profile similarity, than homogeneous groups (cosine similarity 0.56 vs. …).

Limitations & risks

The paper’s conclusions should be interpreted within the scope of the reported evaluation and evidence.
