#1 SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning
Score: 32.4
Matched keywords: agent, large language model, llm, reasoning
Categories: cs.AI, cs.CL, cs.CY
Compressed abstract: As Large Language Model (LLM) agents increasingly leverage the Model Context Protocol (MCP) to operate in complex environments, the expansion of their action spaces offers agents unsafe capabilities and underscores the risk of power-seeking. While broad action space and greater environment influence are essential for task fulfillment, they create a fragile risk surface where minor errors or hallucinations are magnif…
#2 MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
Score: 32.4
Matched keywords: agent, large language model, llm, multi-agent, token
Categories: cs.AI
Compressed abstract: Despite the remarkable progress of Large Language Model (LLM) based Multi-Agent Systems, most research focuses on optimizing coordination topology while largely underexploring the equally critical problem: how to transmit and optimize messages among agents effectively? Current communication schemes typically rely on the direct concatenation of first-order neighbor responses, which induces a restricted evidence recep…
#3 LLM Consortium for Software Design Refinement: A Controlled Experiment on Multi-Agent Collaboration Topologies
Score: 30.2
Matched keywords: agent, llm, multi-agent, prompt, token
Categories: cs.SE, cs.AI, cs.MA
Compressed abstract: We present a controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design. Using a 2×2×2 factorial design (Authority × Roles × Dynamics), we conducted 520 experimental runs across 8 design tasks of varying complexity, with 5 repetitions each.
#4 Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems
Score: 29.2
Matched keywords: agent, llm, multi-agent, reasoning
Categories: cs.AI
Compressed abstract: The transition from single-turn models to Multi-Agent Systems (MAS) promises enhanced problem-solving capabilities, yet the centralized orchestration topology remains a critical point of fragility. To analyze this, we propose a Mean-Field Entropy Dynamics framework, modeling the orchestration process as a system governed by the competing forces of task resolution and cumulative context loading.
#5 Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate
Score: 34.2
Matched keywords: agent, llm, multi-agent, reasoning
Categories: cs.CL
Compressed abstract: Multi-agent debate (MAD) is a promising strategy for improving LLM reasoning, but when agents converge on a shared answer, it is unclear whether that convergence reflects genuine deliberation or social compliance. We show that the conventional answer flip rate conflates three distinct mechanisms: spontaneous instability, stance-induced conformity, and reasoning-induced persuasion.
#6 Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning
Score: 29.8
Matched keywords: fine-tuning, large language models, multimodal, reasoning, token
Categories: cs.CV
Compressed abstract: The effectiveness of Chain-of-Thought (CoT) prompting in Multimodal Large Language Models (MLLMs) remains uncertain: across several visual reasoning benchmarks, CoT prompting often degrades performance compared to direct prompting. In this paper, we provide a systematic analysis of CoT behavior in three modern MLLM families across model scales on datasets requiring step-wise visual evidence.
#7 COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
Score: 16.4
Matched keywords: agent, llm
Categories: cs.AI, cs.CL
Compressed abstract: Equipping language agents with world models enables them to anticipate environment dynamics and evaluate candidate actions before execution. However, existing textual world models are typically fixed after training, preventing them from adapting to the on-policy state-action distributions induced by an evolving agent.
#8 Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
Score: 25.0
Matched keywords: agent, llm, prompt
Categories: cs.AI, cs.CL, cs.CR
Compressed abstract: LLM agents increasingly act after consuming ranked external information streams such as social feeds, search results, retrieval contexts, and email queues, yet safety evaluations almost always test the model or the user prompt in isolation, never the upstream ranker that decides what the agent reads just before it acts. We introduce a controlled protocol that holds the model, persona, topic, and final decision promp…
#9 PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing
Score: 20.4
Matched keywords: llm, reasoning
Categories: cs.CL, cs.AI
Compressed abstract: PlanarBench tests whether LLMs can draw planar graphs as ASCII art given only an edge list -- a spatial reasoning task that resists memorization because edge order, edge orientation, and node labels are all permutable. We evaluate 91 models on the 199 simplest non-isomorphic connected planar graphs (2 to 7 vertices).
#10 Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning
Score: 28.1
Matched keywords: large language model, large language models, llm, multimodal, reasoning
Categories: cs.CL, cs.CV
Compressed abstract: Multimodal large language models (MLLMs) remain unreliable on spatial multiple-choice questions, and their failures are often attributed to poorly attended visual information. In this work, we identify a complementary failure mode, spatial lexical bias: adding a spatial relation word to the answer options can attract the model's decision and make the newly added option likely to be selected.