#1 When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
Score: 28.2
Matched keywords: agent, llm, multi-agent
Categories: cs.AI, cs.LG
Compressed abstract: LLM-based multi-agent systems can fail even when planned actions are executed correctly because agents may misjudge their knowledge when evaluating plan feasibility, a phenomenon we term epistemic miscalibration in planning. Unlike execution errors, epistemic miscalibration is latent during planning, as generated plans can remain self-consistent and executable without observable errors; the miscalibration is also dy…
#2 How to Steer Your Multi-Agent System: Human-LLM Collaborative Planning
Score: 32.0
Matched keywords: agent, ai, benchmark, llm, multi-agent, reasoning
Categories: cs.MA, cs.HC
Compressed abstract: In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning.
#3 ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU
Score: 18.4
Matched keywords: large language model, llm, token
Categories: cs.LG, cs.CL, cs.PF
Compressed abstract: ModeSwitch-LLM is a lightweight request-boundary controller for improving single-GPU large language model inference efficiency by routing each request to an appropriate fixed inference mode. Instead of relying on one static serving configuration, the system selects among FP16, quantized modes, speculative decoding, and hybrid modes such as GPTQ plus prefix caching and INT8 plus continuous batching using cheap worklo…
#4 Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
Score: 20.4
Matched keywords: llm, reasoning
Categories: cs.CL, cs.AI
Compressed abstract: Recent RL methods have substantially improved the reasoning abilities of LLMs. Existing reward designs mainly follow two paradigms: (1) Reinforcement learning with verifiable rewards (RLVR) derives outcome signals from executable checks or ground-truth answers, but provides limited guidance for intermediate reasoning behaviors.
#5 MARGIN: Runtime Confidence Calibration for Multi-Agent Foundation Model Coordination
Score: 25.6
Matched keywords: agent, foundation model, foundation models, multi-agent
Categories: cs.LG, cs.MA
Compressed abstract: Foundation model agents increasingly operate in multi-agent deployments where a coordinator must decide which agent's response to trust. The standard approach weights agents by their self-reported confidence, but recent evidence shows that foundation model confidence is systematically mis-calibrated and, on hard tasks, inversely correlated with accuracy.
#6 Parallel Context Compaction for Long-Horizon LLM Agent Serving
Score: 30.8
Matched keywords: agent, llm, prompt, reasoning
Categories: cs.AI
Compressed abstract: Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's context window. Context compaction via LLM-based summarization keeps the conversation bounded, but summarization is inherently lossy and the blocking call stalls agent inference for tens of seconds.
#7 Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
Score: 16.0
Matched keywords: agent, benchmark, llm
Categories: cs.LG, cs.SE
Compressed abstract: Long-horizon language agents can make many plausible local tool calls yet fail to persist until a requested count is actually complete. We study this gap as Quantitative Goal Persistence (QGP): whether an agent keeps working until an external verifier confirms enough distinct valid items.
#8 Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
Score: 16.4
Matched keywords: alignment, large language models, llm
Categories: cs.CL, cs.AI, q-bio.NC
Compressed abstract: Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in computational neurolinguistics, yet why remains mechanistically unexplained. We address this gap by bridging sparse autoencoders (SAEs) from mechanistic interpretability with neural encoding models, decomposing GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per laye…
#9 Brain-LLM Alignment Tracks Training Data, Not Typology
Score: 17.4
Matched keywords: alignment, llm
Categories: cs.CL, cs.AI, q-bio.NC
Compressed abstract: Brain-LLM alignment is well established in English, yet the brain's language network is neuroanatomically universal across languages. Does alignment also generalize cross-linguistically, and what governs the variation?
#10 A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism
Score: 10.9
Matched keywords: agent, ai, large language models, multi-agent
Categories: cs.CL, cs.AI
Compressed abstract: Characteristic linguistic behaviors associated with Social Language Disorder (SLD) in autism spectrum disorder, including echoic repetition, pronoun displacement, and stereotyped media quoting, are largely absent from spontaneous conversation and only emerge under specific conversational conditions. In structured clinical assessments, this latency means that questioning strategy selection is a critical yet underappr…