2026-05-13 · arXiv Daily Keyword Digest (Top 10 of 961)

Generated: 2026-05-14T08:02:25.050888+09:00

Target date (KST): 2026-05-13

Selection: picked 10 from 961 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
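
As a rough illustration of the source line above, the request can be sketched as a URL built from the arXiv API's `search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` parameters. The helper name `build_query_url`, the page size, and the paging scheme are assumptions. The digest does not show the request it actually sends.

```python
from urllib.parse import urlencode

# Endpoint as named in the digest header.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build the URL for one page of cs.* results, newest submissions first.

    Hypothetical helper: the page size and paging scheme are assumptions.
    """
    params = {
        "search_query": "cat:cs.*",   # category filter quoted in the digest
        "sortBy": "submittedDate",    # sort key quoted in the digest
        "sortOrder": "descending",
        "start": start,               # zero-based offset of the first result
        "max_results": max_results,   # results per page (assumed value)
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(start=0, max_results=100)
print(url)
```

Fetching later pages would only change `start`. Filtering the results down to the target date in KST would be a separate step that is not shown here.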

Selection logic: keyword-weight score + subject boost
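
A minimal sketch of how a keyword-weight score plus a subject boost could be computed. The weights, the boost values, and the names `score_paper` and `top_n` are all hypothetical. The digest does not publish its weights or formula, so these numbers will not reproduce the scores listed below.

```python
# Hypothetical weights and boosts: the digest does not publish its real values.
KEYWORD_WEIGHTS = {"agent": 5.0, "multi-agent": 6.0, "llm": 4.0, "reasoning": 3.0}
SUBJECT_BOOST = {"cs.AI": 2.0, "cs.MA": 1.5}


def score_paper(text: str, categories: list[str]) -> tuple[float, list[str]]:
    """Return (score, matched keywords) for one paper's title and abstract."""
    lowered = text.lower()
    # Plain substring matching, so "multi-agent" also matches "agent".
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in lowered)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return keyword_score + boost, matched


def top_n(papers: list[dict], n: int = 10) -> list[dict]:
    """Keep the n highest-scoring papers; each dict needs 'text' and 'categories'."""
    scored = [
        {**p, "score": score_paper(p["text"], p["categories"])[0]} for p in papers
    ]
    return sorted(scored, key=lambda p: p["score"], reverse=True)[:n]


score, matched = score_paper(
    "Coordinated Diffusion: Generating Multi-Agent Behavior", ["cs.RO"]
)
print(score, matched)
```

Substring matching is chosen here only because the entries below list both "agent" and "multi-agent" as matches for the same paper. A real pipeline might tokenize or use word boundaries instead.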

#1 Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies

Score: 35.7

Matched keywords: agent, large language model, llm, multi-agent, reasoning

Categories: cs.MA, cs.AI, cs.LG, cs.SI, math.SP

Compressed abstract: Practitioners deploying multi-agent large language model (LLM) systems must currently choose between communication topologies such as chain, star, mesh, and richer variants without any pre-inference diagnostic for which topology will amplify drift, converge to consensus, or remain robust under perturbation. Existing evaluation answers these questions only post hoc and only for the task measured.

#2 Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

Score: 19.0

Matched keywords: agent, diffusion, multi-agent

Categories: cs.RO

Compressed abstract: Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, like multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrat…

#3 FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

Score: 33.6

Matched keywords: agent, large language models, llm, multi-agent, prompt

Categories: cs.CR

Compressed abstract: Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner-executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but exposes an attack surface in workflow formation: prompts can shape agent organization without modifying MAS infrastructure.

#4 OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling

Score: 32.2

Matched keywords: agent, benchmark, large language models, llm, multi-agent

Categories: cs.AI

Compressed abstract: Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as optimization-modeling hallucination detection, namely structural…

#5 The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

Score: 29.5

Matched keywords: agent, llm, prompt, reasoning, tool-using

Categories: cs.CR, cs.AI

Compressed abstract: Tool-using LLM agents must act on untrusted webpages, emails, files, and API outputs while issuing privileged tool calls. Existing defenses often mediate trust at the granularity of an entire tool invocation, forcing a brittle choice in mixed-trust workflows: allow external content to influence a call and risk hijacked destinations or commands, or quarantine the call and block benign retrieval-then-act behavior.

#6 Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

Score: 35.3

Matched keywords: agent, ai, ai agents, llm, rag, reasoning

Categories: cs.AI

Compressed abstract: LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from them, their effectiveness in answering challenging questions (e.g., multi-hop, commonsense) ultimately depends on the agent's ability to reas…

#7 Behavioral Integrity Verification for AI Agent Skills

Score: 27.7

Matched keywords: agent, ai, ai agent, benchmark, llm

Categories: cs.CR, cs.AI, eess.SY

Compressed abstract: Agent skills extend LLM agents with privileged third-party capabilities such as filesystem access, credentials, network calls, and shell execution. Existing safety work catches malicious prompts and risky runtime actions, but the skill artifact itself goes unverified.

#8 Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

Score: 23.2

Matched keywords: agent, ai, ai agent, ai agents

Categories: cs.AI, cs.CR

Compressed abstract: Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded.

#9 LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

Score: 23.7

Matched keywords: agent, llm, multi-agent

Categories: cs.AI

Compressed abstract: We propose a personal-LLM exchange (LLM-X), a scalable negotiation-oriented environment that enables direct, structured communication across populations of personal agents (LLMs), each representing an individual user. Unlike existing tool-centric protocols that focus on agent-API interaction, LLM-X introduces a message bus and routing substrate for LLM-to-LLM coordination with guarantees around schema validity and p…

#10 Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

Score: 34.6

Matched keywords: code generation, fine-tuning, in-context learning, llm, prompt, reasoning

Categories: cs.LG, cs.AI

Compressed abstract: In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the policy's training dynamics.
