2026-06-04 · arXiv Daily Keyword Digest (Top 10 of 680)

Generated: 2026-06-05T08:02:19.336638+09:00

Target date (KST): 2026-06-04

Selection: picked 10 from 680 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
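Query sketch: the source line above can be turned into a request URL roughly as follows. This is a minimal sketch, not the digest's actual fetch code. `search_query`, `sortBy`, `sortOrder`, `start` and `max_results` are standard arXiv API parameters; the function name `build_query_url` and the default page size are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(start=0, max_results=100):
    """Build an arXiv API URL for cs.* papers, newest submissions first.

    The page size is an assumed value; the digest does not state one.
    """
    params = {
        "search_query": "cat:cs.*",   # category filter taken from the digest header
        "sortBy": "submittedDate",    # sort key named in the digest header
        "sortOrder": "descending",    # "desc" in the digest header
        "start": start,               # zero-based offset for paging
        "max_results": max_results,   # assumed page size
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url())
```

The response is an Atom feed, so filtering to the target date (KST) would still have to happen on the client side after parsing.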

Selection logic: keyword-weight score + subject boost
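Scoring sketch: one plausible reading of "keyword-weight score + subject boost" is a sum of per-keyword weights plus a per-category bonus. The weights, boosts, and function name below are hypothetical and are not expected to reproduce the scores listed in this digest.

```python
# Hypothetical weights: the digest does not publish its real table.
KEYWORD_WEIGHTS = {"agent": 3.0, "multi-agent": 5.0, "reasoning": 2.5}
# Hypothetical per-category boosts.
SUBJECT_BOOST = {"cs.AI": 1.5, "cs.CL": 1.0}


def score_paper(text, categories):
    """Return (score, matched_keywords) for a title or abstract.

    Keywords are matched as case-insensitive substrings, so "multi-agent"
    also counts as a hit for "agent".
    """
    lowered = text.lower()
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in lowered)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return keyword_score + boost, matched


if __name__ == "__main__":
    print(score_paper("Streaming Communication in Multi-Agent Reasoning",
                      ["cs.CL", "cs.AI", "cs.MA"]))
```

Substring matching is chosen here because the entries below list overlapping keywords such as "agent" and "multi-agent" together; a real scorer might tokenize instead.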

#1 Streaming Communication in Multi-Agent Reasoning

Score: 24.4

Matched keywords: agent, multi-agent, reasoning

Categories: cs.CL, cs.AI, cs.MA

Compressed abstract: Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency.
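Latency sketch: the linear-scaling claim can be illustrated with a toy pipeline model. This is not StreamMA's own cost model; it assumes every agent emits the same number of equally long steps and that a downstream agent can start step j as soon as its upstream neighbor has emitted step j.

```python
def sequential_latency(depth, steps, step_time):
    """Generate-then-transfer: each agent waits for the full upstream output."""
    return depth * steps * step_time


def streamed_latency(depth, steps, step_time):
    """Step-level streaming: adjacent agents overlap, as in a classic pipeline.

    The last agent starts (depth - 1) steps late and then needs `steps` steps.
    """
    return (steps + depth - 1) * step_time


if __name__ == "__main__":
    for depth in (2, 4, 8):
        print(depth,
              sequential_latency(depth, 10, 1.0),
              streamed_latency(depth, 10, 1.0))
```

Under these assumptions the sequential latency grows as depth × steps, while the streamed latency grows as depth + steps.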


#2 SMADE-IE: Sparse Multi-Agent Framework with Evidence-Driven Debate for Zero-Shot Information Extraction

Score: 36.1

Matched keywords: agent, agent framework, benchmark, large language models, multi-agent, reasoning, token

Categories: cs.CL

Compressed abstract: Zero-shot information extraction (IE) with large language models (LLMs) has attracted increasing attention due to its flexibility in adapting to new schemas and domains without task-specific training. Existing approaches mainly rely on monolithic prompting, each-type prompting, or multi-agent debate.


#3 Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

Score: 34.7

Matched keywords: agent, ai, ai agent, llm, prompt, reasoning, token

Categories: cs.SE, cs.AI

Compressed abstract: When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning.
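Recovery sketch: the retry loop the abstract describes might look roughly like this. Only the `recovery_feedback.suggestions[]` field name comes from the abstract; the endpoint, the validation rule, and the shape of each suggestion (`field`, `action`, `value`) are hypothetical.

```python
def validate_order(request):
    """Hypothetical endpoint: `quantity` must be an int between 1 and 100."""
    qty = request.get("quantity")
    if isinstance(qty, int) and 1 <= qty <= 100:
        return {"ok": True}
    # Suggest the nearest valid value (assumed suggestion shape).
    fixed = 1 if not isinstance(qty, int) or qty < 1 else 100
    return {
        "ok": False,
        "error": "quantity must be an integer between 1 and 100",
        "recovery_feedback": {
            "suggestions": [
                {"field": "quantity", "action": "set", "value": fixed}
            ]
        },
    }


def call_with_recovery(request, max_retries=2):
    """Call the endpoint, applying machine-readable suggestions between tries."""
    response = validate_order(request)
    for _ in range(max_retries):
        if response["ok"]:
            break
        for suggestion in response["recovery_feedback"]["suggestions"]:
            if suggestion["action"] == "set":
                request = {**request, suggestion["field"]: suggestion["value"]}
        response = validate_order(request)
    return request, response


if __name__ == "__main__":
    print(call_with_recovery({"quantity": 500}))
```

The point of the sketch is that the agent repairs the request by applying structured suggestions mechanically, with no extra model call between the failure and the retry.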


#4 SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

Score: 41.2

Matched keywords: agent, ai, ai agents, benchmark, large language models, llm, multi-agent, reasoning

Categories: cs.AI

Compressed abstract: As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty.


#5 Notarized Agents: Receiver-Attested Confidential Receipts for AI Agent Actions

Score: 25.6

Matched keywords: agent, ai, ai agent, token

Categories: cs.CR, cs.AI, cs.DC

Compressed abstract: Current AI agent observability is structurally compromised: the entity producing the activity log is the same entity whose activity is being logged. A compromised or buggy agent can omit, alter, or fabricate its own traces, and the operator running the agent has no independent way to detect tampering.


#6 StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

Score: 39.2

Matched keywords: benchmark, code generation, fine-tuning, llm, reasoning, retrieval-augmented

Categories: cs.AI, cs.AR, cs.CL

Compressed abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT) to enhance both the functional correctness and reasoning fidelity…


#7 Agent Planning Benchmark: A Diagnostic Framework for Planning Capabilities in LLM Agents

Score: 24.4

Matched keywords: agent, benchmark, llm, multimodal

Categories: cs.CL

Compressed abstract: Planning is central to LLM agents: before acting, an agent must decompose goals, select tools, reason over constraints, and decide when a task is infeasible. Yet existing agent evaluations often report only end-to-end success, making it difficult to determine whether failures stem from planning or execution.


#8 The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation

Score: 19.3

Matched keywords: code generation, large language models, llm, prompt

Categories: cs.SE, cs.AI, cs.LG

Compressed abstract: Large language models (LLMs) now generate substantial production code, often for tasks with multiple valid algorithmic solutions. Incidental prompt cues, meaning contextual words or metadata outside the task specification, can steer which algorithm the model selects, even when all outputs pass the same tests.


#9 Channel Fracture: Architectural Blind Spots in Scheduled Cross-Agent Memory Injection for Multi-Agent Orchestration Systems

Score: 16.5

Matched keywords: agent, ai, multi-agent

Categories: cs.MA

Compressed abstract: Multi-agent AI orchestration systems increasingly rely on persistent memory to maintain context across sessions, agents, and tasks. When one agent must inject knowledge into another agent's memory -- a common requirement in hierarchical team architectures -- the delivery mechanism must be architecturally sound.


#10 From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Score: 21.2

Matched keywords: agent, large language model, llm

Categories: cs.CR, cs.AI

Compressed abstract: Large language model (LLM)-based agents increasingly solve complex tasks by interacting with external tools, retrieval systems, memory modules, environments, and other agents. These capabilities expand agent autonomy, but also make agent behavior harder to verify, debug, and audit.
