2026-05-28 · arXiv Daily Keyword Digest (Top 10 of 897)

Generated: 2026-05-29T08:02:23.522615+09:00

Target date (KST): 2026-05-28

Selection: picked 10 from 897 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
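
A minimal sketch of how the request above could be assembled, assuming the digest uses the standard arXiv API query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`). The function name `build_query_url` and the page size are illustrative assumptions, not details taken from the digest's own code.

```python
from urllib.parse import urlencode

# Endpoint taken from the "Source" line of the digest.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build one page of a cs.* listing, newest submissions first.

    The page size is a hypothetical default; the digest does not state
    how many results it requests per call.
    """
    params = {
        "search_query": "cat:cs.*",    # category filter from the digest
        "sortBy": "submittedDate",     # sort key named in the digest
        "sortOrder": "descending",     # newest first
        "start": start,                # offset of the first result
        "max_results": max_results,    # page size (assumed)
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(start=0, max_results=50))
```

Paging with `start` lets a caller walk back through the listing until entries fall outside the target date.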

Selection logic: keyword-weight score + subject boost
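
The selection logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the digest's actual implementation: the weights in `KEYWORD_WEIGHTS` and `SUBJECT_BOOST` are hypothetical, keywords are matched as case-insensitive substrings, and the boost is assumed to be additive across categories. The digest publishes none of these details.

```python
# Hypothetical weights; the digest does not publish its real values.
KEYWORD_WEIGHTS = {
    "multi-agent": 8.0,
    "rag": 6.0,
    "agent": 5.0,
    "llm": 4.0,
}

# Hypothetical per-category boost, assumed to add up across categories.
SUBJECT_BOOST = {
    "cs.AI": 2.0,
    "cs.CL": 1.5,
    "cs.IR": 1.0,
}


def score_paper(text: str, categories: list[str]) -> tuple[float, list[str]]:
    """Return (score, matched keywords) for one paper.

    Matching is a plain substring test, so "multi-agent" also counts
    as a match for "agent", which mirrors the overlapping keywords
    listed in the digest entries.
    """
    lowered = text.lower()
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in lowered)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return keyword_score + boost, matched


if __name__ == "__main__":
    score, matched = score_paper("Multi-Agent LLM systems", ["cs.AI", "cs.MA"])
    print(score, matched)
```

Because the weights here are invented, the sketch will not reproduce the scores listed in the entries; it only shows the shape of a keyword-weight sum plus a subject boost.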

#1 Mixture-of-Experts Knowledge Graph Retrieval-Augmented Generation for Multi-Agent LLM-based Recommendation

Score: 49.7

Matched keywords: agent, agent framework, alignment, large language models, llm, multi-agent, rag, retrieval-augmented

Categories: cs.IR

Compressed abstract: Large language models (LLMs) have recently been adopted for recommendations due to their ability to understand user intent and item semantics. However, LLM-based recommender systems often rely on parametric knowledge and suffer from outdated knowledge, motivating knowledge graph retrieval-augmented generation (KG-RAG) to ground recommendations on structured, up-to-date KGs.

#2 LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

Score: 37.8

Matched keywords: agent, llm, multi-agent, rag, reasoning, retrieval-augmented

Categories: cs.CL, cs.AI, cs.MA

Compressed abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) advances flat document retrieval by structuring knowledge as relational graphs, enabling more coherent and effective reasoning. However, applying it to specific domains like legal reasoning faces critical challenges.

#3 HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Score: 35.2

Matched keywords: agent, harness, llm, multi-agent, prompt, token

Categories: cs.CR, cs.AI, cs.LG

Compressed abstract: Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves interpretability, but creates a propagation risk: a bounded perturbation to one component can be reused by other agents and amplified into system-level harm.

#4 Multi-Agent LLM-based Metamorphic Testing for REST APIs

Score: 31.7

Matched keywords: agent, agent workflow, llm, multi-agent

Categories: cs.SE, cs.AI

Compressed abstract: As REST APIs become an increasingly significant part of software systems, their validation is becoming more critical. Hence, testing and uncovering underlying issues are of utmost importance for improving software quality.

#5 Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

Score: 26.2

Matched keywords: agent, large language model, llm, multi-agent

Categories: cs.AI

Compressed abstract: Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. However, malicious agents in MAS may inject misinformation to mislead other agents and disrupt system performance, giving rise to a new research direction that focuses on attack mechanisms and defense strategies in MAS.

#6 Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Score: 26.2

Matched keywords: agent, ai, ai agents, llm, multi-agent

Categories: cs.AI

Compressed abstract: LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environments alongside other agents. We introduce a Moltbook-style simulation platform where thousands of LLM agents interact across communities over a simulated month, and use it to evaluate privacy as a downstream safety concern under varying degrees of social pressure.

#7 Decoupled Intelligence: A Multi-Agent LLM Framework for Controllable Traffic Scenario Generation in SUMO

Score: 36.5

Matched keywords: agent, agent framework, large language models, llm, multi-agent, reasoning

Categories: cs.MA, cs.HC

Compressed abstract: The integration of Large Language Models (LLMs) with microscopic traffic simulation offers a promising path toward autonomous urban planning and intelligent transportation analysis. However, existing monolithic agent architectures often struggle with the complexity of end-to-end simulation workflows, leading to reasoning failures, parameter inconsistency, and a lack of systematic state management.

#8 Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

Score: 29.7

Matched keywords: agent, alignment, benchmark, harness, llm, reasoning

Categories: cs.AI

Compressed abstract: LLM agents are increasingly deployed as executable systems that use tools, modify workspaces, and produce concrete artifacts. In such workflows, performance depends not only on the base model, but also on the harness: the system layer that manages context, tools, state, constraints, permissions, tracing, and recovery.

#9 Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Score: 22.4

Matched keywords: ai, alignment, large language models, llm

Categories: cs.CL, cs.AI, cs.SI

Compressed abstract: Large language models (LLMs) are increasingly utilized as proxies for computational social analysis; yet, their ability to faithfully represent the "thick descriptions" (Geertz, 1973) of human communities remains a critical challenge. Current evaluations often reduce social identity to static labels, sidelining how real-world groups navigate social shifts.

#10 Roles with Rails: Contract-Preserving Role Evolution in Multi-Agent Structured Reasoning

Score: 35.0

Matched keywords: agent, llm, multi-agent, prompt, reasoning

Categories: cs.CL

Compressed abstract: Role-based LLM multi-agent systems need adaptive role pools, yet adapting such systems is not merely a matter of prompt optimization: roles often carry structural obligations, including capability coverage, message compatibility, validation, final-answer aggregation, and parser-compatible output protocols. Existing systems either fix the role inventory and lose adaptivity, or allow unconstrained generation to induce…
