arXiv daily keyword digest · 2026-05-20

#1 EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

Score: 53.2

Matched keywords: agent, agent framework, benchmark, large language model, llm, multi-agent, prompt, rag, retrieval-augmented, tool use

Categories: cs.AI, cs.LG, cs.MA

Compressed abstract: Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combine simulation, retrieval, and manufacturing preparation. We introduce a benchmark suite with three evaluation dimensions: (1) a workflow benchmark with seven prompt styles targeting distinct cognitive demands, including direct tool use, sema…

#2 MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

Score: 33.7

Matched keywords: agent, agent framework, ai, large language model, llm, multi-agent

Categories: cs.CL

Compressed abstract: The Mixture-of-Agents (MoA) framework has shown promise in improving large language model (LLM) performance by aggregating outputs from multiple agents. However, existing MoA systems often rely on static routers that do not fully capture temporal and contextual dependencies across aggregation layers.

#3 Sequential Consensus for Multi-Agent LLM Debates: A Wald-SPRT compute governor with calibration-based failure detection

Score: 30.5

Matched keywords: agent, llm, multi-agent, reasoning

Categories: cs.LG

Compressed abstract: Multi-agent LLM debate improves factuality and reasoning, but most recipes pick a fixed round count, over-spending on easy items and under-spending on hard ones. We adapt Wald's Sequential Probability Ratio Test (SPRT) as a plug-in compute governor for LLM debates.
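For readers unfamiliar with the test named in this abstract, the following is a minimal Python sketch of Wald's SPRT on binary observations. It is not the paper's method: treating each debate round as a 1/0 agreement signal, the function name `sprt`, and the parameters `p0`, `p1`, `alpha`, and `beta` are all illustrative assumptions.

```python
import math


def sprt(observations, p0=0.5, p1=0.8, alpha=0.05, beta=0.05):
    """Wald's SPRT for Bernoulli observations (illustrative sketch).

    H0: success probability is p0; H1: success probability is p1.
    alpha and beta are the target type I and type II error rates.
    Returns (decision, n_used), where decision is "accept_h1",
    "accept_h0", or "continue" if the data ran out before a decision.
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", n


# Hypothetical usage: 1 = agents agreed in that round, 0 = they did not.
print(sprt([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]))
print(sprt([0, 0, 0, 0, 0, 0]))
```

With these assumed defaults, a run of agreements stops after seven rounds and a run of disagreements after four, which is the sense in which a sequential test spends fewer rounds on clear-cut items than a fixed round count would.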

#4 Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation

Score: 27.7

Matched keywords: agent, ai, ai agent, multi-agent

Categories: cs.AI, cs.LG

Compressed abstract: We adapt split conformal prediction and adaptive conformal inference (ACI) to continuous AI agent evaluation, providing distribution-free coverage guarantees for forecasted quality scores. Conformal intervals achieve calibration error below 0.02 across all nominal levels at the 24 h horizon, while ACI correctly widens intervals by 35% following agent releases then reconverges.
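As background on the first technique named in this abstract, the following is a minimal Python sketch of split conformal prediction with absolute-residual scores. It is a generic textbook form, not the paper's implementation: the function name `split_conformal_interval`, its arguments, and the example numbers are assumptions made for illustration.

```python
import math


def split_conformal_interval(cal_preds, cal_truths, new_pred, alpha=0.1):
    """Split conformal interval around new_pred (illustrative sketch).

    cal_preds / cal_truths: predictions and observed values on a held-out
    calibration set. alpha: target miscoverage, so the interval aims for
    at least 1 - alpha marginal coverage under exchangeability.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_truths))
    n = len(scores)
    # Rank of the finite-sample-corrected quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite interval.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# Hypothetical calibration data: forecasts of 0.0 against scores 1.0..9.0.
preds = [0.0] * 9
truths = [float(i) for i in range(1, 10)]
print(split_conformal_interval(preds, truths, new_pred=10.0, alpha=0.25))
```

The interval width depends only on the calibration residuals, not on any distributional assumption about the scores, which is what "distribution-free" refers to here.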

#5 RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

Score: 27.2

Matched keywords: agent, alignment, fine-tuning, multi-agent

Categories: cs.RO, cs.AI, cs.CV, cs.LG, cs.MA

Compressed abstract: Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling g…

#6 CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring

Score: 26.0

Matched keywords: agent, llm, multi-agent

Categories: cs.MA

Compressed abstract: Cascade attacks in LLM multi-agent systems (MAS) arise when adversarial influence propagates across agents and leads to escalated system-level failures through complex agent interactions. Detecting such cascades is challenging, as their signals are distributed, tightly coupled across interaction channels, and often appear plausibly benign locally but may unfold quickly either within a single turn or gradually across…

#7 Supporting System Testing with a Multi-Agent LLM-based Framework for Knowledge Graph Extraction: A Case Study with Ethernet Switch Systems

Score: 25.3

Matched keywords: agent, llm, multi-agent, prompt

Categories: cs.SE

Compressed abstract: Technical documents contain rich domain knowledge for automating downstream tasks such as system testing. While this paper focuses on Ethernet switch configuration manuals (ESCMs), we propose a general framework that can be adapted to different industrial contexts.

#8 STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

Score: 30.4

Matched keywords: agent, agent framework, ai, multi-agent, reasoning

Categories: cs.MA, cs.AI, cs.CL

Compressed abstract: Frontier AI models and multi-agent systems have led to significant improvements in mathematical reasoning. However, for problems requiring extended, long-horizon reasoning, existing systems continue to suffer from fundamental reliability issues: hallucination accumulation, memory fragmentation, and imbalanced reasoning-tool trade-offs.

#9 A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

Score: 34.2

Matched keywords: agent, agent framework, large language models, llm, multi-agent

Categories: cs.CL

Compressed abstract: Recent studies in difficulty-controlled reading comprehension item generation have leveraged large language models (LLMs) to produce items by adjusting difficulty-related features. However, existing methods typically rely on a single-agent prompting approach, which often fails to consistently satisfy specified feature constraints, resulting in items that deviate from the target difficulty level.

#10 Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Score: 24.6

Matched keywords: benchmark, large language models, llm, multimodal, reasoning

Categories: cs.CL

Compressed abstract: Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking tools.

2026-05-20 · arXiv Daily Keyword Digest (Top 10 of 838)
