arXiv daily keyword digest · 2026-04-28

#1 Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows

Score: 36.9

Matched keywords: agent, agent workflow, benchmark, llm, multi-agent, token, tool-using

Categories: cs.MA, cs.AI

Compressed abstract: Long-horizon tool-using tasks sometimes benefit from revisiting earlier subtasks for recovery and exploration, but added multi-agent workflow flexibility can also introduce coordination overhead and substantial inference cost. We study complete cyclic subtask graphs, a deliberately maximally flexible multi-agent architecture in which executable subtask nodes are fully connected and a unified state-analysis-and-routi…


#2 GAMED.AI: A Hierarchical Multi-Agent Framework for Automated Educational Game Generation

Score: 34.4

Matched keywords: agent, agent framework, ai, alignment, multi-agent, reasoning, token

Categories: cs.AI

Compressed abstract: We introduce GameDAI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic contracts. Built on phase-based LangGraph sub-graphs, deterministic Quality Gates, and structured Pydantic schemas, GameDAI supports two template families encompassing 15 interaction mechanics across spatial reasoning,…


#3 Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems

Score: 28.0

Matched keywords: agent, alignment, benchmark, multi-agent, prompt, token

Categories: cs.MA, cs.AI, cs.CR, cs.LG

Compressed abstract: We identify and formalize a novel security risk: Context-Fragmented Violations (CFVs) - a class of policy breaches where individual agent actions appear locally safe and reasonable, yet collectively violate organizational policies because critical policy facts are siloed in different departments' private contexts. Existing prompt-based alignment mechanisms and monolithic interceptors are poorly matched to violations…


#4 GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems

Score: 35.2

Matched keywords: agent, benchmark, large language models, llm, multi-agent, prompt, token

Categories: cs.CR, cs.AI

Compressed abstract: The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving capabilities, but it has also expanded their attack surfaces, exposing them to vulnerabilities such as prompt infection and compromised inter-agent communication. While emerging graph-based anomaly detection methods show promise in protecting these networks, the field cu…


#5 Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach

Score: 31.2

Matched keywords: agent, large language models, llm, multi-agent

Categories: cs.AI

Compressed abstract: Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear which architectural design choices drive generation quality and why current approaches fail.


#6 Peer Identity Bias in Multi-Agent LLM Evaluation: An Empirical Study Using the TRUST Democratic Discourse Analysis Pipeline

Score: 31.2

Matched keywords: agent, large language model, llm, multi-agent

Categories: cs.CY, cs.AI, cs.MA

Compressed abstract: The TRUST democratic discourse analysis pipeline exposes its large language model (LLM) components to peer model identity through multiple structural channels -- a design feature whose bias implications have not previously been empirically tested. We provide the first systematic measurement of identity-dependent scoring bias across all active identity exposure channels in TRUST, crossing four model families with two…


#7 How Personal Characteristics Shape User Exploration of Diverse Movie Recommendations with a LLM-Based Multi-Agent System

Score: 30.0

Matched keywords: agent, large language model, llm, multi-agent

Categories: cs.HC

Compressed abstract: Diversity is an important evaluation criterion for recommender systems beyond accuracy, yet users differ in their willingness to engage with novel and diverse content. In this work, we investigate how a Large Language Model (LLM)-based multi-agent system supports users' exploration of diverse recommendations, and how individual characteristics shape user experiences.


#8 When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

Score: 25.6

Matched keywords: code generation, large language models, llm, prompt

Categories: cs.SE

Compressed abstract: Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correctness; however, these findings are largely based on minimal-specification benchmarks such as HumanEv…


#9 Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

Score: 30.0

Matched keywords: fine-tuning, in-context learning, large language models, llm, token

Categories: cs.CL, cs.LG

Compressed abstract: Large language models (LLMs) operate in two fundamental learning modes - fine-tuning (FT) and in-context learning (ICL) - raising key questions about which mode yields greater language proficiency and whether they differ in their inductive biases. Prior studies comparing FT and ICL have yielded mixed and inconclusive results due to inconsistent experimental setups.


#10 SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning

Score: 21.4

Matched keywords: llm, reasoning

Categories: cs.LG, cs.AI, cs.CL

Compressed abstract: Recent mixed-policy optimization methods for LLM reasoning that interleave or blend supervised and reinforcement learning signals report improvements over the standard SFT-then-RL pipeline. We show that numerous recently published research papers rely on a faulty baseline caused by two distinct bugs: a CPU-offloaded optimizer bug in DeepSpeed that silently drops intermediate micro-batches during gradient accumulatio…


2026-04-28 · arXiv Daily Keyword Digest (Top 10 of 1000)
