2026-05-14 · arXiv Daily Keyword Digest (Top 10 of 819)

Generated: 2026-05-15T08:02:20.156094+09:00

Target date (KST): 2026-05-14

Selection: picked 10 from 819 papers published on the target date
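
A minimal sketch of how a KST target-date filter could work, assuming each paper carries a UTC `published` timestamp in ISO 8601 form. The helper names `kst_date` and `on_target_date` and the dictionary layout are hypothetical and are not taken from the digest's actual pipeline.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+09:00 offset, matching the offset shown in the "Generated" line.
KST = timezone(timedelta(hours=9))


def kst_date(published_utc: str) -> str:
    """Return the KST calendar date (YYYY-MM-DD) for an ISO 8601 UTC timestamp."""
    # Swap a trailing "Z" for an explicit offset so older Pythons can parse it.
    dt = datetime.fromisoformat(published_utc.replace("Z", "+00:00"))
    return dt.astimezone(KST).date().isoformat()


def on_target_date(papers, target):
    """Keep only papers whose KST publication date equals the target date."""
    return [p for p in papers if kst_date(p["published"]) == target]


# Hypothetical records for illustration only.
sample = [
    {"id": "example-a", "published": "2026-05-13T15:00:00Z"},
    {"id": "example-b", "published": "2026-05-14T15:00:00Z"},
]
kept = on_target_date(sample, "2026-05-14")
```

Under this assumption, a paper submitted at 15:00 UTC on the previous day already counts toward the KST target date, which is why the conversion happens before the date comparison.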

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
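
As a rough illustration, the request described above could be built as follows. The `search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` parameters belong to the public arXiv API; the function name `build_query_url` and the page size are assumptions, since the digest does not state how it paginates.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build an arXiv API URL for cs.* papers, newest submissions first."""
    params = {
        "search_query": "cat:cs.*",   # category filter named in the Source line
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": start,               # offset of the first result (assumed paging)
        "max_results": max_results,   # page size (assumed value)
    }
    return f"{BASE_URL}?{urlencode(params)}"


first_page = build_query_url(start=0, max_results=50)
```

The sketch only constructs the URL. Fetching and parsing the Atom response are left out.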

Selection logic: keyword-weight score + subject boost
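
One way to read "keyword-weight score + subject boost" is sketched below. The weights, boost values, and function names are hypothetical placeholders, not the digest's real table. Keywords are matched by substring, so a title containing "multi-agent" also matches "agent", which is consistent with the overlapping keyword lists in the entries that follow.

```python
import heapq

# Hypothetical weights and boosts, for illustration only.
KEYWORD_WEIGHTS = {"agent": 5.0, "multi-agent": 6.0, "llm": 5.0, "benchmark": 3.0}
SUBJECT_BOOST = {"cs.AI": 2.0, "cs.MA": 1.5}


def score_paper(title, abstract, categories):
    """Return (score, matched_keywords) for one paper."""
    text = f"{title} {abstract}".lower()
    # Substring match: overlapping keywords each contribute their own weight.
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in text)
    score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    # Add a fixed boost for every listed category that has one.
    score += sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return score, matched


def select_top(papers, k=10):
    """Pick the k highest-scoring papers from a list of dicts."""
    return heapq.nlargest(
        k,
        papers,
        key=lambda p: score_paper(p["title"], p["abstract"], p["categories"])[0],
    )


score, matched = score_paper("Multi-Agent LLM planning", "", ["cs.AI"])
```

Whether the actual digest adds or multiplies the boost, or caps repeated matches, is not stated in the header, so the additive form here is an assumption.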

#1 Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems

Score: 15.0

Matched keywords: agent, multi-agent

Categories: eess.SY

Compressed abstract: This paper studies multi-agent reinforcement learning with submodular team utilities for online distributed task allocation. In this setting, each agent selects one action from a local categorical policy, so feasible joint actions form a partition matroid over agent-action pairs.


#2 AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Score: 43.1

Matched keywords: agent, ai, code generation, foundation model, foundation models, harness, harness engineering

Categories: cs.SE, cs.AI

Compressed abstract: Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability.


#3 IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation

Score: 33.5

Matched keywords: agent, agent framework, ai, multi-agent, prompt, reasoning

Categories: cs.AI, cs.IR, cs.MA

Compressed abstract: Current AI-assisted innovation systems typically apply a single ideation methodology (such as TRIZ or Design Thinking) using sequential prompt-based workflows that do not preserve intermediate reasoning structure. As a result, insights generated across methodologies remain fragmented, limiting traceability, synthesis, and systematic evaluation of novelty.


#4 Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Score: 20.2

Matched keywords: agent, ai, ai agent, benchmark

Categories: cs.AI, cs.CR

Compressed abstract: Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting.


#5 RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

Score: 27.7

Matched keywords: benchmark, fine-tuning, llm, prompt, reasoning

Categories: cs.CL, cs.AI

Compressed abstract: LLM-as-a-judge is now the default measurement instrument for open-ended generation, but on the public JudgeBench benchmark even strong instruction-tuned judges barely scrape past random on objective-correctness pairwise items. We introduce RTLC, a three-stage prompting recipe -- Research, Teach-to-Learn, Critique -- that promotes a single black-box LLM into an ensemble-of-thought judge with no fine-tuning, retrieval…


#6 Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

Score: 24.2

Matched keywords: agent, benchmark, multi-agent, reasoning

Categories: cs.AI

Compressed abstract: Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important.


#7 Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning

Score: 20.2

Matched keywords: fine-tuning, large language models, llm

Categories: cs.LG, cs.AI

Compressed abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent.


#8 AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems

Score: 33.7

Matched keywords: agent, ai, large language model, llm, multi-agent, reasoning

Categories: q-fin.TR, cs.AI, stat.ME

Compressed abstract: Conventional algorithmic trading systems are grounded in deterministic heuristics or offline-trained statistical models that cannot adapt to the semantic complexity of rapidly shifting market regimes. This paper introduces AGENTICAITA, an agentic AI framework that replaces the traditional signal-then-execute paradigm with a fully autonomous deliberative loop in which multiple specialized Large Language Model agents…


#9 Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

Score: 23.4

Matched keywords: agent, diffusion, multi-agent, multimodal

Categories: cs.AI, cs.LG, cs.MA

Compressed abstract: Multi-Agent Path Finding (MAPF) is a coordination problem that requires computing globally consistent, collision-free trajectories from individual start positions to assigned goal positions under combinatorial planning complexity. In dense environments, suboptimal initial plans induce compound conflicts that hinder feasible repair.


#10 Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation

Score: 25.6

Matched keywords: agent, benchmark, large language models, multimodal, reasoning

Categories: cs.CV, cs.AI

Compressed abstract: Language-guided segmentation transcends the scope limitations of traditional semantic segmentation, enabling models to segment arbitrary target regions based on natural language instructions. Existing approaches typically adopt a two-stage framework: employing Multimodal Large Language Models (MLLMs) to interpret instructions and generate visual prompts, followed by foundational segmentation models (e.g., SAM) to pr…
