#1 Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems
Score: 15.0
Matched keywords: agent, multi-agent
Categories: eess.SY
Compressed abstract: This paper studies multi-agent reinforcement learning with submodular team utilities for online distributed task allocation. In this setting, each agent selects one action from a local categorical policy, so feasible joint actions form a partition matroid over agent-action pairs.
#2 AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
Score: 43.1
Matched keywords: agent, ai, code generation, foundation model, foundation models, harness, harness engineering
Categories: cs.SE, cs.AI
Compressed abstract: Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability.
#3 IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation
Score: 33.5
Matched keywords: agent, agent framework, ai, multi-agent, prompt, reasoning
Categories: cs.AI, cs.IR, cs.MA
Compressed abstract: Current AI-assisted innovation systems typically apply a single ideation methodology (such as TRIZ or Design Thinking) using sequential prompt-based workflows that do not preserve intermediate reasoning structure. As a result, insights generated across methodologies remain fragmented, limiting traceability, synthesis, and systematic evaluation of novelty.
#4 Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
Score: 20.2
Matched keywords: agent, ai, ai agent, benchmark
Categories: cs.AI, cs.CR
Compressed abstract: Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting.
#5 RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
Score: 27.7
Matched keywords: benchmark, fine-tuning, llm, prompt, reasoning
Categories: cs.CL, cs.AI
Compressed abstract: LLM-as-a-judge is now the default measurement instrument for open-ended generation, but on the public JudgeBench benchmark even strong instruction-tuned judges barely scrape past random on objective-correctness pairwise items. We introduce RTLC, a three-stage prompting recipe -- Research, Teach-to-Learn, Critique -- that promotes a single black-box LLM into an ensemble-of-thought judge with no fine-tuning, retrieval…
#6 Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
Score: 24.2
Matched keywords: agent, benchmark, multi-agent, reasoning
Categories: cs.AI
Compressed abstract: Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important.
#7 Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning
Score: 20.2
Matched keywords: fine-tuning, large language models, llm
Categories: cs.LG, cs.AI
Compressed abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent.
#8 AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems
Score: 33.7
Matched keywords: agent, ai, large language model, llm, multi-agent, reasoning
Categories: q-fin.TR, cs.AI, stat.ME
Compressed abstract: Conventional algorithmic trading systems are grounded in deterministic heuristics or offline-trained statistical models that cannot adapt to the semantic complexity of rapidly shifting market regimes. This paper introduces AGENTICAITA, an agentic AI framework that replaces the traditional signal-then-execute paradigm with a fully autonomous deliberative loop in which multiple specialized Large Language Model agents…
#9 Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
Score: 23.4
Matched keywords: agent, diffusion, multi-agent, multimodal
Categories: cs.AI, cs.LG, cs.MA
Compressed abstract: Multi-Agent Path Finding (MAPF) is a coordination problem that requires computing globally consistent, collision-free trajectories from individual start positions to assigned goal positions under combinatorial planning complexity. In dense environments, suboptimal initial plans induce compound conflicts that hinder feasible repair.
#10 Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation
Score: 25.6
Matched keywords: agent, benchmark, large language models, multimodal, reasoning
Categories: cs.CV, cs.AI
Compressed abstract: Language-guided segmentation transcends the scope limitations of traditional semantic segmentation, enabling models to segment arbitrary target regions based on natural language instructions. Existing approaches typically adopt a two-stage framework: employing Multimodal Large Language Models (MLLMs) to interpret instructions and generate visual prompts, followed by foundational segmentation models (e.g., SAM) to pr…