#1 Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems
Score: 23.2
Matched keywords: agent, ai, ai agent, ai agents
Categories: cs.AI
Compressed abstract: As AI agents transition from research prototypes to enterprise production systems, the tool interfaces they consume remain rooted in human-oriented CRUD paradigms. This paper identifies five fundamental architectural mismatches between conventional APIs and autonomous agent requirements: exact-identifier dependence, rendering-oriented responses, single-shot interaction assumptions, user-equivalent authorization, and…
#2 Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Score: 29.8
Matched keywords: agent, ai, ai agent, ai agents, prompt, reasoning
Categories: cs.CR, cs.AI
Compressed abstract: We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions.
#3 SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
Score: 31.2
Matched keywords: agent, coding agent, llm
Categories: cs.AI
Compressed abstract: LLM/VLM-based digital agents have advanced rapidly thanks to scalable sandboxes for coding, web navigation, and computer use, which provide rich interactive training grounds. In contrast, embodied agents still lack abundant, diverse, and automatically generated 3D environments for interactive learning.
#4 PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines
Score: 32.2
Matched keywords: agent, benchmark, llm, multi-agent, prompt, token
Categories: cs.AI
Compressed abstract: Multi-agent LLM systems introduce a security risk in which sensitive information accessed by one agent can propagate through shared context and reappear in downstream outputs, even without explicit adversarial intent. We formalise this phenomenon as propagation amplification, where leakage risk increases across agent boundaries as sensitive content is repeatedly exposed to downstream generators.
#5 TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
Score: 27.2
Matched keywords: agent, llm, multi-agent
Categories: cs.CL
Compressed abstract: Multi-agent systems (MAS) have emerged as a promising paradigm for solving complex tasks. Recent work has explored self-evolving MAS that automatically optimize agent capabilities or communication topologies.
#6 RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design
Score: 40.1
Matched keywords: agent, agent framework, large language models, llm, multi-agent, rag, retrieval-augmented
Categories: cs.AR
Compressed abstract: Automating radio frequency (RF) amplifier design remains challenging because existing methods suffer from the curse of dimensionality, weak use of domain knowledge, and poor transferability, leading to low data efficiency. Meanwhile, although large language models (LLMs) have shown promise in many scientific domains, applying them directly to RF sizing is nontrivial due to the numerical nature of circuit optimizatio…
#7 Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
Score: 25.2
Matched keywords: large language models, llm, reasoning
Categories: cs.AI, cs.CL, stat.ML
Compressed abstract: Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negat…
#8 Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability
Score: 23.2
Matched keywords: agent, ai, ai agent
Categories: cs.AI
Compressed abstract: This paper establishes a rigorous measurement science for AI agent reliability, providing a foundational framework for quantifying consistency under semantically preserving perturbations. By leveraging U-statistics for output-level reliability and kernel-based metrics for trajectory-level stability, we offer a principled approach to evaluating agents across diverse operating conditions.
#9 ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
Score: 24.2
Matched keywords: agent, large language models, reasoning
Categories: cs.CV, cs.AI
Compressed abstract: Recent advances in Multi-modal Large Language Models (MLLMs) target 3D spatial intelligence, yet the progress has been largely driven by post-training on curated benchmarks, leaving the inference-time approach relatively underexplored. In this paper, we take a training-free perspective and introduce ViSRA, a human-aligned Video-based Spatial Reasoning Agent, as a framework to probe the spatial reasoning mechanism o…
#10 TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
Score: 22.9
Matched keywords: agent, ai, ai agent, foundation models, reasoning
Categories: cs.AI
Compressed abstract: Time series analysis underpins forecasting, monitoring, and decision making in domains such as finance and weather, where solving a task often requires both numerical accuracy and contextual reasoning. Recent progress has moved from specialized neural predictors to approaches built on LLMs and foundation models that can reason over time series inputs and use external tools.