#1 RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography
Score: 28.7
Matched keywords: agent, ai, ai agent, reasoning, tool-using
Categories: cs.AI
Compressed abstract: Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine.
#2 StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation
Score: 24.4
Matched keywords: alignment, code generation, llm, reasoning
Categories: cs.CL, cs.AI
Compressed abstract: Effective code generation requires both model capability and a problem representation that carefully structures how models reason and plan. Existing approaches augment reasoning steps or inject specific structure into how models think, but leave scattered problem conditions unchanged.
#3 VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs
Score: 38.7
Matched keywords: agent, agent framework, benchmark, code generation, large language models, llm, multi-agent, reasoning
Categories: cs.AR, cs.AI, cs.LG, cs.MA, cs.PL
Compressed abstract: Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module wiring, a…
#4 AIBuildAI: An AI Agent for Automatically Building AI Models
Score: 40.7
Matched keywords: agent, ai, ai agent, benchmark, large language model, llm, reasoning, tool use
Categories: cs.AI
Compressed abstract: AI models underpin modern intelligent systems, driving advances across science, medicine, finance, and technology. Yet developing high-performing AI models remains a labor-intensive process that requires expert practitioners to iteratively design architectures, engineer representations, implement training pipelines and refine approaches through empirical evaluation.
#5 MARS^2: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation
Score: 27.9
Matched keywords: agent, code generation, multi-agent, reasoning
Categories: cs.AI, cs.CL
Compressed abstract: Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling.
#6 CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations
Score: 21.4
Matched keywords: agent, llm
Categories: cs.AI, cs.CL, cs.CY
Compressed abstract: LLM-empowered agent simulations are increasingly used to study social emergence, yet the micro-to-macro causal mechanisms behind macro outcomes often remain unclear. This is challenging because emergence arises from intertwined agent interactions, meso-level feedback, and nonlinearity, making generative mechanisms hard to disentangle.
#7 MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
Score: 21.8
Matched keywords: benchmark, llm, multimodal, reasoning
Categories: cs.LG, cs.AI, cs.CL
Compressed abstract: Domain reweighting can improve sample efficiency and downstream generalization, but data-mixture optimization for multimodal midtraining remains largely unexplored. Current multimodal training recipes tune mixtures along a single dimension, typically data format or task type.
#8 Where are the Humans? A Scoping Review of Fairness in Multi-agent AI Systems
Score: 20.2
Matched keywords: agent, ai, multi-agent
Categories: cs.AI
Compressed abstract: Rapid advances in Generative AI are giving rise to increasingly sophisticated Multi-Agent AI (MAAI) systems. While AI fairness has been extensively studied in traditional predictive scenarios, its examination in MAAI remains nascent and fragmented.
#9 Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees
Score: 31.2
Matched keywords: agent, large language model, llm, multi-agent
Categories: cs.GT, cs.AI
Compressed abstract: Large Language Model (LLM) agents are increasingly deployed in multi-agent systems requiring strategic coordination. While recent work has analyzed LLM behavior in two-player games, coalition formation, where n agents dynamically form cooperative groups, remains theoretically uncharacterized.
#10 Dissecting Failure Dynamics in Large Language Model Reasoning
Score: 19.6
Matched keywords: large language model, large language models, reasoning, token
Categories: cs.AI, cs.CL
Compressed abstract: Large Language Models (LLMs) achieve strong performance through extended inference-time deliberation, yet how their reasoning failures arise remains poorly understood. By analyzing model-generated reasoning trajectories, we find that errors are not uniformly distributed but often originate from a small number of early transition points, after which reasoning remains locally coherent but globally incorrect.