2026-04-17 · arXiv Daily Keyword Digest (Top 10 of 624)

Generated: 2026-04-18T08:02:22.433302+09:00

Target date (KST): 2026-04-17

Selection: picked 10 from 624 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
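A minimal sketch of how a request against this source could be assembled, assuming the standard arXiv API query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`). The function name `build_query_url` and the page size are illustrative assumptions, not taken from the digest's generator.

```python
from urllib.parse import urlencode

# Endpoint as stated in the digest's "Source" line.
BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build a paged query URL for cs.* papers, newest submissions first.

    The name and default page size are hypothetical; only the endpoint,
    the `cat:cs.*` filter, and the sort order come from the digest.
    """
    params = {
        "search_query": "cat:cs.*",   # category filter from the digest
        "sortBy": "submittedDate",    # sort key named in the digest
        "sortOrder": "descending",    # "desc" in the digest
        "start": start,               # paging offset (assumed)
        "max_results": max_results,   # page size (assumed)
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url())
```

Paging with `start` would let a generator walk back through results until it passes the target date; how the actual digest pages or filters by date is not stated.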

Selection logic: keyword-weight score + subject boost
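The selection logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the digest's actual implementation: the weights in `KEYWORD_WEIGHTS` and `SUBJECT_BOOST`, the `Paper` type, and the substring matching rule are all hypothetical, and the sketch ranks purely by total score, which the digest itself may not do.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical weights; the digest does not publish its real values.
KEYWORD_WEIGHTS = {"agent": 5.0, "llm": 4.0, "reasoning": 3.0}
SUBJECT_BOOST = {"cs.AI": 2.0, "cs.CL": 1.0}


@dataclass
class Paper:
    title: str
    abstract: str
    categories: List[str]


def score(paper: Paper) -> Tuple[float, List[str]]:
    """Return (total score, matched keywords) for one paper."""
    text = f"{paper.title} {paper.abstract}".lower()
    # Assumed rule: plain substring match, so "agent" also hits "multi-agent".
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in text)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    # Assumed rule: each listed category adds its boost once.
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in paper.categories)
    return keyword_score + boost, matched


def top_n(papers: List[Paper], n: int = 10) -> List[Paper]:
    """Pick the n highest-scoring papers (ties keep input order)."""
    return sorted(papers, key=lambda p: score(p)[0], reverse=True)[:n]


if __name__ == "__main__":
    demo = Paper("An LLM agent", "Stepwise reasoning with tools", ["cs.AI"])
    print(score(demo))
```

Keeping the keyword score and the subject boost as separate sums makes it easy to report both the total and the matched keywords per entry, as the digest does.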

#1 RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

Score: 28.7

Matched keywords: agent, ai, ai agent, reasoning, tool-using

Categories: cs.AI

Compressed abstract: Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine.

#2 StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation

Score: 24.4

Matched keywords: alignment, code generation, llm, reasoning

Categories: cs.CL, cs.AI

Compressed abstract: Effective code generation requires both model capability and a problem representation that carefully structures how models reason and plan. Existing approaches augment reasoning steps or inject specific structure into how models think, but leave scattered problem conditions unchanged.

#3 VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs

Score: 38.7

Matched keywords: agent, agent framework, benchmark, code generation, large language models, llm, multi-agent, reasoning

Categories: cs.AR, cs.AI, cs.LG, cs.MA, cs.PL

Compressed abstract: Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module wiring, a…

#4 AIBuildAI: An AI Agent for Automatically Building AI Models

Score: 40.7

Matched keywords: agent, ai, ai agent, benchmark, large language model, llm, reasoning, tool use

Categories: cs.AI

Compressed abstract: AI models underpin modern intelligent systems, driving advances across science, medicine, finance, and technology. Yet developing high-performing AI models remains a labor-intensive process that requires expert practitioners to iteratively design architectures, engineer representations, implement training pipelines and refine approaches through empirical evaluation.

#5 MARS^2: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

Score: 27.9

Matched keywords: agent, code generation, multi-agent, reasoning

Categories: cs.AI, cs.CL

Compressed abstract: Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling.

#6 CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations

Score: 21.4

Matched keywords: agent, llm

Categories: cs.AI, cs.CL, cs.CY

Compressed abstract: LLM-empowered agent simulations are increasingly used to study social emergence, yet the micro-to-macro causal mechanisms behind macro outcomes often remain unclear. This is challenging because emergence arises from intertwined agent interactions and meso-level feedback and nonlinearity, making generative mechanisms hard to disentangle.

#7 MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

Score: 21.8

Matched keywords: benchmark, llm, multimodal, reasoning

Categories: cs.LG, cs.AI, cs.CL

Compressed abstract: Domain reweighting can improve sample efficiency and downstream generalization, but data-mixture optimization for multimodal midtraining remains largely unexplored. Current multimodal training recipes tune mixtures along a single dimension, typically data format or task type.

#8 Where are the Humans? A Scoping Review of Fairness in Multi-agent AI Systems

Score: 20.2

Matched keywords: agent, ai, multi-agent

Categories: cs.AI

Compressed abstract: Rapid advances in Generative AI are giving rise to increasingly sophisticated Multi-Agent AI (MAAI) systems. While AI fairness has been extensively studied in traditional predictive scenarios, its examination in MAAI remains nascent and fragmented.

#9 Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees

Score: 31.2

Matched keywords: agent, large language model, llm, multi-agent

Categories: cs.GT, cs.AI

Compressed abstract: Large Language Model (LLM) agents are increasingly deployed in multi-agent systems requiring strategic coordination. While recent work has analyzed LLM behavior in two-player games, coalition formation, where n agents dynamically form cooperative groups, remains theoretically uncharacterized.

#10 Dissecting Failure Dynamics in Large Language Model Reasoning

Score: 19.6

Matched keywords: large language model, large language models, reasoning, token

Categories: cs.AI, cs.CL

Compressed abstract: Large Language Models (LLMs) achieve strong performance through extended inference-time deliberation, yet how their reasoning failures arise remains poorly understood. By analyzing model-generated reasoning trajectories, we find that errors are not uniformly distributed but often originate from a small number of early transition points, after which reasoning remains locally coherent but globally incorrect.
