2026-04-03 · arXiv Daily Keyword Digest (Top 10 of 641)

Generated: 2026-04-05T10:58:34.403379+09:00

Target date (KST): 2026-04-03
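
A minimal sketch of how a paper's submission timestamp could be mapped to a KST target date. It assumes the digest compares the UTC `published` timestamp of each arXiv entry against the target date after converting to UTC+9; the helper names `kst_date` and `on_target_date` are hypothetical and are not taken from the digest's own tooling.

```python
from datetime import date, datetime, timedelta, timezone

# Korea Standard Time is a fixed UTC+9 offset with no daylight saving.
KST = timezone(timedelta(hours=9))


def kst_date(published_utc: str) -> date:
    """Return the KST calendar date for an ISO 8601 UTC timestamp."""
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    dt = datetime.fromisoformat(published_utc.replace("Z", "+00:00"))
    return dt.astimezone(KST).date()


def on_target_date(published_utc: str, target: date) -> bool:
    """True if the timestamp falls on the target date in KST (assumed rule)."""
    return kst_date(published_utc) == target
```

Under this assumption, a paper submitted at 15:00 UTC on 2026-04-02 would already count toward the 2026-04-03 digest, because that instant is midnight in KST.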

Selection: picked 10 from 641 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
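
As a rough illustration of the source line above, the request URL could be assembled as follows. The `search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` parameters are part of the public arXiv API; the function name `build_query_url` and the page size are assumptions, since the digest does not state how it pages through results.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build one page of the cs.* listing, newest submissions first."""
    params = {
        "search_query": "cat:cs.*",   # category filter quoted in the digest
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": start,               # zero-based offset of the first result
        "max_results": max_results,   # page size (assumed value)
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `build_query_url(start=100, max_results=50)` would request results 100 through 149 of the same listing; the response is an Atom feed that still has to be fetched and parsed.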

Selection logic: keyword-weight score + subject boost
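
The selection logic above might look roughly like the sketch below. Every number in it is hypothetical: the digest does not publish its keyword weights or subject boosts, so `KEYWORD_WEIGHTS` and `SUBJECT_BOOST` are placeholder values, and the whole-word matching and additive boost are assumptions rather than the digest's confirmed behavior.

```python
import re

# Hypothetical weights and boosts; the real values are not given in the digest.
KEYWORD_WEIGHTS = {"llm": 2.0, "agent": 3.0, "multi-agent": 4.0, "rag": 2.5}
SUBJECT_BOOST = {"cs.AI": 1.0, "cs.CL": 0.5}


def matched_keywords(text, weights=KEYWORD_WEIGHTS):
    """Return the keywords found in text as whole words, sorted by name."""
    lowered = text.lower()
    hits = []
    for kw in weights:
        # Lookarounds stop "rag" from matching inside a word like "fragile".
        pattern = rf"(?<![a-z0-9]){re.escape(kw)}(?![a-z0-9])"
        if re.search(pattern, lowered):
            hits.append(kw)
    return sorted(hits)


def score(text, categories):
    """Sum of matched keyword weights plus an additive boost per category."""
    keyword_part = sum(KEYWORD_WEIGHTS[kw] for kw in matched_keywords(text))
    boost_part = sum(SUBJECT_BOOST.get(cat, 0.0) for cat in categories)
    return keyword_part + boost_part


def top_k(papers, k=10):
    """Rank (text, categories) pairs by score, highest first, and keep k."""
    ranked = sorted(papers, key=lambda p: score(p[0], p[1]), reverse=True)
    return ranked[:k]
```

With these placeholder values, a title containing "Multi-Agent" and "LLM" in `cs.AI` would collect the weights for `agent`, `llm`, and `multi-agent` plus the `cs.AI` boost. The scores listed in the entries below come from the digest's own weights, which this sketch does not reproduce.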

#1 A Multi-Agent Human-LLM Collaborative Framework for Closed-Loop Scientific Literature Summarization

Score: 29.2

Matched keywords: agent, ai, large language models, llm, multi-agent

Categories: cs.AI

Compressed abstract: Scientific discovery is slowed by fragmented literature that requires excessive human effort to gather, analyze, and understand. AI tools, including autonomous summarization and question answering, have been developed to aid in understanding scientific literature.


#2 Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study

Score: 2.4

Matched keywords: llm

Categories: cs.AI, cs.CL

Compressed abstract: Background: Accurate translation of radiology reports is important for multilingual research, clinical communication, and radiology education, but the validity of LLM-based evaluation remains unclear. Objective: To evaluate the educational suitability of LLM-generated Japanese translations of chest CT reports and compare radiologist assessments with LLM-as-a-judge evaluations.


#3 From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?

Score: 16.2

Matched keywords: agent, multi-agent

Categories: cs.AI

Compressed abstract: Multi-agent systems (MAS) tackle complex tasks by distributing expertise, though this often comes at the cost of heavy coordination overhead, context fragmentation, and brittle phase ordering. Distilling a MAS into a single-agent skill can bypass these costs, but this conversion lacks a principled answer for when and what to distill.


#4 Adaptive Stopping for Multi-Turn LLM Reasoning

Score: 32.8

Matched keywords: large language models, llm, rag, reasoning, retrieval-augmented

Categories: cs.CL, cs.AI

Compressed abstract: Large Language Models (LLMs) increasingly rely on multi-turn reasoning and interaction, such as adaptive retrieval-augmented generation (RAG) and ReAct-style agents, to answer difficult questions. These methods improve accuracy by iteratively retrieving information, reasoning, or acting, but introduce a key challenge: When should the model stop?


#5 Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning

Score: 19.0

Matched keywords: llm, rag, retrieval-augmented

Categories: cs.CL, cs.AI, cs.IR

Compressed abstract: Rerankers play a pivotal role in refining retrieval results for Retrieval-Augmented Generation. However, current reranking models are typically optimized on static human annotated relevance labels in isolation, decoupled from the downstream generation process.


#6 ByteRover: Agent-Native Memory Through LLM-Curated Hierarchical Context

Score: 31.2

Matched keywords: agent, large language models, llm, reasoning

Categories: cs.AI

Compressed abstract: Memory-Augmented Generation (MAG) extends large language models with external memory to support long-context reasoning, but existing approaches universally treat memory as an external service that agents call into, delegating storage to separate pipelines of chunking, embedding, and graph extraction. This architectural separation means the system that stores knowledge does not understand it, leading to semantic drif…


#7 Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

Score: 22.2

Matched keywords: fine-tuning, large language models, llm, reasoning

Categories: cs.CL

Compressed abstract: Large language models demonstrate strong performance on mathematical reasoning benchmarks, yet remain surprisingly fragile to meaning-preserving surface perturbations. We systematically evaluate three open-weight LLMs, Mistral-7B, Llama-3-8B, and Qwen2.5-7B, on 677 GSM8K problems paired with semantically equivalent variants generated through name substitution and number format paraphrasing.


#8 LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation

Score: 27.7

Matched keywords: agent, ai, fine-tuning, llm, reasoning

Categories: cs.AI

Compressed abstract: Traditional social science research often requires designing complex experiments across vast methodological spaces and depends on real human participants, making it labor-intensive, costly, and difficult to scale. Here we present S-Researcher, an LLM-agent-based platform that assists researchers in conducting social science research more efficiently and at greater scale by "siliconizing" both the research process an…


#9 De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules

Score: 22.2

Matched keywords: ai, alignment, llm, rag

Categories: cs.AI, cs.CL, cs.LG

Compressed abstract: Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process.


#10 APEX: Agent Payment Execution with Policy for Autonomous Agent API Access

Score: 9.2

Matched keywords: agent

Categories: cs.CR, cs.AI

Compressed abstract: Autonomous agents are moving beyond simple retrieval tasks to become economic actors that invoke APIs, sequence workflows, and make real-time decisions. As this shift accelerates, API providers need request-level monetization with programmatic spend governance.
