2026-05-11 · arXiv Daily Keyword Digest (Top 10 of 932)

Generated: 2026-05-12T08:02:23.999635+09:00

Target date (KST): 2026-05-11

Selection: picked 10 from 932 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
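
A minimal sketch of how a request matching the source line above could be built. The `search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` parameters belong to the public arXiv API; the function name `build_query_url` and the default page size are assumptions for illustration, not the digest's actual implementation.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build a listing URL for cs.* papers, newest submissions first.

    `start` and `max_results` page through the result set; the defaults
    here are hypothetical and not taken from the digest.
    """
    params = {
        "search_query": "cat:cs.*",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-escapes the ':' and '*' in the search query.
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url())
```

Paging with `start` would presumably continue until entries older than the target date appear, since results arrive newest first.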

Selection logic: keyword-weight score + subject boost
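
The selection logic could be sketched roughly as follows. The weights, the boost values, the substring matching rule, and all names (`KEYWORD_WEIGHTS`, `SUBJECT_BOOST`, `score_paper`, `top_k`) are hypothetical; the digest does not publish its actual weights or matching rule.

```python
# Hypothetical keyword weights; the digest's real values are not published.
KEYWORD_WEIGHTS = {"multi-agent": 5.0, "llm": 4.0, "agent": 3.0, "prompt": 2.0}
# Hypothetical per-category boost.
SUBJECT_BOOST = {"cs.MA": 1.5, "cs.CL": 1.0}


def score_paper(title: str, abstract: str, categories: list[str]) -> tuple[float, list[str]]:
    """Return (score, matched keywords) for one paper.

    Matching is a case-insensitive substring test, so "multi-agent"
    also counts as a match for "agent".
    """
    text = f"{title} {abstract}".lower()
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in text)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return keyword_score + boost, matched


def top_k(papers: list[dict], k: int = 10) -> list[dict]:
    """Keep the k highest-scoring papers (each a dict with title, abstract, categories)."""
    def key(paper: dict) -> float:
        return score_paper(paper["title"], paper["abstract"], paper["categories"])[0]
    return sorted(papers, key=key, reverse=True)[:k]
```

Substring matching is chosen here only because the listed matches often contain both "agent" and "multi-agent"; a real implementation might tokenize instead.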

#1 MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Score: 37.0

Matched keywords: agent, large language model, llm, multi-agent, prompt

Categories: cs.AI, cs.CL, cs.LG, cs.MA

Compressed abstract: Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily because local agent objectives are misaligned with holistic system goals.


#2 BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation

Score: 30.0

Matched keywords: agent, ai, benchmark, llm, multi-agent, rag, reasoning

Categories: cs.RO, cs.AI

Compressed abstract: Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following.


#3 MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

Score: 19.4

Matched keywords: deep learning, fine-tuning, llm

Categories: cs.CL, cs.AI, cs.LG

Compressed abstract: As deep learning models scale to billions of parameters, the computational cost of fine-tuning remains a significant barrier to deployment. While Low-Rank Adaptation (LoRA) has become the standard for parameter-efficient fine-tuning, the need to set a predefined, static rank r requires exhaustive grid searches to balance efficiency and performance.
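
To illustrate why the rank r in the abstract above is a cost knob: standard LoRA replaces the update of a d_out x d_in weight matrix with a product B·A, where B is d_out x r and A is r x d_in, so the trainable parameter count is r·(d_in + d_out). The sketch below covers plain LoRA only, not the paper's method; the layer width 4096 and the function names are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


if __name__ == "__main__":
    d = 4096  # hypothetical layer width, chosen only for illustration
    full = full_trainable_params(d, d)
    for r in (4, 8, 16, 64):
        lora = lora_trainable_params(d, d, r)
        print(f"r={r:>2}: {lora:>7} params ({100 * lora / full:.3f}% of full)")
```

The cost grows linearly in r, which is why a grid search over ranks means retraining one adapter per candidate value.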


#4 Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

Score: 17.9

Matched keywords: fine-tuning, large language models, llm

Categories: cs.CL, cs.AI

Compressed abstract: Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection, Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization.


#5 Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

Score: 32.2

Matched keywords: agent, ai, ai agents, alignment, large language model, multi-agent

Categories: cs.AI, cs.LG, cs.MA

Compressed abstract: Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent.


#6 GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

Score: 31.7

Matched keywords: agent, agent framework, large language models, multi-agent, reasoning

Categories: cs.AI, cs.MA

Compressed abstract: Large Language Models (LLMs) have demonstrated strong potential on many mathematical problems. However, their performance on graph algorithmic tasks remains unsatisfactory, since graphs have more complex topology and often require systematic multi-step reasoning, especially at larger sizes.


#7 OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning

Score: 25.6

Matched keywords: benchmark, large language model, llm, multimodal, reasoning

Categories: q-bio.GN, cs.AI, q-bio.CB

Compressed abstract: Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language without direct access to quantitative omics measurements.


#8 Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

Score: 26.0

Matched keywords: benchmark, fine-tuning, foundation model, llm, prompt

Categories: cs.LG

Compressed abstract: Recent work has advanced feedback-based learning systems, in which a foundation model takes in incoming feedback (e.g., from a user) to self-improve, creating a self-training loop. However, existing approaches assume an offline setup for such feedback-based methods and additionally require privileged ground-truth contexts for training.


#9 MAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing

Score: 25.4

Matched keywords: agent, multi-agent, reasoning

Categories: cs.CL, cs.AI, cs.LG

Compressed abstract: While explicit reasoning trajectories enhance model interpretability, existing paradigms often rely on monolithic chains that lack intermediate verification, allowing early errors to cascade unchecked. This lack of modularity impedes granular auditing and compromises the epistemic trust required for high-stakes applications.


#10 AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

Score: 28.2

Matched keywords: agent, benchmark, llm, reasoning, tool use

Categories: cs.AI

Compressed abstract: As LLM-based agents increasingly rely on external tools, it is important to evaluate their ability to sustain tool-grounded reasoning beyond familiar workflows and short-range interactions. We introduce AgentEscapeBench, an escape-room-style benchmark that tests whether agents can infer, execute, and revise novel tool-use procedures under explicit long-range dependency constraints.
