#1 Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals
Score: 30.7
Matched keywords: agent, fine-tuning, llm, multi-agent, reasoning
Categories: cs.LG, cs.AI
Compressed abstract: Modern language agents that perform multi-step reasoning have shown strong performance in knowledge-intensive question answering. However, existing approaches typically couple evidence acquisition and answer generation within a single policy.
#2 Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate
Score: 38.6
Matched keywords: agent, alignment, llm, multi-agent, reasoning, token
Categories: cs.CL
Compressed abstract: Evaluating reasoning quality in multi-agent LLM systems is challenging, especially for open-ended tasks without reference answers. We investigate whether intrinsic confidence signals, token-level log-probabilities from decoding, can predict reasoning quality as assessed by LLM-as-judge evaluation.
#3 The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge
Score: 36.6
Matched keywords: agent, llm, multi-agent, reasoning, token
Categories: cs.CL, cs.AI
Compressed abstract: Multi-agent debate systems are typically evaluated only on whether the final answer is correct, overlooking the quality of the intermediate reasoning that debate is designed to produce. This paper studies the relationship between three signals in multi-agent debate: token-level log-probability distributions over reasoning tokens, LLM-as-judge rubric scores assigned to those tokens, and final task accuracy.
#4 Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning
Score: 13.6
Matched keywords: fine-tuning, llm, reasoning, token
Categories: cs.CL, cs.AI
Compressed abstract: Automatic Depression Detection (ADD) from clinical interviews is a pivotal task in computational mental health, yet it remains challenging due to two critical obstacles: 1) difficulty in modeling complex but sparsely distributed depression clues within lengthy, multi-topic clinical interviews, leading to superficial and unreliable reasoning; 2) scarcity of labeled data due to clinical privacy, together with high cos…
#5 Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting
Score: 28.4
Matched keywords: agent, alignment, large language models, llm
Categories: cs.LG, cs.AI, cs.CL
Compressed abstract: Causal discovery aims to uncover causal structures from observational data, which is crucial for real-world decision-making. However, different causal discovery algorithms can produce divergent results that conflict with each other, complicating the identification of accurate causal graphs.
#6 Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation
Score: 33.2
Matched keywords: agent, alignment, large language model, multi-agent, reasoning, retrieval-augmented
Categories: cs.MA, cs.AI, cs.CL
Compressed abstract: Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the process. During long-horizon exchanges, reactive systems under sustained perturbations often experience logic degradation, argument repetition, and role drift.
#7 What makes a harness a harness: necessary and sufficient conditions for an agent harness
Score: 33.7
Matched keywords: agent, agent framework, artificial intelligence, coding agent, harness
Categories: cs.SE, cs.AI
Compressed abstract: The term agent harness now circulates widely in software engineering with generative artificial intelligence. It names the layer that wraps a language model and turns it into a coding agent able to act on a repository.
#8 RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
Score: 29.4
Matched keywords: fine-tuning, llm, reasoning, token, transformer
Categories: cs.LG, cs.AI, cs.CL
Compressed abstract: We introduce RKSC (Reasoning-Aware KV Cache Sharing), a training-free inference framework that eliminates two structural redundancies in multi-branch LLM reasoning pipelines. ASKS (Attention-Similarity KV Sharing) computes the prefix KV cache once and broadcasts it to all semantically similar branches via hidden-state cosine similarity, strictly generalising the token-exact prefix caching used by vLLM and SGLang.
#9 The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
Score: 21.2
Matched keywords: agent, ai, multi-agent, reasoning
Categories: cs.AI
Compressed abstract: As AI systems built from multiple language-model agents become more common, they are increasingly used to make decisions together: discussing, negotiating, and acting on shared tasks. While individual agents may appear well-aligned when tested on their own, problems can arise from how they interact with one another.
#10 Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory
Score: 21.4
Matched keywords: agent, llm
Categories: cs.AI, cs.CL
Compressed abstract: Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and memory maintenance difficult.