#1 TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination
Score: 38.5
Matched keywords: agent, fine-tuning, llm, multi-agent, reasoning
Categories: cs.LG, cs.MA
Compressed abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent shifts the team's context distribution, and when subsequent updates are evaluated on cached rollouts, this mismatch compounds.
#2 Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Score: 31.8
Matched keywords: agent, llm, reasoning, token
Categories: cs.AI, cs.CL, cs.LG, cs.MA, eess.SY
Compressed abstract: Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs.
#3 CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation
Score: 27.2
Matched keywords: agent, benchmark, harness, large language models, llm
Categories: cs.AI, cs.CE
Compressed abstract: Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation.
#4 Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
Score: 26.0
Matched keywords: fine-tuning, harness, llm
Categories: cs.LG, cs.AI, stat.ML
Compressed abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head and improve the decoded task metric.
#5 From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery
Score: 25.0
Matched keywords: fine-tuning, llm, prompt
Categories: cs.CE, cs.AI, cs.CL
Compressed abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation–evaluation–feedback loops for iterative opti…
#6 Effective Harness Engineering for Algorithm Discovery with Coding Agents
Score: 22.8
Matched keywords: harness, harness engineering, large language models, token
Categories: cs.SE, cs.AI, cs.CL
Compressed abstract: AlphaEvolve and FunSearch have demonstrated the potential of combining large language models (LLMs) with evolutionary search for automated algorithm discovery. However, discovery success is shaped not only by model capability but also significantly by the design of the execution infrastructure, i.e., the harness.
#7 Property-Guided LLM Program Synthesis for Planning
Score: 19.2
Matched keywords: llm, program synthesis
Categories: cs.AI, cs.LG
Compressed abstract: LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests.
#8 SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
Score: 21.4
Matched keywords: agent, alignment, benchmark, multi-agent, rlhf
Categories: cs.AI
Compressed abstract: Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real business processes. We present SDOF, a framework that treats multi-agent execution as a constrained state machine.
#9 An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations
Score: 29.1
Matched keywords: ai, artificial intelligence, large language models, llm, rag, retrieval-augmented
Categories: cs.IR, cs.AI
Compressed abstract: Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (AI) have enabled food recommendation systems to adapt suggestions to user preferences and health goals.
#10 Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models
Score: 31.4
Matched keywords: agent, agent framework, large language models, multi-agent, multimodal, reasoning
Categories: cs.CV
Compressed abstract: Multimodal large language models (MLLMs) can produce fluent artwork emotion explanations, but they often suffer from attribute flooding: they enumerate many visible formal attributes without identifying which cues actually support the affective judgment. We therefore formulate artwork emotion understanding as Attribute-Grounded Selective Reasoning (AGSR), where predefined formal attributes serve as evidence units an…