arXiv daily keyword digest · 2026-05-18

#1 TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

Score: 38.5

Matched keywords: agent, fine-tuning, llm, multi-agent, reasoning

Categories: cs.LG, cs.MA

Compressed abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent shifts the team's context distribution, and when subsequent updates are evaluated on cached rollouts, this mismatch compounds.

#2 Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Score: 31.8

Matched keywords: agent, llm, reasoning, token

Categories: cs.AI, cs.CL, cs.LG, cs.MA, eess.SY

Compressed abstract: Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs.

#3 CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

Score: 27.2

Matched keywords: agent, benchmark, harness, large language models, llm

Categories: cs.AI, cs.CE

Compressed abstract: Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation.

#4 Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

Score: 26.0

Matched keywords: fine-tuning, harness, llm

Categories: cs.LG, cs.AI, stat.ML

Compressed abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head and improve the decoded task metric.

#5 From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

Score: 25.0

Matched keywords: fine-tuning, llm, prompt

Categories: cs.CE, cs.AI, cs.CL

Compressed abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation-evaluation-feedback loops for iterative opti…

#6 Effective Harness Engineering for Algorithm Discovery with Coding Agents

Score: 22.8

Matched keywords: harness, harness engineering, large language models, token

Categories: cs.SE, cs.AI, cs.CL

Compressed abstract: AlphaEvolve and FunSearch have demonstrated the potential of combining large language models (LLMs) with evolutionary search for automated algorithm discovery. However, discovery success is shaped not only by model capability but also significantly by the design of the execution infrastructure, i.e., the harness.

#7 Property-Guided LLM Program Synthesis for Planning

Score: 19.2

Matched keywords: llm, program synthesis

Categories: cs.AI, cs.LG

Compressed abstract: LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests.

#8 SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

Score: 21.4

Matched keywords: agent, alignment, benchmark, multi-agent, rlhf

Categories: cs.AI

Compressed abstract: Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real business processes. We present SDOF, a framework that treats multi-agent execution as a constrained state machine.

#9 An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations

Score: 29.1

Matched keywords: ai, artificial intelligence, large language models, llm, rag, retrieval-augmented

Categories: cs.IR, cs.AI

Compressed abstract: Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (AI) have enabled food recommendation systems to adapt suggestions to user preferences and health goals.

#10 Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models

Score: 31.4

Matched keywords: agent, agent framework, large language models, multi-agent, multimodal, reasoning

Categories: cs.CV

Compressed abstract: Multimodal large language models (MLLMs) can produce fluent artwork emotion explanations, but they often suffer from attribute flooding: they enumerate many visible formal attributes without identifying which cues actually support the affective judgment. We therefore formulate artwork emotion understanding as Attribute-Grounded Selective Reasoning (AGSR), where predefined formal attributes serve as evidence units an…

2026-05-18 · arXiv Daily Keyword Digest (Top 10 of 633)
