arXiv daily keyword digest · 2026-05-22

#1 From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Score: 21.4

Matched keywords: llm, reasoning

Categories: cs.LG, cs.AI, cs.CL

Compressed abstract: Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment cannot use partial progress in failed attempts. We introduce SCRL (Subproblem Curriculum Reinforcement Learning), a curriculum RL framework that derives verifiable subproblems fro…

#2 AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems

Score: 31.0

Matched keywords: agent, large language model, llm, multi-agent

Categories: cs.LG

Compressed abstract: Deploying neural networks on microcontroller units (MCUs) is critical for edge intelligence but remains challenging due to tight memory, storage, and computation constraints. Existing approaches, such as model compression and hardware-aware neural architecture search (HW-NAS), often depend on proxy metrics, incur high search cost, and do not fully bridge the gap between architecture design and verified deployment.

#3 From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

Score: 24.2

Matched keywords: alignment, fine-tuning, large language models, llm

Categories: cs.LG, cs.CL

Compressed abstract: Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated processes, our empirical analysis suggests they may be intrinsically coupled.

#4 OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

Score: 22.6

Matched keywords: llm, reasoning, token

Categories: cs.LG, cs.AI

Compressed abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal reasoning steps and injecting noise at uninformative ones. Critic-free alternatives derived from on-policy distillation supply per-token signals through oracle-conditioned likelihood ratios…

#5 Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

Score: 15.2

Matched keywords: ai, alignment, llm

Categories: cs.AI, cs.HC

Compressed abstract: AI models are already deployed in societies affected by armed conflict, and journalists, humanitarian workers, governments and ordinary citizens rely on them for information or for their work processes. No established practice exists for checking whether their outputs can make those conflicts worse.

#6 Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

Score: 22.6

Matched keywords: llm, reasoning, token

Categories: cs.LG, cs.AI

Compressed abstract: On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token-level supervision on its own rollouts. However, recent studies show that OPSD degrades complex reasoning by suppressing predictive uncertainty, which supports exploration and hypothesis…

#7 Claw AI Lab: An Autonomous Multi-Agent Research Team

Score: 31.8

Matched keywords: agent, ai, harness, multi-agent, prompt

Categories: cs.AI

Compressed abstract: We present Claw AI Lab, a lab-native autonomous research platform that advances automated research from a hidden prompt-to-paper pipeline into an interactive AI laboratory. Rather than centering the system around a single agent or a fixed serial workflow, we allow users to instantiate a full research team from one prompt, with customizable roles, collaborative workflows, real-time monitoring, artifact inspection, an…

#8 Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

Score: 32.2

Matched keywords: agent, harness, llm, tool use

Categories: cs.AI

Compressed abstract: LLM agents are shaped not only by their language models, but also by the runtime harness that mediates observation, tool use, action execution, feedback interpretation, and trajectory control. While existing agent adaptation methods mainly update model parameters, many failures in deterministic, rule-governed domains stem from mismatches at the model-environment interface.

#9 TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

Score: 23.2

Matched keywords: agent, ai, benchmark, multi-agent, reasoning

Categories: cs.AI

Compressed abstract: Topology optimization can generate efficient structures, but designers often must manually translate qualitative intent, such as desired visual style, product experience, or manufacturability, into solver settings that are not directly tied to those preferences. We present TO-Agents, a multi-agent AI framework that connects natural-language design intent with iterative topology optimization.

#10 GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

Score: 28.5

Matched keywords: agent, benchmark, large language model, llm, reasoning

Categories: cs.LG

Compressed abstract: Large Language Model (LLM)-based agents demonstrate strong reasoning and execution capabilities on complex tasks when guided by structured instructions, commonly referred to as workflows. However, existing workflow-assisted agent serving systems typically rely on predefined templates and shallow matching mechanisms, which limit their ability to capture deep semantic relationships and generalize to previously unseen…

2026-05-22 · arXiv Daily Keyword Digest (Top 10 of 763)
