#1 AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
Score: 34.2
Matched keywords: agent, ai, ai agent, ai agents, benchmark, llm, tool use
Categories: cs.AI, cs.CR
Compressed abstract: Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm.
#2 Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control
Score: 21.4
Matched keywords: fine-tuning, large language models, llm
Categories: cs.LG, cs.AI, cs.CL
Compressed abstract: Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization.
#3 DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
Score: 25.0
Matched keywords: agent, ai, ai agents, prompt
Categories: cs.AI
Compressed abstract: AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns.
#4 Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
Score: 32.2
Matched keywords: agent, large language models, llm, multi-agent, reasoning
Categories: cs.AI
Compressed abstract: While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In such games, the non-stationarity of other agents poses significant challenges for evaluating the reasoning process and for assigning credit across multiple reasoning steps.
#5 From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning
Score: 22.2
Matched keywords: alignment, fine-tuning, large language models, llm
Categories: cs.AI, cs.LG
Compressed abstract: Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning.
#6 Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning
Score: 19.2
Matched keywords: large language models, llm, reasoning
Categories: cs.CL, cs.ET, cs.LG
Compressed abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization schemes that misalign with the model's evolving reasoning capabilities.
#7 Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
Score: 18.2
Matched keywords: benchmark, fine-tuning, large language models, llm
Categories: cs.SE, cs.AI
Compressed abstract: Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms.
#8 SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
Score: 20.0
Matched keywords: agent, ai, ai agents, benchmark, coding agent
Categories: cs.MA, cs.SE
Compressed abstract: The emergence of "vibe coding" platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software, has created a need for rigorous evaluation beyond code-level benchmarks. In order to assess them as virtual software development agencies on understanding business requirements, making architectural decisions, writing production code, handling iterative modification…
#9 RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
Score: 9.4
Matched keywords: llm
Categories: cs.CL, cs.AI, cs.LG
Compressed abstract: We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance.
#10 Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
Score: 22.4
Matched keywords: harness, large language models, llm
Categories: cs.DC, cs.AI, cs.CL, cs.LG
Compressed abstract: The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that enjoy better availability and deliver comparable performance per dollar to top-tier hardware.