#1 Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing
Score: 34.7
Matched keywords: agent, ai, benchmark, code agent, llm, prompt, token
Categories: cs.AI
Compressed abstract: AI-assisted coding agents are bottlenecked by input-token cost. Two pathologies of raw human input drive much of this overhead: tokenization inefficiency for non-English text and structural entropy in conversational prompts.
#2 The Deliberative Illusion: Diagnosing Factual Attrition and Stance Homogenization in Multi-Agent LLM Deliberation
Score: 27.2
Matched keywords: agent, llm, multi-agent
Categories: cs.CL
Compressed abstract: Multi-agent LLM systems often treat consensus as evidence of successful interaction. For deliberative problems, however, reliability depends on whether agents preserve the facts and viewpoints needed to interpret an issue.
#3 The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size
Score: 27.2
Matched keywords: agent, llm, multi-agent
Categories: physics.soc-ph, cs.AI, cs.MA
Compressed abstract: Inference-time multi-agent LLM scaling lacks a shared unit: counting nominal agents conflates cost with independent evidence. We derive a two-parameter scaling law $R(N) = N_{\mathrm{eff}}/N = 1/(1 + c(N-1)N^{-\alpha})$, where the regime exponent $\alpha$ classifies any configuration into one of three asymptotic regimes: hard-ceiling at $1/c$ ($\alpha = 0$), sublinear at $N^{\alpha}/c$ ($0 < \alpha < 1$), or linear ($\alpha \geq 1$), and a mean-field theorem predicts that peer count k…
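The scaling law quoted in entry #3 can be evaluated numerically to see how the three regimes separate. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the regime exponent is called `alpha` here because its symbol did not survive in this digest, and the linear boundary is taken as `alpha >= 1` on the assumption that this matches the asymptotics of the formula.

```python
def efficiency(n: int, c: float, alpha: float) -> float:
    """Per-agent efficiency R(N) = 1 / (1 + c * (N - 1) * N**(-alpha)).

    `c` and `alpha` are the two parameters of the law; the name `alpha`
    is an assumption made for this sketch.
    """
    if n < 1:
        raise ValueError("team size must be at least 1")
    return 1.0 / (1.0 + c * (n - 1) * n ** (-alpha))


def effective_team_size(n: int, c: float, alpha: float) -> float:
    """Effective team size N_eff = N * R(N)."""
    return n * efficiency(n, c, alpha)


def regime(alpha: float) -> str:
    """Classify the asymptotic regime from the exponent alone."""
    if alpha == 0:
        return "hard-ceiling"   # N_eff approaches 1/c
    if 0 < alpha < 1:
        return "sublinear"      # N_eff grows like N**alpha / c
    if alpha >= 1:
        return "linear"         # N_eff grows proportionally to N
    raise ValueError("alpha is assumed to be non-negative")


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for alpha in (0.0, 0.5, 1.0):
        sizes = [effective_team_size(n, c=0.5, alpha=alpha) for n in (1, 10, 100)]
        print(regime(alpha), [round(s, 2) for s in sizes])
```

With `alpha = 0` the effective size saturates near `1/c` however many agents are added, which is the sense in which counting nominal agents overstates independent evidence.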
#4 Multi^2: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments
Score: 39.5
Matched keywords: agent, benchmark, fine-tuning, large language model, llm, multi-agent, reasoning
Categories: cs.LG
Compressed abstract: A central goal of large language model (LLM) research is to build agentic systems that can plan, act, and adapt through sustained interaction with dynamic environments. While recent LLM-based agents exhibit impressive contextual reasoning, their long-horizon decision-making remains fragile, often suffering from objective drift, where goals and plans drift over extended interactions.
#5 The Geometry of LLM-as-Judge: Why Inter-LLM Consensus Is Not Human Alignment
Score: 20.2
Matched keywords: alignment, fine-tuning, llm
Categories: cs.CL
Compressed abstract: LLMs-as-judges are now standard, yet judges agree strongly with one another while agreeing only weakly with humans. We test whether this reflects shared signal or shared bias by measuring four geometric quantities on the standard LLM-as-judge stack across four community-built Indic datasets, eight Indic languages, and 41 LLM judges: score spread, effective rank, principal angle to the human subspace, and stacked corr…
#6 Toward a Modular Architecture for Embedded AI Agent Systems at the Edge
Score: 33.2
Matched keywords: agent, ai, ai agent, large language models, reasoning, tool use
Categories: cs.AI, cs.MA
Compressed abstract: The rise of Large Language Models (LLMs) has enabled agentic AI capable of complex reasoning and tool use; however, deploying such autonomy in pervasive computing environments remains challenging due to the strict memory and energy constraints of embedded microcontrollers. Existing frameworks typically assume server-class resources or continuous connectivity, leaving a gap for deeply embedded systems.
#7 E2 LLM: Towards Efficient LLM Serving in Heterogeneous Edge/Fog Environments
Score: 17.4
Matched keywords: large language models, llm, token
Categories: cs.DC, cs.AI
Compressed abstract: Large Language Models (LLMs) have become integral to modern applications, yet their deployment remains challenging. Beyond executing the models themselves, practical deployment must address cost efficiency, low latency, and optimal resource utilization.
#8 Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition
Score: 24.7
Matched keywords: benchmark, code generation, fine-tuning, large language models, llm, tool use
Categories: cs.AI
Compressed abstract: Large language models for code generation often need to use APIs that are absent from their pretraining data. This requires more than recalling a function name: models must coordinate signatures, module paths, input-output contracts, semantics, and executable usage patterns.
#9 Multi-Agent Framework Leveraging Knowledge Graphs for Virtual Commissioning Models
Score: 19.0
Matched keywords: agent, agent framework, multi-agent
Categories: cs.CE
Compressed abstract: Virtual commissioning models (VCMs) of discrete manufacturing systems are used to validate automation behavior before physical deployment, but creating and maintaining them remains labor-intensive. Relevant engineering information is distributed across programmable logic controller (PLC) engineering projects, such as Siemens TIA Portal, and kinematic simulation models, such as Siemens NX Mechatronics Concept Designe…
#10 Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Score: 28.6
Matched keywords: agent, large language models, llm, reasoning, token
Categories: cs.CL, cs.AI
Compressed abstract: Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit.