#1 Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus
Score: 29.4
Matched keywords: agent, llm, multi-agent, token
Categories: cs.LG, cs.AI, cs.MA
Compressed abstract: Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across 100 GSM8K questions with three Qwen2.5-14B agents, mean cosine similarity is 0.888 and effective rank is 2.17 out of 3.0, a failure mode we…
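The two diagnostics named in the abstract of #1 can be illustrated with a small sketch. The abstract does not say which definition of effective rank the authors use, so this example assumes the participation ratio of the cosine-similarity matrix, (sum of eigenvalues)^2 / (sum of squared eigenvalues), which needs no eigendecomposition. The function names and the toy embeddings are hypothetical, and the sketch is not expected to reproduce the reported 0.888 or 2.17.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of agents."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def participation_ratio(embeddings):
    """Assumed effective-rank proxy: trace(G)^2 / trace(G^2).

    G is the cosine-similarity (Gram) matrix of the agents. Because G is
    symmetric, trace(G^2) equals the sum of its squared entries.
    The result lies between 1 (all agents identical) and the number of
    agents (all agents mutually orthogonal).
    """
    gram = [[cosine(u, v) for v in embeddings] for u in embeddings]
    trace = sum(gram[i][i] for i in range(len(gram)))
    frob_sq = sum(g * g for row in gram for g in row)
    return trace ** 2 / frob_sq


if __name__ == "__main__":
    # Toy rationale embeddings for three agents (made-up numbers).
    agents = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3]]
    print("mean pairwise cosine:", round(mean_pairwise_cosine(agents), 3))
    print("participation ratio:", round(participation_ratio(agents), 3))
```

A committee whose members paraphrase one another pushes the first number toward 1 and the second toward 1 as well, which is the kind of collapse the abstract describes.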
#2 PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage
Score: 36.4
Matched keywords: agent, large language model, llm, multi-agent
Categories: cs.AI, cs.CL, cs.MA, q-fin.TR
Compressed abstract: This paper presents PolySwarm, a novel multi-agent large language model (LLM) framework designed for real-time prediction market trading and latency arbitrage on decentralized platforms such as Polymarket. PolySwarm deploys a swarm of 50 diverse LLM personas that concurrently evaluate binary outcome markets, aggregating individual probability estimates through confidence-weighted Bayesian combination of swarm consen…
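The aggregation step in #2 is truncated in the abstract, so the exact combination rule is unknown. As a rough illustration of confidence-weighted pooling of binary-outcome probabilities, the sketch below assumes a weighted average in log-odds space (a logarithmic opinion pool). The function name `pool_log_odds`, the `eps` clipping value, and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_log_odds(probs, confidences, eps=1e-6):
    """Confidence-weighted average of probabilities in log-odds space.

    probs:        each persona's probability that the market resolves YES
    confidences:  non-negative weights, one per persona
    eps:          clipping bound so 0.0 and 1.0 do not produce infinite log-odds
    """
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one confidence weight must be positive")
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    pooled = sum(c * logit(p) for p, c in zip(clipped, confidences)) / total
    return sigmoid(pooled)


if __name__ == "__main__":
    # Three hypothetical personas; the second is the most confident.
    estimate = pool_log_odds([0.60, 0.75, 0.40], [1.0, 3.0, 0.5])
    print("pooled probability:", round(estimate, 3))
```

Pooling in log-odds space lets a confident persona near 0 or 1 move the estimate more than a linear average would, which is one reason this form is a common choice for combining forecasts.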
#3 Optimizing Service Operations via LLM-Powered Multi-Agent Simulation
Score: 27.2
Matched keywords: agent, llm, multi-agent
Categories: cs.AI, cs.MA, math.OC
Compressed abstract: Service system performance depends on how participants respond to design choices, but modeling these responses is hard due to the complexity of human behavior. We introduce an LLM-powered multi-agent simulation (LLM-MAS) framework for optimizing service operations.
#4 ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
Score: 34.1
Matched keywords: agent, ai, ai agent, ai agents, llm, multi-agent, token
Categories: cs.AI, cs.CL
Compressed abstract: AI agents, as autonomous digital actors, need agent-native protocols. Existing approaches, including GUI automation and MCP-based skills, suffer from high token consumption, fragmented interaction, and inadequate security, because they lack a unified top-level framework and key components and each independent module is flawed. To address these issues, we present ANX, an open, extensible, verifiable agent-native protocol and top-level…
#5 BAAI Cardiac Agent: An intelligent multimodal agent for automated reasoning and diagnosis of cardiovascular diseases from cardiac magnetic resonance imaging
Score: 9.6
Matched keywords: agent, agent framework, multimodal, reasoning
Categories: eess.IV, cs.AI, cs.CV
Compressed abstract: Cardiac magnetic resonance (CMR) is a cornerstone for diagnosing cardiovascular disease. However, it remains underutilized because interpretation across multiple sequences, phases, and quantitative measures is complex, time-consuming, and heavily reliant on specialized expertise.
#6 SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization
Score: 27.8
Matched keywords: code generation, fine-tuning, llm, prompt, reasoning
Categories: cs.CR, cs.AI
Compressed abstract: Reasoning language models (RLMs) are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code.
#7 Scaling Multi-agent Systems: A Smart Middleware for Improving Agent Interactions
Score: 27.8
Matched keywords: agent, alignment, large language model, llm, multi-agent, prompt
Categories: cs.MA, cs.NI
Compressed abstract: As Large Language Model (LLM) based Multi-Agent Systems (MAS) evolve from experimental pilots to complex, persistent ecosystems, the limitations of direct agent-to-agent communication have become increasingly apparent. Current architectures suffer from fragmented context, stochastic hallucinations, rigid security boundaries, and inefficient topology management.
#8 ClawArena: Benchmarking AI Agents in Evolving Information Environments
Score: 33.4
Matched keywords: agent, ai, ai agents, benchmark, reasoning
Categories: cs.LG, cs.AI, cs.CL
Compressed abstract: AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rather than explicit instructions.
#9 Toward Executable Repository-Level Code Generation via Environment Alignment
Score: 23.2
Matched keywords: alignment, code generation, large language models, repository-level
Categories: cs.SE, cs.AI
Compressed abstract: Large language models (LLMs) have achieved strong performance on code generation, but existing methods still struggle with repository-level code generation under executable validation. Under this evaluation setting, success is determined not by the plausibility of isolated code fragments, but by whether a generated multi-file repository can be successfully installed, have its dependencies and internal references res…
#10 Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
Score: 15.1
Matched keywords: large language models, reasoning, token
Categories: cs.CL, cs.AI
Compressed abstract: Large language models achieve strong performance on many language tasks, yet it remains unclear whether they integrate world knowledge with syntactic structure in a human-like, structure-sensitive way during ambiguity resolution. We test this question in Turkish prenominal relative-clause attachment ambiguities, where the same surface string permits high attachment (HA) or low attachment (LA).