#1 PolyGnosis 2.0: Enhancing LLM Reasoning via Agentic Harness Engineering for Polymarket and OSINT Insight Extraction
Score: 48.9
Matched keywords: agent, alignment, harness, harness engineering, llm, multi-agent, reasoning, token
Categories: cs.CL, cs.CE
Compressed abstract: This paper introduces PolyGnosis 2.0, a pioneering multi-agent architecture designed to extract predictive intelligence by synthesizing Polymarket anomaly signals with global Open Source Intelligence (OSINT) streams, specifically the Global Database of Events, Language, and Tone (GDELT). We define and target "Perspective Mismatches", the narrative divergence between Polymarket sentiment and global media flows, as high-a…
#2 Agent-as-Peer-Debriefer: A Multi-Agent Framework with Perspective-Based Refinement for Qualitative Analysis
Score: 40.7
Matched keywords: agent, agent framework, coding agent, large language models, llm, multi-agent
Categories: cs.AI
Compressed abstract: Large language models (LLMs) are increasingly used for qualitative data analysis (QDA), yet their outputs often miss the depth and nuance of human analysis. We argue this gap reflects a missing credibility practice from human QDA: peer debriefing, in which an analyst seeks feedback from a disinterested peer and uses it to refine their coding.
#3 Spectral Retrieval: Multi-Scale Sinc Convolution over Token Embeddings for Localized Retrieval in LLM Multi-Agent Systems
Score: 31.8
Matched keywords: agent, benchmark, llm, multi-agent, token
Categories: cs.IR, cs.AI, cs.CL
Compressed abstract: [Abridged] - Spectral Retrieval is a plug-in re-ranking stage that interpolates between per-token MaxSim and mean-pool retrieval through a multi-scale sinc convolution over token embeddings. In standard dense retrieval each document is one mean-pooled vector; when relevance localises into a short subspan, the signal averages into noise.
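Illustrative sketch (not from the paper): the two endpoints named in the abstract above, mean-pool scoring and per-token MaxSim, can be compared on toy vectors to see why a short relevant subspan gets averaged away. The function names and the two-dimensional embeddings are assumptions made for illustration, and the multi-scale sinc convolution that interpolates between the endpoints is not implemented here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vectors):
    """Average a list of token vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mean_pool_score(query_tokens, doc_tokens):
    """Score a document by one mean-pooled vector per side."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


def maxsim_score(query_tokens, doc_tokens):
    """Per-token MaxSim: each query token keeps its best-matching document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical toy embeddings (unnormalized, 2-D).
query = [[1.0, 0.0]]
# doc_a: one highly relevant token buried in nine unrelated tokens.
doc_a = [[1.0, 0.0]] + [[0.0, 1.0]] * 9
# doc_b: ten weakly relevant tokens, no localized match.
doc_b = [[0.2, 0.8]] * 10

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, mean_pool_score(query, doc), maxsim_score(query, doc))
```

On this toy data mean-pooling ranks doc_b above doc_a, while MaxSim ranks doc_a first, which is the localized-relevance failure the abstract describes.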
#4 A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback
Score: 29.9
Matched keywords: agent, ai, llm, multi-agent
Categories: cs.CL, cs.AI, cs.MA
Compressed abstract: Verbal feedback delivered by attending surgeons in the operating room plays a critical formative role in resident trainee skill acquisition. Yet, assessing the quality of trainer feedback and its effectiveness in influencing trainee behavior during live surgery remains a challenge.
#5 Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems
Score: 28.2
Matched keywords: agent, llm, multi-agent
Categories: cs.AI, cs.LG, q-fin.PM
Compressed abstract: Multi-agent LLM decision systems for portfolio management still lack a principled way to assign credit across specialist agents, remain vulnerable to cold-start dominance under regime shifts, and offer limited transparency into how final allocations are formed. We propose Market Regime Council (MRC), a cooperative multi-agent decision system that computes exact Shapley credits across all single, pairwise, and Grand-…
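Illustrative sketch (not from the paper): exact Shapley credit for a small set of agents can be computed by averaging each agent's marginal contribution over every join order. The agent names and coalition payoffs below are hypothetical, and the abstract does not say how MRC defines the coalition value, so this shows only the generic calculation.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating all orderings of the players.

    `value` maps a frozenset of players to that coalition's payoff.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical specialist agents and coalition payoffs (illustration only).
agents = ["macro", "technical", "sentiment"]
payoffs = {
    frozenset(): 0.0,
    frozenset({"macro"}): 0.02,
    frozenset({"technical"}): 0.01,
    frozenset({"sentiment"}): 0.0,
    frozenset({"macro", "technical"}): 0.04,
    frozenset({"macro", "sentiment"}): 0.03,
    frozenset({"technical", "sentiment"}): 0.02,
    frozenset(agents): 0.06,
}

credits = shapley_values(agents, payoffs.__getitem__)
print(credits)
```

Enumerating orderings costs n! evaluations, so this is practical only for a handful of agents. The credits sum to the payoff of the full coalition, which is the efficiency property that makes Shapley values usable for credit assignment.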
#6 Automated Benchmark Auditing for AI Agents and Large Language Models
Score: 21.2
Matched keywords: ai, ai agents, benchmark, large language models, llm
Categories: cs.CL
Compressed abstract: Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often contain implicit assumptions, incomplete environment specifications, and brittle evaluation logic that human annotation cannot reliably catch.
#7 Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems
Score: 29.7
Matched keywords: agent, ai, ai agents, code generation, multi-agent, reasoning
Categories: cs.AI
Compressed abstract: AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at intermediate stages can propagate through agent interactions, while insufficient grounding and weak verification mechanisms further limit reliability.
#8 CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures
Score: 37.2
Matched keywords: agent, code generation, large language model, llm, reasoning, tool use
Categories: cs.LG, cs.AI
Compressed abstract: Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down.
#9 How Many Tools Should an LLM Agent See? A Chance-Corrected Answer
Score: 21.2
Matched keywords: agent, llm
Categories: cs.IR, cs.AI, cs.LG
Compressed abstract: Before an LLM agent can use a tool, a retrieval system must decide which candidate tools to show to the agent. How long should that shortlist be?
#10 Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning
Score: 20.4
Matched keywords: llm, reasoning
Categories: cs.CL, cs.AI, cs.SE
Compressed abstract: Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a trace that was already correct can be harmful. We study this problem under a selective replacement setting, where a system must decide whether a repaired candidate is safer than preserving the original cached trace.