arXiv daily keyword digest · 2026-06-01

#1 Extending AI for Research to the Humanities: A Multi-Agent Framework for Evidence-Grounded Scholarship

Score: 48.5

Matched keywords: agent, agent framework, ai, benchmark, llm, multi-agent, rag, reasoning

Categories: cs.CL

Compressed abstract: LLM-based research agents have advanced rapidly in science and engineering, where research is organized around executable experiments, code, and quantitative signals. Humanities scholarship, however, requires a different mode of reasoning: interpretive, evidence-grounded argument over primary sources, where scholarly value depends on faithful quotation, verifiable provenance, and close reading.

#2 Counterfactual Graph for Multi-Agent LLM Calibration

Score: 27.2

Matched keywords: agent, llm, multi-agent

Categories: cs.CL

Compressed abstract: Multi-agent LLM systems often treat agreement as evidence: when many agents in a panel give the same answer, that answer is assumed to be more reliable. We show that this assumption can fail after agents communicate.

#3 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Score: 27.4

Matched keywords: agent, benchmark, harness, multi-agent

Categories: cs.CV, cs.AI, cs.CL

Compressed abstract: Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, leaving the diversity of types and conditions researchers actually use unaddressed; their raster outputs further cannot be lo…

#4 HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

Score: 15.3

Matched keywords: agent, multi-agent, transformer

Categories: cs.AI, cs.LG, cs.MA

Compressed abstract: This work addresses the problem of autonomous resource management in a heterogeneous satellite cluster, including optical and Synthetic Aperture Radar (SAR) satellites, conducting Earth Observation (EO) missions. In autonomous operation mode, satellites are equipped with intelligent capabilities enabling real-time decision-making based on the latest conditions, while requiring minimal interaction with ground operators.

#5 An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

Score: 25.4

Matched keywords: agent, large language model, llm

Categories: cs.CR, cs.AI, cs.CL, cs.IR

Compressed abstract: Regulated cybersecurity workflows lack a runtime substrate that enforces organization-level scope across retrieval, tool calls, memory, findings, reports, and audit while remaining model-agnostic and locally deployable. Recent large language model (LLM) agent systems report strong results on isolated cybersecurity tasks, yet they do not by themselves define an auditable platform architecture for regulated security o…

#6 Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely

Score: 15.7

Matched keywords: agent, multi-agent, reasoning

Categories: cs.CL, cs.RO

Compressed abstract: Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language.

#7 LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

Score: 23.2

Matched keywords: ai, large language models, llm, rag, token

Categories: cs.AI

Compressed abstract: Assessing whether the outputs of Large Language Models are factually grounded, epistemically calibrated, and methodologically reproducible is a prerequisite for responsible AI deployment. Yet auditing LLMs remains inaccessible to non-technical practitioners: existing tools require programming expertise and non-trivial environment setup, and cloud-hosted platforms transmit evaluation data to external services, creating barr…

#8 ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

Score: 22.6

Matched keywords: benchmark, large language models, multimodal, reasoning

Categories: cs.CV, cs.AI

Compressed abstract: Multimodal large language models (MLLMs) have shown strong potential as embodied agents, yet embodied geo-localization remains underexplored due to the lack of fine-grained evaluation. We introduce ERGeoBench, a diagnostic benchmark for vision-driven embodied geo-localization.

#9 ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment

Score: 20.4

Matched keywords: alignment, diffusion, multimodal, transformer

Categories: eess.AS, cs.AI, cs.CL

Compressed abstract: Recent advancements in text-guided audio generation have yielded promising results in diverse domains, including sound effects, speech, and music. However, jointly generating speech with environmental audio remains challenging due to the inherent disparities in their acoustic patterns and temporal dynamics.

#10 Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Score: 33.7

Matched keywords: agent, ai, llm, multi-agent, reasoning

Categories: cs.MA, cs.AI

Compressed abstract: Prediction markets aggregate collective intelligence to forecast uncertain events, but their utility depends on reliable outcome resolution. Existing oracle systems trade off fast but brittle automation against accurate but costly human arbitration.

2026-06-01 · arXiv Daily Keyword Digest (Top 10 of 761)
