arXiv daily keyword digest · 2026-04-01

#1 Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

Score: 21.2

Matched keywords: benchmark, large language model, large language models, llm

Categories: cs.DC, cs.CL, cs.LG

#2 ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities

Score: 29.0

Matched keywords: agent, ai, ai agent, ai agents, benchmark, large language models, llm

Categories: cs.AI, cs.DB

#3 SimMOF: AI agent for Automated MOF Simulations

Score: 16.8

Matched keywords: agent, ai, ai agent, large language model

Categories: cs.AI

#4 Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives

Score: 18.0

Matched keywords: llm, prompt, reasoning

Categories: cs.CL, cs.AI

#5 The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

Score: 19.4

Matched keywords: benchmark, large language models, llm, reasoning, token

Categories: cs.CL, cs.AI

#6 Adversarial Prompt Injection Attack on Multimodal Large Language Models

Score: 17.2

Matched keywords: large language models, multimodal, prompt

Categories: cs.CV, cs.AI

#7 Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

Score: 21.2

Matched keywords: benchmark, large language models, llm, prompt, token

Categories: cs.CR, cs.AI

#8 SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

Score: 16.8

Matched keywords: agent, benchmark, llm, token

Categories: cs.SE

#9 Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

Score: 18.6

Matched keywords: agent, ai, benchmark, large language models

Categories: cs.CR, cs.AI

#10 ASI-Evolve: AI Accelerates AI

Score: 9.2

Matched keywords: ai, benchmark

Categories: cs.AI

2026-04-01 · arXiv Daily Keyword Digest (Top 10 of 644)
