#1 Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation
Score: 21.2
Matched keywords: benchmark, large language model, large language models, llm
Categories: cs.DC, cs.CL, cs.LG
Score: 29.0
Matched keywords: agent, ai, ai agent, ai agents, benchmark, large language models, llm
Categories: cs.AI, cs.DB
Score: 16.8
Matched keywords: agent, ai, ai agent, large language model
Categories: cs.AI
Score: 18.0
Matched keywords: llm, prompt, reasoning
Categories: cs.CL, cs.AI
Score: 19.4
Matched keywords: benchmark, large language models, llm, reasoning, token
Categories: cs.CL, cs.AI
Score: 17.2
Matched keywords: large language models, multimodal, prompt
Categories: cs.CV, cs.AI
Score: 21.2
Matched keywords: benchmark, large language models, llm, prompt, token
Categories: cs.CR, cs.AI
Score: 16.8
Matched keywords: agent, benchmark, llm, token
Categories: cs.SE
Score: 18.6
Matched keywords: agent, ai, benchmark, large language models
Categories: cs.CR, cs.AI
Score: 9.2
Matched keywords: ai, benchmark
Categories: cs.AI