#1 Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation
Score: 39.7
Matched keywords: agent, benchmark, code generation, fine-tuning, large language models, llm, multi-agent
Categories: cs.AR, cs.AI
Compressed abstract: Recent advances in large language models have improved code generation, but their use in hardware description languages is still limited. Moreover, training data and testbenches for these models are often scarce.
#2 SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
Score: 34.2
Matched keywords: agent, benchmark, large language models, llm, multi-agent, reasoning
Categories: cs.AI, cs.LG, cs.MA
Compressed abstract: As Large Language Models (LLMs) transition from text processors to autonomous agents, evaluating their social reasoning in embodied multi-agent settings becomes critical. We introduce SocialGrid, an embodied multi-agent environment inspired by Among Us that evaluates LLM agents on planning, task execution, and social reasoning.
#3 Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
Score: 29.4
Matched keywords: agent, llm, multi-agent, reasoning
Categories: cs.AI, cs.CL, cs.MA
Compressed abstract: LLM-driven multi-agent frameworks address complex reasoning tasks through multi-role collaboration. However, existing approaches often suffer from reasoning instability, where individual agent errors are amplified through collaboration, undermining overall performance.
#4 LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance
Score: 26.4
Matched keywords: fine-tuning, large language models, llm
Categories: cs.CL, cs.AI, cs.LG
Compressed abstract: Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to compare the interpretive behaviors of LLMs across different fine-tuning strategies such as full fine-tu…
#5 To LLM, or Not to LLM: How Designers and Developers Navigate LLMs as Tools or Teammates
Score: 24.2
Matched keywords: large language models, llm, reasoning
Categories: cs.HC, cs.AI, cs.IR, cs.LG
Compressed abstract: Large language models (LLMs) are increasingly integrated into design and development workflows, yet decisions about their use are rarely binary or purely technical. We report findings from a constructivist grounded theory study based on interviews with 33 designers and developers across three large technology organisations.
#6 Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation
Score: 16.7
Matched keywords: agent, ai, ai agent
Categories: cs.AI
Compressed abstract: Recent work on subliminal learning demonstrates that language models can transmit semantic traits through data that is semantically unrelated to those traits. However, it remains unclear whether behavioral traits can transfer in agentic systems, where policies are learned from trajectories rather than static text.
#7 Bridging the Gap between User Intent and LLM: A Requirement Alignment Approach for Code Generation
Score: 30.5
Matched keywords: alignment, benchmark, code generation, large language models, llm, reasoning
Categories: cs.SE
Compressed abstract: Code generation refers to automatically producing executable programs from user requirements. Recently, researchers have explored approaches to enhance the correctness of generated code with advanced large language models.
#8 How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
Score: 15.2
Matched keywords: large language models, llm
Categories: cs.CL
Compressed abstract: Large language models (LLMs) are increasingly studied as repositories of linguistic knowledge. In this line of work, models are commonly evaluated both as generators of language and as judges of linguistic output, yet these two roles are rarely examined in direct relation to one another.
#9 CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
Score: 20.2
Matched keywords: agent, llm
Categories: cs.CL
Compressed abstract: Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent's evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training.
#10 Explainable Iterative Data Visualisation Refinement via an LLM Agent
Score: 21.7
Matched keywords: agent, ai, large language model, llm
Categories: cs.HC, cs.AI
Compressed abstract: Exploratory analysis of high-dimensional data relies on embedding the data into a low-dimensional space (typically 2D or 3D), from which a visualization plot is produced to uncover meaningful structures and to communicate geometric and distributional data characteristics. However, finding a suitable algorithm configuration, particularly hyperparameter setting, to produce a visualization plot that faithfully repr…