2026-04-20 · arXiv Daily Keyword Digest (Top 10 of 545)

Generated: 2026-04-21T08:59:49.844710+09:00

Target date (KST): 2026-04-20

Selection: picked 10 from 545 papers published on the target date

Source: https://export.arxiv.org/api/query (`cat:cs.*`, sorted by submittedDate desc)
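
A minimal sketch of how a request against this source could be assembled, assuming the standard arXiv API query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`). The helper name `build_query_url` and the paging defaults are illustrative assumptions, not details taken from the digest generator.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(start: int = 0, max_results: int = 100) -> str:
    """Build a listing URL for cs.* papers, newest submissions first.

    `start` and `max_results` are paging values chosen for illustration.
    """
    params = {
        "search_query": "cat:cs.*",   # category filter quoted in the digest header
        "sortBy": "submittedDate",    # sort key named in the digest header
        "sortOrder": "descending",    # "desc" in the digest header
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-escapes ':' and '*' in the search query.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(start=0, max_results=50))
```

Fetching all papers for one target date would mean paging with `start` until the returned entries fall before that date; that loop is left out here.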

Selection logic: keyword-weight score + subject boost
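
One way such a score could be computed is sketched below. The weights, the boost values, the substring matching, and the function name `score_paper` are all hypothetical; the digest does not state the actual numbers or the matching rule.

```python
# Hypothetical keyword weights; the real table is not given in the digest.
KEYWORD_WEIGHTS = {"llm": 5.0, "agent": 4.0, "benchmark": 3.0}

# Hypothetical per-category boosts; the real values are not given either.
SUBJECT_BOOST = {"cs.AI": 1.5, "cs.CL": 1.0}


def score_paper(text: str, categories: list[str]) -> tuple[float, list[str]]:
    """Return (score, matched keywords) for a title plus abstract string.

    Score = sum of weights of keywords found in the text
            + sum of boosts for the paper's categories.
    Matching is a plain case-insensitive substring test (an assumption).
    """
    lowered = text.lower()
    matched = sorted(k for k in KEYWORD_WEIGHTS if k in lowered)
    keyword_score = sum(KEYWORD_WEIGHTS[k] for k in matched)
    boost = sum(SUBJECT_BOOST.get(c, 0.0) for c in categories)
    return keyword_score + boost, matched


if __name__ == "__main__":
    print(score_paper("An LLM agent benchmark", ["cs.AI", "cs.SE"]))
```

Substring matching would explain entries that match both `agent` and `multi-agent`, but that is an inference from the listed keywords, not a documented rule.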

#1 Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation

Score: 39.7

Matched keywords: agent, benchmark, code generation, fine-tuning, large language models, llm, multi-agent

Categories: cs.AR, cs.AI

Compressed abstract: Recent advances in large language models have improved code generation, but their use in hardware description languages is still limited. Moreover, training data and testbenches for these models are often scarce.

#2 SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems

Score: 34.2

Matched keywords: agent, benchmark, large language models, llm, multi-agent, reasoning

Categories: cs.AI, cs.LG, cs.MA

Compressed abstract: As Large Language Models (LLMs) transition from text processors to autonomous agents, evaluating their social reasoning in embodied multi-agent settings becomes critical. We introduce SocialGrid, an embodied multi-agent environment inspired by Among Us that evaluates LLM agents on planning, task execution, and social reasoning.

#3 Weak-Link Optimization for Multi-Agent Reasoning and Collaboration

Score: 29.4

Matched keywords: agent, llm, multi-agent, reasoning

Categories: cs.AI, cs.CL, cs.MA

Compressed abstract: LLM-driven multi-agent frameworks address complex reasoning tasks through multi-role collaboration. However, existing approaches often suffer from reasoning instability, where individual agent errors are amplified through collaboration, undermining overall performance.

#4 LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

Score: 26.4

Matched keywords: fine-tuning, large language models, llm

Categories: cs.CL, cs.AI, cs.LG

Compressed abstract: Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to compare the interpretive behaviors of LLMs across different fine-tuning strategies such as full fine-tu…

#5 To LLM, or Not to LLM: How Designers and Developers Navigate LLMs as Tools or Teammates

Score: 24.2

Matched keywords: large language models, llm, reasoning

Categories: cs.HC, cs.AI, cs.IR, cs.LG

Compressed abstract: Large language models (LLMs) are increasingly integrated into design and development workflows, yet decisions about their use are rarely binary or purely technical. We report findings from a constructivist grounded theory study based on interviews with 33 designers and developers across three large technology organisations.

#6 Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation

Score: 16.7

Matched keywords: agent, ai, ai agent

Categories: cs.AI

Compressed abstract: Recent work on subliminal learning demonstrates that language models can transmit semantic traits through data that is semantically unrelated to those traits. However, it remains unclear whether behavioral traits can transfer in agentic systems, where policies are learned from trajectories rather than static text.

#7 Bridging the Gap between User Intent and LLM: A Requirement Alignment Approach for Code Generation

Score: 30.5

Matched keywords: alignment, benchmark, code generation, large language models, llm, reasoning

Categories: cs.SE

Compressed abstract: Code generation refers to automatically producing executable programs from user requirements. Recently, researchers have explored approaches to enhance the correctness of generated code with advanced large language models.

#8 How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models

Score: 15.2

Matched keywords: large language models, llm

Categories: cs.CL

Compressed abstract: Large language models (LLMs) are increasingly studied as repositories of linguistic knowledge. In this line of work, models are commonly evaluated both as generators of language and as judges of linguistic output, yet these two roles are rarely examined in direct relation to one another.

#9 CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

Score: 20.2

Matched keywords: agent, llm

Categories: cs.CL

Compressed abstract: Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent's evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training.

#10 Explainable Iterative Data Visualisation Refinement via an LLM Agent

Score: 21.7

Matched keywords: agent, ai, large language model, llm

Categories: cs.HC, cs.AI

Compressed abstract: Exploratory analysis of high-dimensional data relies on embedding the data into a low-dimensional space (typically 2D or 3D), from which a visualization plot is produced to uncover meaningful structures and to communicate geometric and distributional data characteristics. However, finding a suitable algorithm configuration, particularly hyperparameter setting, to produce a visualization plot that faithfully repr…
