#4 Revisiting Quantum Code Generation: Where Should Domain Knowledge Live?

Score: 27.8 | Matched keywords: agent, benchmark, fine-tuning, large language models, llm, rag, retrieval-augmented

Detailed Summary (EN)

Problem definition

Large language models (LLMs) [1] have recently emerged as powerful tools for automating a wide range of programming tasks, including code completion, program synthesis, and software maintenance.
Their impact has been particularly notable in scientific and engineering domains, where LLMs are increasingly used to assist with complex workflows involving specialized libraries, numerical methods, and domain-specific abstractions.
As these models continue to improve in scale, reasoning capability, and context length [2], an important open question is how best to incorporate domain knowledge into LLM-based systems for specialized scientific software ecosystems.
In quantum computing, the development of reliable and efficient software remains a key challenge, given the complexity of quantum programming abstractions and the rapid evolution of quantum software development kits (SDKs), such as Qiskit [3], Cirq [4], PennyLane [5], and Braket SDK [6].

Core idea & method

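The paper asks where domain knowledge for Qiskit code generation should live: in model parameters, through domain-specific fine-tuning, or in the inference pipeline, through retrieval-augmented generation (RAG) and agentic execution feedback.
It compares these specialization strategies on the Qiskit-HumanEval benchmark, setting a parameter-specialized baseline against modern general-purpose LLMs that can be augmented at inference time.
The stated aim is an approach to LLM-assisted quantum software development that stays flexible and maintainable as libraries evolve, without requiring domain-specific fine-tuning.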
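The agentic execution feedback mentioned above can be pictured as a generate, execute, repair loop. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: `generate` is a hypothetical stand-in for an LLM call, `stub_generate` fakes a model that fixes its bug once it sees an error, and the task uses plain Python rather than Qiskit so that it runs without extra dependencies.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str      # last candidate produced
    passed: bool   # whether that candidate passed the test
    rounds: int    # number of generation rounds used


def run_candidate(code: str, test: str) -> Optional[str]:
    """Run candidate code and its test; return None on success, else error text.

    Assumption: exec() in-process is only acceptable for a toy example.
    A real system would run untrusted code in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def repair_loop(
    generate: Callable[[str, Optional[str]], str],  # hypothetical LLM wrapper
    task: str,
    test: str,
    max_rounds: int = 3,
) -> Attempt:
    """Regenerate code until the test passes or the round budget is spent."""
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(task, feedback)      # the model sees the previous error, if any
        feedback = run_candidate(code, test)
        if feedback is None:
            return Attempt(code, True, round_no)
    return Attempt(code, False, max_rounds)


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Fake model: wrong on the first try, correct once it receives feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = repair_loop(stub_generate, "add two numbers", "assert add(2, 3) == 5")
print(result.passed, result.rounds)
```

Each extra round costs another generation plus another execution, which is one way to read the increased runtime cost that the summary attributes to agentic feedback.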

Experimental setup & results

The experiments show that modern general-purpose LLMs consistently outperform the parameter-specialized baseline.
Agentic execution feedback yields the most consistent improvements, albeit at increased runtime cost, while RAG provides modest and model-dependent gains.
These findings indicate that performance gains can be achieved without domain-specific fine-tuning by relying instead on inference-time augmentation, which enables a more flexible and maintainable approach to LLM-assisted quantum software development.
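To make the RAG side of the comparison concrete, the following sketch shows retrieval-augmented prompting in its simplest form: pick the reference notes that best match the task and prepend them to the prompt. Everything here is an assumption for illustration. The token-overlap scorer stands in for whatever retriever the paper uses, the three notes are hand-written one-liners about well-known `QuantumCircuit` methods rather than the authors' corpus, and `build_prompt` is a hypothetical helper.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k notes sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(task: str, docs: List[str], k: int = 2) -> str:
    """Prepend the retrieved notes to the task description."""
    context = "\n".join(f"- {note}" for note in retrieve(task, docs, k))
    return f"Reference notes:\n{context}\n\nTask:\n{task}\n"


# Illustrative notes only (assumption: not the paper's retrieval corpus).
NOTES = [
    "QuantumCircuit.h(qubit) applies a Hadamard gate.",
    "QuantumCircuit.cx(control, target) applies a CNOT gate.",
    "QuantumCircuit.measure_all() adds measurements to all qubits.",
]

prompt = build_prompt("Create a Bell state circuit using a Hadamard gate and a CNOT gate", NOTES)
print(prompt)
```

Because the notes live outside the model, they can be refreshed when the SDK changes without retraining, which is the maintainability argument the summary makes for inference-time augmentation.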

Limitations & risks

Authors: Oscar Novo, Oscar Bastidas-Jossa, Alberto Calvo, Antonio Peris, and Carlos Kuchkovsky (Quantum Computing Research, QCentroid, 48001, Bilbao, Spain).
Recent advances in large language models (LLMs) have enabled the automation of an increasing number of programming tasks, including code generation for scientific and engineering domains.
In rapidly evolving software ecosystems such as quantum software development, where frameworks expose complex abstractions, a central question is how best to incorporate domain knowledge into LLM-based assistants while preserving maintainability as libraries evolve.
In this work, we study specialization strategies for Qiskit code generation using the Qiskit-HumanEval benchmark.

Read-like-fullpaper digest

This paper addresses how best to incorporate domain knowledge into LLM-based assistants for quantum software development, using Qiskit code generation on the Qiskit-HumanEval benchmark as the test case. It compares a parameter-specialized baseline with modern general-purpose LLMs and with two forms of inference-time augmentation, RAG and agentic execution feedback. The key empirical findings are that general-purpose LLMs consistently outperform the parameter-specialized baseline, that agentic execution feedback yields the most consistent improvements at increased runtime cost, and that RAG provides modest, model-dependent gains.

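Detailed Summary (KO)

Problem definition

Large language models (LLMs) [1] have recently emerged as powerful tools for automating a wide range of programming tasks, including code completion, program synthesis, and software maintenance.
Their impact has been particularly notable in scientific and engineering domains, where LLMs are increasingly used to support complex workflows involving specialized libraries, numerical methods, and domain-specific abstractions.
As the scale, reasoning capability, and context length of these models continue to improve, an important open question is how best to incorporate domain knowledge into LLM-based systems for specialized scientific software ecosystems.
In quantum computing, developing reliable and efficient software remains a key challenge, given the complexity of quantum programming abstractions and the rapid evolution of quantum software development kits (SDKs) such as Qiskit [3], Cirq [4], PennyLane [5], and Braket SDK [6].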

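Core idea & method

Agentic execution feedback yields the most consistent improvements, albeit at increased runtime cost, while RAG provides modest and model-dependent gains.
These findings indicate that performance gains can be achieved without domain-specific fine-tuning by relying instead on inference-time augmentation. This enables a more flexible and maintainable approach to LLM-assisted quantum software development.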

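Experimental setup & results

The experiments show that modern general-purpose LLMs consistently outperform the parameter-specialized baseline.
Agentic execution feedback yields the most consistent improvements, albeit at increased runtime cost, while RAG provides modest and model-dependent gains.
These findings indicate that performance gains can be achieved without domain-specific fine-tuning by relying instead on inference-time augmentation. This enables a more flexible and maintainable approach to LLM-assisted quantum software development.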

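Limitations & risks

Authors: Oscar Novo, Oscar Bastidas-Jossa, Alberto Calvo, Antonio Peris, and Carlos Kuchkovsky (Quantum Computing Research, QCentroid, 48001, Bilbao, Spain).
Recent advances in large language models (LLMs) have enabled the automation of an increasing number of programming tasks, including code generation for scientific and engineering domains.
In rapidly evolving software ecosystems such as quantum software development, where frameworks expose complex abstractions, a central question is how best to incorporate domain knowledge into LLM-based assistants while preserving maintainability as libraries evolve.
This work studies specialization strategies for Qiskit code generation using the Qiskit-HumanEval benchmark.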

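Read-like-fullpaper digest

This paper deals with large language models (LLMs) [1], which have recently emerged as powerful tools for automating a wide range of programming tasks, including code completion, program synthesis, and software maintenance. On method, the central point is that agentic execution feedback yields the most consistent improvements, albeit at increased runtime cost, while RAG provides modest, model-dependent gains. The key empirical findings include that modern general-purpose LLMs consistently outperform the parameter-specialized baseline.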