#5 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Detailed Summary (EN)

Problem definition

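How to post-train a compact open model (a 30B MoE with 3B activated parameters) so that its reasoning and agentic capabilities approach those of frontier open models. The method is covered in Section 4, Cascade RL and Multi-Domain On-Policy Distillation, beginning with Section 4.1, Training Framework.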

Core idea & method

Nemotron-Cascade 2 is an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities.
Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models.
It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20× fewer parameters.
Compared with Nemotron-Cascade 1, the key technical advancement stated here is a wider training scope: after SFT on a meticulously curated dataset, the authors substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains.
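
The staged recipe described above (SFT first, then Cascade RL over several domains) can be pictured as a pipeline in which each stage starts from the checkpoint produced by the previous one. The sketch below is only an illustration under stated assumptions: it assumes "cascade" means the RL stages run one after another, and the names `Checkpoint`, `make_stage`, `run_cascade`, and the domain labels `domain_a` to `domain_c` are hypothetical placeholders, not taken from the paper. No training happens; each stage only records its name.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Checkpoint:
    """Stand-in for model weights; records which stages have been applied."""
    history: List[str] = field(default_factory=list)


Stage = Callable[[Checkpoint], Checkpoint]


def make_stage(name: str) -> Stage:
    """Return a placeholder training stage that only logs its own name."""
    def stage(ckpt: Checkpoint) -> Checkpoint:
        # A real stage would run SFT or RL here and return updated weights.
        return Checkpoint(history=ckpt.history + [name])
    return stage


def run_cascade(init: Checkpoint, stages: Sequence[Stage]) -> Checkpoint:
    """Apply stages in order; each one starts from the previous checkpoint."""
    ckpt = init
    for stage in stages:
        ckpt = stage(ckpt)
    return ckpt


# Hypothetical schedule: SFT, then one RL stage per (placeholder) domain.
domains = ("domain_a", "domain_b", "domain_c")
stages = [make_stage("sft")] + [make_stage(f"rl:{d}") for d in domains]

final = run_cascade(Checkpoint(), stages)
print(final.history)
```

Under this reading, covering more domains amounts to appending stages to the schedule. How the paper orders the stages and combines them with on-policy distillation is not recoverable from this summary.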

Experimental setup & results

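The headline results are Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, with mathematical and coding reasoning performance approaching that of frontier open models. The training setup is described in Section 4, Cascade RL and Multi-Domain On-Policy Distillation, beginning with Section 4.1, Training Framework.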

Limitations & risks

Date: 2026-03-16
Title: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Authors: Zhuolin Yang*, Zihan Liu*, Yang Chen*, Wenliang Dai*, Boxin Wang*, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, Dongfu Jiang, Jiafan He‡, Renjie Pi, Grace Lam, Nayeon Lee, Alexander Bukharin, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping*†
Abstract: We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities.

Read-like-fullpaper digest

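This paper presents Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters, post-trained with Cascade RL and multi-domain on-policy distillation (Section 4, with the training framework in Section 4.1). The core method is SFT on a meticulously curated dataset followed by a Cascade RL stage that covers a much broader spectrum of reasoning and agentic domains than Nemotron-Cascade 1. The key empirical finding is Gold Medal-level performance in the 2025 IMO, the IOI, and the ICPC World Finals, which makes it the second open-weight LLM to reach that level after DeepSeekV3.2-Speciale-671B-A37B, with 20× fewer parameters.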
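
The "20× fewer parameters" comparison in the summary can be sanity-checked with a few lines of arithmetic. This is a rough check, not a figure from the paper: it assumes the suffix `671B-A37B` in the DeepSeek model name means 671B total and 37B activated parameters, and it takes the 30B and 3B figures for Nemotron-Cascade 2 from the summary itself.

```python
# Parameter counts in billions, read off the model names in the summary.
# Assumption: "671B-A37B" means 671B total and 37B activated parameters.
deepseek_total_b, deepseek_active_b = 671, 37
cascade_total_b, cascade_active_b = 30, 3

total_ratio = deepseek_total_b / cascade_total_b      # total vs. total
active_ratio = deepseek_active_b / cascade_active_b   # activated vs. activated
active_fraction = cascade_active_b / cascade_total_b  # share of weights used per token

print(f"total-parameter ratio:  {total_ratio:.1f}x")
print(f"active-parameter ratio: {active_ratio:.1f}x")
print(f"activated fraction of Nemotron-Cascade 2: {active_fraction:.0%}")
```

Under these assumptions the total-parameter ratio is about 22× and the activated-parameter ratio about 12×, so the quoted "20×" appears to refer to total parameter count rather than activated parameters.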
