
Mohammad Shoeybi

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Aug 21, 2025

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset

Aug 20, 2025

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

Jun 16, 2025

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

May 26, 2025

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

May 22, 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core

Apr 21, 2025

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Apr 15, 2025

NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning

Apr 15, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025

From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models

Apr 08, 2025