Yonggan Fu

NVIDIA Nemotron 3: Efficient and Open Intelligence

Dec 24, 2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Dec 23, 2025

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Dec 16, 2025

TiDAR: Think in Diffusion, Talk in Autoregression

Nov 12, 2025

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Aug 21, 2025

Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment

Aug 08, 2025

LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

Apr 22, 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Apr 17, 2025

Hymba: A Hybrid-head Architecture for Small Language Models

Nov 20, 2024

AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

Nov 15, 2024