Banghua Zhu

NVIDIA Nemotron 3: Efficient and Open Intelligence
Dec 24, 2025

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
Dec 24, 2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Dec 23, 2025

Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Oct 01, 2025

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Aug 21, 2025

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Aug 13, 2025

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation
May 23, 2025

How to Evaluate Reward Models for RLHF
Oct 18, 2024

Taming Overconfidence in LLMs: Reward Calibration in RLHF
Oct 13, 2024

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Jun 17, 2024