Dongfu Jiang

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Mar 19, 2026
Via arXiv

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Mar 13, 2026
Via arXiv

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

May 26, 2025
Via arXiv

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

May 22, 2025
Via arXiv

General-Reasoner: Advancing LLM Reasoning Across All Domains

May 21, 2025
Via arXiv

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Feb 03, 2025
Via arXiv

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Oct 14, 2024
Via arXiv

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Jun 24, 2024
Via arXiv

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Jun 16, 2024
Via arXiv

GenAI Arena: An Open Evaluation Platform for Generative Models

Jun 06, 2024
Via arXiv