Jiafan He

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Mar 19, 2026
Via arXiv

Variance-Dependent Regret Lower Bounds for Contextual Bandits

Mar 15, 2025
Via arXiv

Accelerated Preference Optimization for Large Language Model Alignment

Oct 08, 2024
Via arXiv

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

Apr 16, 2024
Via arXiv

Settling Constant Regrets in Linear Markov Decision Processes

Apr 16, 2024
Via arXiv

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Feb 15, 2024
Via arXiv

Reinforcement Learning from Human Feedback with Active Queries

Feb 14, 2024
Via arXiv

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Feb 14, 2024
Via arXiv

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Nov 26, 2023
Via arXiv

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Oct 02, 2023
Via arXiv