Yuheng Zhang

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Jun 30, 2024
Via arXiv

LCSim: A Large-Scale Controllable Traffic Simulator

Jun 28, 2024
Via arXiv

Provably Efficient Interactive-Grounded Learning with Personalized Reward

May 31, 2024
Via arXiv

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Feb 22, 2024
Via arXiv

Efficient Contextual Bandits with Uninformed Feedback Graphs

Feb 12, 2024
Via arXiv

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference

Feb 11, 2024
Via arXiv

FRAD: Front-Running Attacks Detection on Ethereum using Ternary Classification Model

Nov 24, 2023
Via arXiv

Offline Learning in Markov Games with General Function Approximation

Feb 06, 2023
Via arXiv

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

Oct 04, 2022
Via arXiv

Improved Algorithms for Neural Active Learning

Oct 02, 2022
Via arXiv