Wenzhe Li

MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models

Dec 31, 2025
Via arXiv

Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Nov 13, 2025
Via arXiv

The Ever-Evolving Science Exam

Jul 22, 2025
Via arXiv

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

Feb 10, 2025
Via arXiv

Towards Principled Superhuman AI for Multiplayer Symmetric Games

Jun 06, 2024
Via arXiv

FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Jun 04, 2024
Via arXiv

A Survey on Transformers in Reinforcement Learning

Jan 08, 2023
Via arXiv

Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery

Dec 02, 2022
Via arXiv

Improving Graph-Based Text Representations with Character and Word Level N-grams

Oct 12, 2022
Via arXiv

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

Mar 16, 2022
Via arXiv