Xuekai Zhu

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

Mar 11, 2026
Via arXiv

How Far Can Unsupervised RLVR Scale LLM Training?

Mar 09, 2026
Via arXiv

Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets

Feb 11, 2026
Via arXiv

SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks

Feb 06, 2026
Via arXiv

Controlling Exploration-Exploitation in GFlowNets via Markov Chain Perspectives

Feb 03, 2026
Via arXiv

FlowRL: Matching Reward Distributions for LLM Reasoning

Sep 18, 2025
Via arXiv

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Sep 11, 2025
Via arXiv

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 10, 2025
Via arXiv

Towards a Unified View of Large Language Model Post-Training

Sep 04, 2025
Via arXiv

Reasoning with Exploration: An Entropy Perspective

Jun 17, 2025
Via arXiv