Yazhe Niu

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Apr 15, 2025

Empowering LLMs in Decision Games through Algorithmic Data Synthesis

Mar 18, 2025

Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM

Mar 10, 2025

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Dec 02, 2024

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Nov 29, 2024

PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

Jul 22, 2024

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Jun 15, 2024

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

Apr 28, 2024

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Dec 12, 2023

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Oct 12, 2023