Chenjun Xiao

Kimi K2: Open Agentic Intelligence

Jul 28, 2025

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Apr 15, 2025

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning

Feb 07, 2025

Large Language Model-Enhanced Multi-Armed Bandits

Feb 03, 2025

$β$-DQN: Improving Deep Q-Learning By Evolving the Behavior

Jan 01, 2025

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

Jul 05, 2024

Diffusion Spectral Representation for Reinforcement Learning

Jun 23, 2024

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

May 31, 2024

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

Apr 23, 2024

Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning

Nov 20, 2023