Wee Sun Lee

NUS

EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning

Dec 17, 2025

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

Nov 09, 2025

Defeating the Training-Inference Mismatch via FP16

Oct 30, 2025

Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation

Oct 01, 2025

SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy

Jun 11, 2025

Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs

Jun 09, 2025

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

May 19, 2025

Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification

May 18, 2025

Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions

May 16, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Mar 26, 2025