Zishun Yu

Boosting LLM Reasoning via Spontaneous Self-Correction

Jun 07, 2025
Via arXiv

Language Model Distillation: A Temporal Difference Imitation Learning Perspective

May 24, 2025
Via arXiv

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Jan 31, 2025
Via arXiv

$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis

Oct 04, 2023
Via arXiv

Slowly Changing Adversarial Bandit Algorithms are Provably Efficient for Discounted MDPs

May 18, 2022
Via arXiv

Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound

May 12, 2022
Via arXiv