Zishun Yu

Boosting LLM Reasoning via Spontaneous Self-Correction

Jun 07, 2025

Language Model Distillation: A Temporal Difference Imitation Learning Perspective

May 24, 2025

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Jan 31, 2025

$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis

Oct 04, 2023

Slowly Changing Adversarial Bandit Algorithms are Provably Efficient for Discounted MDPs

May 18, 2022

Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound

May 12, 2022