Dong Yu

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

May 19, 2025

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

May 19, 2025

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

May 16, 2025

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

May 06, 2025

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

Apr 23, 2025

Enhancing Web Agents with Explicit Rollback Mechanisms

Apr 16, 2025

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Apr 15, 2025

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

Mar 21, 2025

FNSE-SBGAN: Far-field Speech Enhancement with Schrodinger Bridge and Generative Adversarial Networks

Mar 17, 2025