Dian Yu

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Oct 02, 2025
Via arXiv

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Oct 01, 2025
Via arXiv

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Sep 18, 2025
Via arXiv

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Sep 11, 2025
Via arXiv

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Aug 27, 2025
Via arXiv

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Apr 15, 2025
Via arXiv

Safe Flow Matching: Robot Motion Planning with Control Barrier Functions

Apr 11, 2025
Via arXiv

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025
Via arXiv

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Feb 24, 2025
Via arXiv

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Jan 30, 2025
Via arXiv