Zhaopeng Tu

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

May 19, 2025
Via arXiv

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

May 01, 2025
Via arXiv

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Apr 27, 2025
Via arXiv

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Apr 15, 2025
Via arXiv

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025
Via arXiv

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

Mar 21, 2025
Via arXiv

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Mar 20, 2025
Via arXiv

RaSA: Rank-Sharing Low-Rank Adaptation

Mar 16, 2025
Via arXiv

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Mar 04, 2025
Via arXiv

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

Feb 23, 2025
Via arXiv