Xueru Wen

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Mar 10, 2026

Coupled Variational Reinforcement Learning for Language Model General Reasoning

Dec 14, 2025

The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models

Mar 05, 2025

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Feb 24, 2025

Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

Feb 07, 2025

Transferable Post-training via Inverse Value Learning

Oct 28, 2024

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?

Oct 08, 2024

Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

Aug 29, 2024

On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

Jun 18, 2024

Offline Pseudo Relevance Feedback for Efficient and Effective Single-pass Dense Retrieval

Aug 20, 2023