Wei Shen

Amazon

Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction

May 24, 2025

NAN: A Training-Free Solution to Coefficient Estimation in Model Merging

May 22, 2025

Why Can Accurate Models Be Learned from Inaccurate Annotations?

May 22, 2025

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

May 17, 2025

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

May 12, 2025

Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images

May 06, 2025

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Apr 23, 2025

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Apr 22, 2025

LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs

Apr 20, 2025

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future

Apr 12, 2025