Wei Shen

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

May 17, 2025
Via arXiv

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

May 12, 2025
Via arXiv

Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images

May 06, 2025
Via arXiv

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Apr 23, 2025
Via arXiv

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Apr 22, 2025
Via arXiv

LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs

Apr 20, 2025
Via arXiv

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future

Apr 12, 2025
Via arXiv

Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations

Apr 01, 2025
Via arXiv

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Mar 31, 2025
Via arXiv

Dereflection Any Image with Diffusion Priors and Diversified Data

Mar 21, 2025
Via arXiv