Weixun Wang

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

Jun 06, 2025, via arXiv

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models

May 26, 2025, via arXiv

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models

May 26, 2025, via arXiv

Think-J: Learning to Think for Generative LLM-as-a-Judge

May 20, 2025, via arXiv

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

Mar 20, 2025, via arXiv

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Feb 26, 2025, via arXiv

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025, via arXiv

ProgCo: Program Helps Self-Correction of Large Language Models

Jan 02, 2025, via arXiv

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024, via arXiv

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Oct 25, 2024, via arXiv