Dong Yan

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

Mar 20, 2026
Via arXiv

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Feb 12, 2026
Via arXiv

STAIR: Improving Safety Alignment with Introspective Reasoning

Feb 04, 2025
Via arXiv

Baichuan4-Finance Technical Report

Dec 17, 2024
Via arXiv

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search

Nov 18, 2024
Via arXiv

Boosting Deductive Reasoning with Step Signals In RLHF

Oct 12, 2024
Via arXiv

Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown

Oct 01, 2024
Via arXiv

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Jun 11, 2024
Via arXiv

Exploring the LLM Journey from Cognition to Expression with Linear Representations

May 27, 2024
Via arXiv

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

May 21, 2024
Via arXiv