Wenkai Yang

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

Mar 15, 2026

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Feb 12, 2026

Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

Jun 09, 2025

DeepCritic: Deliberate Critique with Large Language Models

May 01, 2025

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Feb 25, 2025

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Jun 17, 2024

Exploring Backdoor Vulnerabilities of Chat Models

Apr 03, 2024

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

Feb 17, 2024

Enabling Large Language Models to Learn from Rules

Nov 15, 2023