Yuchuan Wu

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Feb 12, 2026

Reward Modeling from Natural Language Human Feedback

Jan 12, 2026

MOA: Multi-Objective Alignment for Role-Playing Agents

Dec 10, 2025

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization

Aug 12, 2025

TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

May 30, 2025

ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents

May 29, 2025

Reverse Preference Optimization for Complex Instruction Following

May 28, 2025

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025

EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

Feb 18, 2025

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Jan 08, 2025