Yaodong Yang

ProgressGym: Alignment with a Millennium of Moral Progress

Jun 28, 2024
Via arXiv

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Jun 20, 2024
Via arXiv

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

Jun 20, 2024
Via arXiv

In-Context Editing: Learning Knowledge from Self-Induced Distributions

Jun 17, 2024
Via arXiv

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Jun 15, 2024
Via arXiv

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

Jun 12, 2024
Via arXiv

Language Models Resist Alignment

Jun 10, 2024
Via arXiv

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

Jun 03, 2024
Via arXiv

Efficient Model-agnostic Alignment via Bayesian Persuasion

May 29, 2024
Via arXiv

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Mar 19, 2024
Via arXiv