Yuanshun Yao

Toward Optimal LLM Alignments Using Two-Player Games (Jun 16, 2024)

Label Smoothing Improves Machine Unlearning (Jun 11, 2024)

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards (Mar 14, 2024)

Learning to Watermark LLM-generated Text via Reinforcement Learning (Mar 13, 2024)

Fair Classifiers Without Fair Training: An Influence-Guided Data Sampling Approach (Feb 20, 2024)

Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting (Feb 16, 2024)

Rethinking Machine Unlearning for Large Language Models (Feb 15, 2024)

Human-Instruction-Free LLM Self-Alignment with Limited Samples (Jan 06, 2024)

Large Language Model Unlearning (Oct 14, 2023)

Fair Classifiers that Abstain without Harm (Oct 09, 2023)