Yuanshun Yao

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

Mar 14, 2024
Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

Learning to Watermark LLM-generated Text via Reinforcement Learning

Mar 13, 2024
Xiaojun Xu, Yuanshun Yao, Yang Liu

Fair Classifiers Without Fair Training: An Influence-Guided Data Sampling Approach

Feb 20, 2024
Jinlong Pang, Jialu Wang, Zhaowei Zhu, Yuanshun Yao, Chen Qian, Yang Liu

Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

Feb 16, 2024
Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu

Rethinking Machine Unlearning for Large Language Models

Feb 15, 2024
Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

Human-Instruction-Free LLM Self-Alignment with Limited Samples

Jan 06, 2024
Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

Large Language Model Unlearning

Oct 14, 2023
Yuanshun Yao, Xiaojun Xu, Yang Liu

Fair Classifiers that Abstain without Harm

Oct 09, 2023
Tongxin Yin, Jean-François Ton, Ruocheng Guo, Yuanshun Yao, Mingyan Liu, Yang Liu

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Aug 10, 2023
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li

Understanding Unfairness via Training Concept Influence

Jun 30, 2023
Yuanshun Yao, Yang Liu
