Weiyan Shi

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Feb 14, 2024
Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Jan 23, 2024
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi

The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation

Dec 29, 2023
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu

From Scroll to Misbelief: Modeling the Unobservable Susceptibility to Misinformation on Social Media

Nov 16, 2023
Yanchen Liu, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang

Controllable Mixed-Initiative Dialogue Generation through Prompting

May 06, 2023
Maximillian Chen, Xiao Yu, Weiyan Shi, Urvi Awasthi, Zhou Yu

AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

Nov 22, 2022
Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

Oct 28, 2022
Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu
