Weiyan Shi

Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval

Jul 04, 2024
Via arXiv

CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

Apr 23, 2024
Via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Via arXiv

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Feb 14, 2024
Via arXiv

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Jan 23, 2024
Via arXiv

The Earth is Flat because: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation

Dec 29, 2023
Via arXiv

From Scroll to Misbelief: Modeling the Unobservable Susceptibility to Misinformation on Social Media

Nov 16, 2023
Via arXiv

Controllable Mixed-Initiative Dialogue Generation through Prompting

May 06, 2023
Via arXiv

AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

Nov 22, 2022
Via arXiv

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

Oct 28, 2022
Via arXiv