Youliang Yuan

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

Mar 18, 2024
Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

New Job, New Gender? Measuring the Social Bias in Image Generation Models

Jan 01, 2024
Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu

The Earth is Flat? Unveiling Factual Errors in Large Language Models

Jan 01, 2024
Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models

Jan 01, 2024
Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

Oct 02, 2023
Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

All Languages Matter: On the Multilingual Safety of Large Language Models

Oct 02, 2023
Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Aug 12, 2023
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu
