Raoyuan Zhao

Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors

Oct 10, 2025

A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

Oct 10, 2025

Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing

May 27, 2025

MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs

May 27, 2025

What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns

Apr 22, 2025

SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists

Aug 30, 2024