Zhi Rui Tam

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Mar 29, 2026

Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs

Feb 02, 2026

MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making

Nov 10, 2025

Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models?

May 23, 2025

VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan

Mar 15, 2025

VisTai: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan

Mar 13, 2025

Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models

Mar 03, 2025

None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

Mar 03, 2025

Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity

Jan 24, 2025

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Aug 05, 2024