Muyu He

Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents

Oct 06, 2025
Via arXiv

TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games

May 21, 2025
Via arXiv

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Jun 11, 2024
Via arXiv