Yutao Fan

Can RL Improve Generalization of LLM Agents? An Empirical Study

Mar 12, 2026
Via arXiv

Reading $\neq$ Seeing: Diagnosing and Closing the Typography Gap in Vision-Language Models

Mar 09, 2026
Via arXiv

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

Feb 13, 2026
Via arXiv

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Oct 04, 2024
Via arXiv