Kaihang Pan

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities

Jun 10, 2025
Via arXiv

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL

Jun 05, 2025
Via arXiv

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

May 18, 2025
Via arXiv

Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

May 12, 2025
Via arXiv

On Path to Multimodal Generalist: General-Level and General-Bench

May 07, 2025
Via arXiv

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning

Apr 22, 2025
Via arXiv

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Apr 20, 2025
Via arXiv

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024
Via arXiv

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Nov 29, 2024
Via arXiv

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Nov 24, 2024
Via arXiv