Yunhao Fang

$\pi^{*}_{0.6}$: a VLA That Learns From Experience

Nov 19, 2025

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Nov 12, 2025

Artificial Hippocampus Networks for Efficient Long-Context Modeling

Oct 08, 2025

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Jul 16, 2025

Seed1.5-VL Technical Report

May 11, 2025

WorldModelBench: Judging Video Generation Models As World Models

Feb 28, 2025

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Dec 18, 2024

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Sep 06, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Aug 21, 2024