Yuanhan Zhang

From Pixels to Words -- Towards Native One-Vision Models at Scale

May 27, 2026

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

May 25, 2026

VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding

May 21, 2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Feb 09, 2026

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Mar 27, 2025

EgoLife: Towards Egocentric Life Assistant

Mar 05, 2025

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Jan 23, 2025

Video Instruction Tuning With Synthetic Data

Oct 03, 2024

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024