Yang Zhao

Frank

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Dec 23, 2025

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Dec 17, 2025

VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery

Oct 06, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

End-to-end image compression and reconstruction with ultrahigh speed and ultralow energy enabled by opto-electronic computing processor

Jul 30, 2025

Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit

Jul 28, 2025

Captain Cinema: Towards Short Movie Generation

Jul 24, 2025

Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation

Jul 10, 2025

Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

Jul 03, 2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jun 23, 2025