Yang Zhao

Frank

VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery

Oct 06, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

End-to-end image compression and reconstruction with ultrahigh speed and ultralow energy enabled by opto-electronic computing processor

Jul 30, 2025

Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit

Jul 28, 2025

Captain Cinema: Towards Short Movie Generation

Jul 24, 2025

Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation

Jul 10, 2025

Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

Jul 03, 2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jun 23, 2025

Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models

Jun 13, 2025

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Jun 11, 2025