Yun Zheng

AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Mar 31, 2026

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

Jan 26, 2026

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Dec 15, 2025

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Jun 06, 2025

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Mar 26, 2025

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Mar 20, 2025

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Mar 05, 2025

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Mar 04, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025