Yun Zheng

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Jun 06, 2025
Via arXiv

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025
Via arXiv

Wan: Open and Advanced Large-Scale Video Generative Models

Mar 26, 2025
Via arXiv

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Mar 20, 2025
Via arXiv

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Mar 05, 2025
Via arXiv

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Mar 04, 2025
Via arXiv

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025
Via arXiv

ContextHOI: Spatial Context Learning for Human-Object Interaction Detection

Dec 12, 2024
Via arXiv

Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

Dec 11, 2024
Via arXiv

CoReS: Orchestrating the Dance of Reasoning and Segmentation

Apr 08, 2024
Via arXiv