YiFan Zhang

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Mar 16, 2026
Via arXiv

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL

Feb 26, 2026
Via arXiv

MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance

Feb 08, 2026
Via arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026
Via arXiv

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

May 29, 2025
Via arXiv

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

May 27, 2025
Via arXiv

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform

Apr 21, 2025
Via arXiv

InstructEngine: Instruction-driven Text-to-Image Alignment

Apr 14, 2025
Via arXiv

DAMO: Data- and Model-aware Alignment of Multi-modal LLMs

Feb 04, 2025
Via arXiv

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Dec 17, 2024
Via arXiv