Picture for Chaoyou Fu

Chaoyou Fu

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

Add code
May 28, 2025
Viaarxiv icon

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Add code
May 27, 2025
Viaarxiv icon

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Add code
May 27, 2025
Viaarxiv icon

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Add code
May 06, 2025
Viaarxiv icon

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Add code
May 05, 2025
Viaarxiv icon

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Add code
Apr 07, 2025
Viaarxiv icon

Aligning Multimodal LLM with Human Preference: A Survey

Add code
Mar 18, 2025
Viaarxiv icon

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Add code
Mar 11, 2025
Viaarxiv icon

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Add code
Feb 13, 2025
Figure 1 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 2 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 3 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 4 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Viaarxiv icon

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuray

Add code
Feb 07, 2025
Viaarxiv icon