Chaoyou Fu

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Mar 23, 2026

MAC: A Conversion Rate Prediction Benchmark Featuring Labels Under Multiple Attribution Mechanisms

Mar 02, 2026

BABE: Biology Arena BEnchmark

Feb 05, 2026

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Oct 10, 2025

BaseReward: A Strong Baseline for Multimodal Reward Model

Sep 19, 2025

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

May 28, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

May 27, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

May 06, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

May 05, 2025