Picture for Fuzheng Zhang

Fuzheng Zhang

Kuaishou Natural Language Processing Center and Audio Center

Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search

Add code
Jun 11, 2025
Viaarxiv icon

DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding

Add code
Jun 04, 2025
Viaarxiv icon

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective

Add code
May 29, 2025
Viaarxiv icon

What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning

Add code
May 28, 2025
Viaarxiv icon

TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos

Add code
May 26, 2025
Viaarxiv icon

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

Add code
May 26, 2025
Viaarxiv icon

Clapper: Compact Learning and Video Representation in VLMs

Add code
May 21, 2025
Viaarxiv icon

DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering

Add code
Apr 25, 2025
Viaarxiv icon

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Add code
Apr 14, 2025
Viaarxiv icon

Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing

Add code
Apr 10, 2025
Viaarxiv icon