Zhengzhong Tu

Ben

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

Jul 16, 2025
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

4KAgent: Agentic Any Image to 4K Super-Resolution

Jul 09, 2025
Via arXiv

AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration

Jun 24, 2025
Via arXiv

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Jun 18, 2025
Via arXiv

SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Jun 09, 2025
Via arXiv

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

May 30, 2025
Via arXiv

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

May 29, 2025
Via arXiv

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation

May 29, 2025
Via arXiv

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

May 29, 2025
Via arXiv