Picture for Yuanxing Zhang

Yuanxing Zhang

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Add code
Jun 10, 2025
Viaarxiv icon

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Add code
May 28, 2025
Viaarxiv icon

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Add code
May 27, 2025
Viaarxiv icon

NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence

Add code
Apr 30, 2025
Viaarxiv icon

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Add code
Apr 24, 2025
Viaarxiv icon

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

Add code
Apr 21, 2025
Viaarxiv icon

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Add code
Apr 14, 2025
Viaarxiv icon

Generative Frame Sampler for Long Video Understanding

Add code
Mar 12, 2025
Viaarxiv icon

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Add code
Feb 28, 2025
Viaarxiv icon

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon