Xiaobin Hu

Large-Scale Multidimensional Knowledge Profiling of Scientific Literature

Jan 21, 2026
Via arXiv

M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding

Jan 13, 2026
Via arXiv

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Jan 11, 2026
Via arXiv

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Jan 06, 2026
Via arXiv

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Dec 30, 2025
Via arXiv

The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection

Dec 23, 2025
Via arXiv

Memory in the Age of AI Agents

Dec 15, 2025
Via arXiv

Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×

Dec 15, 2025
Via arXiv

Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation

Dec 15, 2025
Via arXiv

VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

Nov 14, 2025
Via arXiv