Yali Wang

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jun 12, 2025
Via arXiv

Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding

Jun 09, 2025
Via arXiv

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

Jun 06, 2025
Via arXiv

Video-GPT via Next Clip Diffusion

May 18, 2025
Via arXiv

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

May 10, 2025
Via arXiv

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Apr 10, 2025
Via arXiv

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Mar 15, 2025
Via arXiv

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Mar 13, 2025
Via arXiv

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

Mar 10, 2025
Via arXiv

Get In Video: Add Anything You Want to the Video

Mar 08, 2025
Via arXiv