Yunlong Tang

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

Jun 12, 2025

ZeroSep: Separate Anything in Audio with Zero Training

May 29, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

May 26, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

May 26, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability

Apr 15, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Apr 14, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Apr 04, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models

Apr 02, 2025