Luchuan Song

TDMM-LM: Bridging Facial Understanding and Animation via Language Models

Mar 14, 2026
Via arXiv

Talking Together: Synthesizing Co-Located 3D Conversations from Audio

Mar 09, 2026
Via arXiv

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

Feb 23, 2026
Via arXiv

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

Feb 02, 2026
Via arXiv

When to Think and When to Look: Uncertainty-Guided Lookback

Nov 19, 2025
Via arXiv

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025
Via arXiv

StreamME: Simplify 3D Gaussian Avatar within Live Stream

Jul 22, 2025
Via arXiv

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

May 26, 2025
Via arXiv

Intentional Gesture: Deliver Your Intentions with Gestures for Speech

May 21, 2025
Via arXiv

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025
Via arXiv