Picture for Sirui Zhao

Sirui Zhao

Tango: Taming Visual Signals for Efficient Video Large Language Models

Add code
Apr 13, 2026
Viaarxiv icon

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning

Add code
Apr 10, 2026
Viaarxiv icon

SyncBreaker:Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation

Add code
Apr 09, 2026
Viaarxiv icon

Dual-Path Learning based on Frequency Structural Decoupling and Regional-Aware Fusion for Low-Light Image Super-Resolution

Add code
Mar 28, 2026
Viaarxiv icon

ReMA: A Training-Free Plug-and-Play Mixing Augmentation for Video Behavior Recognition

Add code
Jan 01, 2026
Viaarxiv icon

LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension

Add code
Nov 15, 2025
Figure 1 for LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Figure 2 for LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Figure 3 for LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Figure 4 for LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Viaarxiv icon

Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router

Add code
Jun 24, 2025
Figure 1 for Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router
Figure 2 for Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router
Figure 3 for Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router
Figure 4 for Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router
Viaarxiv icon

MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception

Add code
May 11, 2025
Figure 1 for MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception
Figure 2 for MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception
Figure 3 for MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception
Figure 4 for MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception
Viaarxiv icon

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Add code
Dec 02, 2024
Figure 1 for T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
Figure 2 for T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
Figure 3 for T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
Figure 4 for T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
Viaarxiv icon

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Add code
Nov 22, 2024
Figure 1 for MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Figure 2 for MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Figure 3 for MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Figure 4 for MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Viaarxiv icon