Zhenan Sun

3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory

Dec 22, 2025

TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models

Dec 18, 2025

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Aug 20, 2025

ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension

Jul 22, 2025

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

May 08, 2025

Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images

May 06, 2025

Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection

May 06, 2025

VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction

Apr 30, 2025

Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation via Pose Guidance

Dec 21, 2024

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Nov 28, 2024