Kaiwen Zheng

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

Jan 01, 2026 · via arXiv

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025 · via arXiv

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025 · via arXiv

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Sep 19, 2025 · via arXiv

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Aug 10, 2025 · via arXiv

Bridging Supervised Learning and Reinforcement Learning in Math Reasoning

May 23, 2025 · via arXiv

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

Apr 14, 2025 · via arXiv

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

Apr 14, 2025 · via arXiv

Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator

Mar 03, 2025 · via arXiv

LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation

Feb 19, 2025 · via arXiv