Kaiwen Zheng

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Feb 13, 2026

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues

Jan 28, 2026

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

Jan 01, 2026

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Sep 19, 2025

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Aug 10, 2025

Bridging Supervised Learning and Reinforcement Learning in Math Reasoning

May 23, 2025

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

Apr 14, 2025

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

Apr 14, 2025