Jingdong Wang

VAE-REPA: Variational Autoencoder Representation Alignment for Efficient Diffusion Training

Jan 25, 2026

MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture

Dec 22, 2025

GeoLoom: High-quality Geometric Diagram Generation from Textual Input

Dec 09, 2025

Query-Kontext: An Unified Multimodal Model for Image Generation and Editing

Sep 30, 2025

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models

Sep 16, 2025

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

Sep 11, 2025

iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer

Jun 15, 2025

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Jun 05, 2025

Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample

Jun 04, 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation

May 29, 2025