Jingdong Wang

Query-Kontext: An Unified Multimodal Model for Image Generation and Editing

Sep 30, 2025

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models

Sep 16, 2025

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

Sep 11, 2025

iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer

Jun 15, 2025

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Jun 05, 2025

Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample

Jun 04, 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation

May 29, 2025

No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

May 05, 2025

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

Mar 25, 2025

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers

Mar 13, 2025