Jingdong Wang

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Jun 05, 2025
Via arXiv

Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample

Jun 04, 2025
Via arXiv

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation

May 29, 2025
Via arXiv

No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

May 05, 2025
Via arXiv

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

Mar 25, 2025
Via arXiv

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers

Mar 13, 2025
Via arXiv

MagicGeo: Training-Free Text-Guided Geometric Diagram Generation

Feb 19, 2025
Via arXiv

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025
Via arXiv

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities

Dec 21, 2024
Via arXiv

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024
Via arXiv