
Chao Wang

X-Dancer: Expressive Music to Human Dance Video Generation

Feb 24, 2025

Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark

Feb 08, 2025

Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models

Feb 06, 2025

LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation

Feb 04, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025

PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation

Feb 02, 2025

VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework

Feb 02, 2025

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction

Feb 02, 2025

Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation

Jan 24, 2025

X-Dyna: Expressive Dynamic Human Image Animation

Jan 20, 2025