Chao Wang

Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios

Feb 27, 2025
Via arXiv

X-Dancer: Expressive Music to Human Dance Video Generation

Feb 24, 2025
Via arXiv

Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark

Feb 08, 2025
Via arXiv

Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models

Feb 06, 2025
Via arXiv

LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation

Feb 04, 2025
Via arXiv

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025
Via arXiv

PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation

Feb 02, 2025
Via arXiv

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction

Feb 02, 2025
Via arXiv

VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework

Feb 02, 2025
Via arXiv

Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation

Jan 24, 2025
Via arXiv