Hang Xu

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Oct 14, 2024
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Sep 06, 2024
Via arXiv

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

Aug 23, 2024
Via arXiv

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Jul 17, 2024
Via arXiv

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Jul 11, 2024
Via arXiv

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Jul 09, 2024
Via arXiv

Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

Jul 03, 2024
Via arXiv

BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation

Jun 14, 2024
Via arXiv

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

Jun 11, 2024
Via arXiv