
Xiaodan Liang

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Oct 14, 2024

Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars

Oct 11, 2024

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

Oct 03, 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models

Sep 11, 2024

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Sep 06, 2024

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Aug 25, 2024

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

Aug 23, 2024

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Aug 23, 2024

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

Aug 20, 2024