Sicheng Xu

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Jul 31, 2025

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Jul 03, 2025

Structured 3D Latents for Scalable and Versatile 3D Generation

Dec 02, 2024

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Oct 24, 2024

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Apr 16, 2024

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Sep 05, 2023

RemoteTouch: Enhancing Immersive 3D Video Communication with Hand Touch

Feb 28, 2023

Deep 3D Portrait from a Single Image

Apr 24, 2020