Sicheng Xu

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Jul 03, 2025
Via arXiv

Structured 3D Latents for Scalable and Versatile 3D Generation

Dec 02, 2024
Via arXiv

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024
Via arXiv

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Oct 24, 2024
Via arXiv

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Apr 16, 2024
Via arXiv

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Sep 05, 2023
Via arXiv

RemoteTouch: Enhancing Immersive 3D Video Communication with Hand Touch

Feb 28, 2023
Via arXiv

Deep 3D Portrait from a Single Image

Apr 24, 2020
Via arXiv

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set

Mar 20, 2019
Via arXiv