Jiaolong Yang

ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting

Jan 05, 2026

From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs

Dec 22, 2025

VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

Dec 16, 2025

Native and Compact Structured Latents for 3D Generation

Dec 16, 2025

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Jul 31, 2025

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Jul 03, 2025

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Dec 03, 2024

Structured 3D Latents for Scalable and Versatile 3D Generation

Dec 02, 2024

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024