Tong He

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild

Nov 20, 2024
Via arXiv

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Oct 31, 2024
Via arXiv

EMMA: End-to-End Multimodal Model for Autonomous Driving

Oct 30, 2024
Via arXiv

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Oct 24, 2024
Via arXiv

Depth Any Video with Scalable Synthetic Data

Oct 14, 2024
Via arXiv

VideoSAM: Open-World Video Segmentation

Oct 11, 2024
Via arXiv

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Oct 10, 2024
Via arXiv

StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting

Oct 06, 2024
Via arXiv

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024
Via arXiv

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction

Sep 10, 2024
Via arXiv