Tong He

Sparse Autoencoders, Again?

Jun 06, 2025

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

May 30, 2025

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

May 22, 2025

Aether: Geometric-Aware Unified World Modeling

Mar 25, 2025

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Mar 07, 2025

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Feb 25, 2025

Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds

Feb 09, 2025

CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction

Dec 23, 2024

Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Dec 19, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024