Xinlei Chen

Gaussian Masked Autoencoders

Jan 06, 2025
Via arXiv

MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs

Dec 24, 2024
Via arXiv

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024
Via arXiv

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Nov 14, 2024
Via arXiv

SniffySquad: Patchiness-Aware Gas Source Localization with Multi-Robot Collaboration

Nov 09, 2024
Via arXiv

Learning Video Representations without Natural Videos

Oct 31, 2024
Via arXiv

EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

Oct 12, 2024
Via arXiv

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

Sep 30, 2024
Via arXiv

Range-SLAM: Ultra-Wideband-Based Smoke-Resistant Real-Time Localization and Mapping

Sep 15, 2024
Via arXiv

Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming

Aug 12, 2024
Via arXiv