Xinlei Chen

PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models

Feb 20, 2025

Understanding and Evaluating Hallucinations in 3D Visual Language Models

Feb 18, 2025

Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression

Feb 06, 2025

LLMs can see and hear without any training

Jan 30, 2025

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Jan 16, 2025

Gaussian Masked Autoencoders

Jan 06, 2025

MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs

Dec 24, 2024

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Nov 14, 2024

SniffySquad: Patchiness-Aware Gas Source Localization with Multi-Robot Collaboration

Nov 09, 2024