Shengbang Tong

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sep 04, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

May 17, 2024

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription

Mar 16, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Jan 11, 2024

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

Nov 24, 2023

Investigating the Catastrophic Forgetting in Multimodal Large Language Models

Sep 26, 2023

Emergence of Segmentation with Minimalistic White-Box Transformers

Aug 30, 2023

Mass-Producing Failures of Multimodal Systems with Language Models

Jun 21, 2023

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Jun 09, 2023