Xichen Pan

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024
Via arXiv

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jan 02, 2024
Via arXiv

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Oct 04, 2023
Via arXiv

Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation

Apr 19, 2023
Via arXiv

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Nov 20, 2022
Via arXiv

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

Mar 26, 2022
Via arXiv