Xichen Pan

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

May 15, 2025
Via arXiv

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025
Via arXiv

Transfer between Modalities with MetaQueries

Apr 08, 2025
Via arXiv

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Mar 12, 2025
Via arXiv

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024
Via arXiv

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jan 02, 2024
Via arXiv

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Oct 04, 2023
Via arXiv

Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation

Apr 19, 2023
Via arXiv

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Nov 20, 2022
Via arXiv

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

Mar 26, 2022
Via arXiv