Saining Xie

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

May 21, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

May 15, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Apr 17, 2025

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Apr 14, 2025

Transfer between Modalities with MetaQueries

Apr 08, 2025

Scaling Language-Free Visual Representation Learning

Apr 01, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Mar 12, 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Jan 28, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Jan 16, 2025