Saining Xie

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Mar 03, 2026
Via arXiv

Solaris: Building a Multiplayer Video World Model in Minecraft

Feb 26, 2026
Via arXiv

Self-Refining Video Sampling

Jan 26, 2026
Via arXiv

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Jan 22, 2026
Via arXiv

Transition Matching Distillation for Fast Video Generation

Jan 14, 2026
Via arXiv

VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement

Dec 26, 2025
Via arXiv

Next-Embedding Prediction Makes Strong Vision Learners

Dec 23, 2025
Via arXiv

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025
Via arXiv

What Matters for Representation Alignment: Global Information or Spatial Structure?

Dec 11, 2025
Via arXiv

CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting

Nov 07, 2025
Via arXiv