Quan Sun

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Feb 11, 2026

GEBench: Benchmarking Image Generation Models as GUI Environments
Feb 09, 2026

STEP3-VL-10B Technical Report
Jan 15, 2026

The FM Agent
Oct 30, 2025

Step1X-Edit: A Practical Framework for General Image Editing
Apr 24, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation
Jan 21, 2025

Emu3: Next-Token Prediction is All You Need
Sep 27, 2024

Diffusion Feedback Helps CLIP See Better
Jul 29, 2024

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Feb 06, 2024

Generative Multimodal Models are In-Context Learners
Dec 20, 2023