Peize Sun

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Apr 24, 2025

Perception Encoder: The best visual embeddings are not at the output of the network

Apr 17, 2025

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025

PixelFlow: Pixel-Space Generative Models with Flow

Apr 10, 2025

Goku: Flow Based Video Generative Foundation Models

Feb 10, 2025

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Feb 07, 2025

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Dec 19, 2024

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

Oct 12, 2024

ControlAR: Controllable Image Generation with Autoregressive Models

Oct 03, 2024

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Jul 10, 2024