Yuang Peng

DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning

Jun 11, 2025

Step1X-Edit: A Practical Framework for General Image Editing

Apr 24, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

Apr 10, 2025

Perception in Reflection

Apr 09, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation

Jan 21, 2025

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Sep 03, 2024

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Jun 24, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation

Sep 20, 2023

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

Jul 18, 2023

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

Mar 13, 2023