Yi Xin

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Feb 27, 2026
Via arXiv

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Feb 13, 2026
Via arXiv

Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

Feb 13, 2026
Via arXiv

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Feb 04, 2026
Via arXiv

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Feb 02, 2026
Via arXiv

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

Feb 02, 2026
Via arXiv

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Dec 25, 2025
Via arXiv

dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models

Dec 22, 2025
Via arXiv

From Masks to Worlds: A Hitchhiker's Guide to World Models

Oct 23, 2025
Via arXiv

LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation

Aug 06, 2025
Via arXiv