Yi Xin

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Mar 10, 2026
Via arXiv

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Feb 27, 2026
Via arXiv

Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

Feb 13, 2026
Via arXiv

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Feb 13, 2026
Via arXiv

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Feb 04, 2026
Via arXiv

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Feb 02, 2026
Via arXiv

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

Feb 02, 2026
Via arXiv

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Dec 25, 2025
Via arXiv

dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models

Dec 22, 2025
Via arXiv

From Masks to Worlds: A Hitchhiker's Guide to World Models

Oct 23, 2025
Via arXiv