Yiren Song

UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining

Apr 06, 2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Apr 06, 2026

Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs

Mar 15, 2026

SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens

Feb 07, 2026

Loom: Diffusion-Transformer for Interleaved Generation

Dec 20, 2025

Mitty: Diffusion-based Human-to-Robot Video Generation

Dec 19, 2025

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Dec 17, 2025

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos

Dec 10, 2025

OmniPSD: Layered PSD Generation with Diffusion Transformer

Dec 10, 2025

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks

Jun 06, 2025