Yiren Song

Loom: Diffusion-Transformer for Interleaved Generation

Dec 20, 2025
Via arXiv

Mitty: Diffusion-based Human-to-Robot Video Generation

Dec 19, 2025
Via arXiv

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Dec 17, 2025
Via arXiv

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos

Dec 10, 2025
Via arXiv

OmniPSD: Layered PSD Generation with Diffusion Transformer

Dec 10, 2025
Via arXiv

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks

Jun 06, 2025
Via arXiv

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

May 30, 2025
Via arXiv

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

May 30, 2025
Via arXiv

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

May 24, 2025
Via arXiv

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

May 24, 2025
Via arXiv