Yiren Song

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks

Jun 06, 2025
Via arXiv

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

May 30, 2025
Via arXiv

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

May 30, 2025
Via arXiv

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

May 24, 2025
Via arXiv

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

May 24, 2025
Via arXiv

FocusedAD: Character-centric Movie Audio Description

Apr 16, 2025
Via arXiv

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Mar 10, 2025
Via arXiv

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

Feb 23, 2025
Via arXiv

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation

Feb 03, 2025
Via arXiv

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Feb 03, 2025
Via arXiv