Picture for Rong-Cheng Tu

Rong-Cheng Tu

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

Add code
May 26, 2025
Viaarxiv icon

MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval

Add code
May 26, 2025
Viaarxiv icon

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Add code
May 24, 2025
Viaarxiv icon

T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation

Add code
May 23, 2025
Viaarxiv icon

Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift

Add code
Mar 19, 2025
Viaarxiv icon

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Add code
Dec 16, 2024
Viaarxiv icon

Distribution-Consistency-Guided Multi-modal Hashing

Add code
Dec 15, 2024
Viaarxiv icon

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing

Add code
Nov 28, 2024
Viaarxiv icon

Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines

Add code
Nov 25, 2024
Viaarxiv icon

Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark

Add code
Nov 23, 2024
Figure 1 for Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Figure 2 for Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Figure 3 for Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Figure 4 for Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Viaarxiv icon