Harry Yang

AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

Mar 16, 2026
Via arXiv

Learning Latent Proxies for Controllable Single-Image Relighting

Mar 16, 2026
Via arXiv

LoopViT: Scaling Visual ARC with Looped Transformers

Feb 02, 2026
Via arXiv

RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention

Dec 30, 2025
Via arXiv

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Dec 10, 2025
Via arXiv

Distribution Matching Distillation Meets Reinforcement Learning

Nov 19, 2025
Via arXiv

Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

Nov 18, 2025
Via arXiv

Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control

Aug 12, 2025
Via arXiv

Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study

Jun 18, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv