Sunjae Yoon

Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance

Jan 12, 2026

GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval

Jan 02, 2026

Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models

Dec 11, 2025

Point to Span: Zero-Shot Moment Retrieval for Navigating Unseen Hour-Long Videos

Dec 11, 2025

ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On

Mar 26, 2025

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation

Oct 31, 2024

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

Jul 25, 2024

FRAG: Frequency Adapting Group for Diffusion Video Editing

Jun 10, 2024

Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing

Jan 18, 2024

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

Dec 15, 2023