Yong Jae Lee

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Jun 09, 2025
Via arXiv

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

May 28, 2025
Via arXiv

Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models

May 26, 2025
Via arXiv

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

May 26, 2025
Via arXiv

X-Fusion: Introducing New Modality to Frozen Large Language Models

Apr 29, 2025
Via arXiv

YoChameleon: Personalized Vision and Language Generation

Apr 29, 2025
Via arXiv

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Apr 01, 2025
Via arXiv

Do Vision Models Develop Human-Like Progressive Difficulty Understanding?

Mar 17, 2025
Via arXiv

Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection

Feb 11, 2025
Via arXiv

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Jan 21, 2025
Via arXiv