Yale Song

Don't Pause! Every prediction matters in a streaming video

Apr 27, 2026

Co-Director: Agentic Generative Video Storytelling

Apr 27, 2026

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

Apr 15, 2026

PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing

Apr 06, 2026

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Mar 26, 2026

VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

Mar 12, 2026

PaperBanana: Automating Academic Illustration for AI Scientists

Jan 30, 2026

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025

VITED: Video Temporal Evidence Distillation

Mar 17, 2025

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023