Kristen Grauman

Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models

Apr 15, 2026
Via arXiv

ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos

Apr 12, 2026
Via arXiv

UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding

Apr 09, 2026
Via arXiv

SportSkills: Physical Skill Learning from Sports Instructional Videos

Mar 26, 2026
Via arXiv

MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos

Mar 15, 2026
Via arXiv

Human detectors are surprisingly powerful reward models

Jan 21, 2026
Via arXiv

Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video

Dec 16, 2025
Via arXiv

Learning Skill-Attributes for Transferable Assessment in Video

Nov 17, 2025
Via arXiv

HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling

Oct 27, 2025
Via arXiv

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025
Via arXiv