Andrew Zisserman

DeepMind

TARA: Simple and Efficient Time Aware Retrieval Adaptation of MLLMs for Video Understanding

Dec 15, 2025

Recurrent Video Masked Autoencoders

Dec 15, 2025

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Dec 10, 2025

Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing

Dec 08, 2025

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment

Dec 08, 2025

Inferring Dynamic Physical Properties from Video Foundation Models

Oct 02, 2025

Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening

Sep 10, 2025

Open-World Object Counting in Videos

Jun 18, 2025

Learning from Streaming Video with Orthogonal Gradients

Apr 02, 2025

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Apr 01, 2025