Andrew Zisserman

DeepMind

TAPVid-3D: A Benchmark for Tracking Any Point in 3D

Jul 08, 2024

CountGD: Multi-Modal Open-World Counting

Jul 05, 2024

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jun 09, 2024

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

May 16, 2024

Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

Apr 25, 2024

AutoAD III: The Prequel -- Back to the Pixels

Apr 22, 2024

Moving Object Segmentation: All You Need Is SAM

Apr 18, 2024

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Apr 09, 2024

FlexCap: Generating Rich, Localized, and Flexible Captions in Images

Mar 18, 2024

N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

Mar 16, 2024