Andrew Zisserman

DeepMind

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Apr 09, 2024

FlexCap: Generating Rich, Localized, and Flexible Captions in Images

Mar 18, 2024

N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

Mar 16, 2024

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

Feb 29, 2024

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Feb 01, 2024

Synchformer: Efficient Synchronization from Sparse Cues

Jan 29, 2024

Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

Jan 22, 2024

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Jan 18, 2024

Amodal Ground Truth and Completion in the Wild

Dec 28, 2023

Perception Test 2023: A Summary of the First Challenge And Outcome

Dec 20, 2023