Andrew Zisserman

DeepMind

Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime

Mar 30, 2023

AutoAD: Movie Description in Context

Mar 29, 2023

Three ways to improve feature alignment for open vocabulary detection

Mar 23, 2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

Mar 06, 2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Mar 01, 2023

Epic-Sounds: A Large-scale Dataset of Actions That Sound

Feb 01, 2023

Zorro: the masked multimodal transformer

Jan 23, 2023

A Light Touch Approach to Teaching Transformers Multi-view Geometry

Nov 28, 2022

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

Nov 16, 2022

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Nov 07, 2022