Title: Spot the conversation: speaker diarisation in the wild

Jul 02, 2020

Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman

Abstract: The goal of this paper is speaker diarisation of videos collected 'in the wild'. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of two stages: active speaker detection using audio-visual methods, and speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline that significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community. Our dataset contains overlapping speech, a large and diverse speaker pool, and challenging background conditions.

* The dataset will be available for download from http://www.robots.ox.ac.uk/~vgg/data/voxceleb/voxconverse.html. The development set will be released in July 2020, and the test set will be released in October 2020.
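The abstract names the two stages of the method (audio-visual active speaker detection, then speaker verification against self-enrolled speaker models) without giving details. The sketch below is a hypothetical illustration of how such a two-stage scheme could fit together, not the authors' implementation. It assumes that each speech segment already has a speaker embedding, and that active speaker detection has attributed some segments to an on-screen speaker while leaving others unlabelled. The function names, the centroid-averaging enrolment, the cosine similarity scoring and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def enrol_speakers(segments):
    """Self-enrolment (hypothetical): average the embeddings of segments that
    active speaker detection has already attributed to a speaker."""
    sums = {}
    counts = defaultdict(int)
    for seg in segments:
        speaker = seg["asd_label"]
        if speaker is None:
            continue  # no confident on-screen speaker for this segment
        emb = seg["embedding"]
        if speaker not in sums:
            sums[speaker] = [0.0] * len(emb)
        sums[speaker] = [s + e for s, e in zip(sums[speaker], emb)]
        counts[speaker] += 1
    return {spk: [s / counts[spk] for s in vec] for spk, vec in sums.items()}

def assign_remaining(segments, models, threshold=0.5):
    """Verification step (hypothetical): give each unlabelled segment the
    enrolled speaker with the highest cosine score, or None if below threshold."""
    labels = []
    for seg in segments:
        if seg["asd_label"] is not None:
            labels.append(seg["asd_label"])  # keep the audio-visual decision
            continue
        best_speaker, best_score = None, -1.0
        for speaker, centroid in models.items():
            score = cosine(seg["embedding"], centroid)
            if score > best_score:
                best_speaker, best_score = speaker, score
        labels.append(best_speaker if best_score >= threshold else None)
    return labels

# Toy 2-D "embeddings"; real speaker embeddings would be much higher-dimensional.
segments = [
    {"embedding": [1.0, 0.0], "asd_label": "A"},
    {"embedding": [0.0, 1.0], "asd_label": "B"},
    {"embedding": [0.9, 0.1], "asd_label": None},    # e.g. speaker A off-screen
    {"embedding": [0.1, 0.9], "asd_label": None},    # e.g. speaker B off-screen
    {"embedding": [-1.0, -1.0], "asd_label": None},  # matches no enrolled speaker
]
models = enrol_speakers(segments)
labels = assign_remaining(segments, models)
print(labels)  # ['A', 'B', 'A', 'B', None]
```

In this sketch, segments that no enrolled model matches are left as `None` rather than forced onto the nearest speaker. That choice fits a semi-automatic pipeline, where low-confidence segments could be passed to a human annotator instead.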