Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joseph Roth

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Jan 05, 2019

Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi(+1 more)

Figure 1 for AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Figure 2 for AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Figure 3 for AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Figure 4 for AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Abstract:Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made comparisons and improvements difficult. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker) that will be released publicly to facilitate algorithm development and enable comparisons. The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible. This dataset contains about 3.65 million human labeled frames or about 38.5 hours of face tracks, and the corresponding audio. We also present a new audio-visual approach for active speaker detection, and analyze its performance, demonstrating both its strength and the contributions of the dataset.

Via

Access Paper or Ask Questions

Modeling Uncertainty with Hedged Instance Embedding

Oct 19, 2018

Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher

Figure 1 for Modeling Uncertainty with Hedged Instance Embedding

Figure 2 for Modeling Uncertainty with Hedged Instance Embedding

Figure 3 for Modeling Uncertainty with Hedged Instance Embedding

Figure 4 for Modeling Uncertainty with Hedged Instance Embedding

Abstract:Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering. Many metric learning methods represent the input as a single point in the embedding space. Often the distance between points is used as a proxy for match confidence. However, this can fail to represent uncertainty arising when the input is ambiguous, e.g., due to occlusion or blurriness. This work addresses this issue and explicitly models the uncertainty by hedging the location of each input in the embedding space. We introduce the hedged instance embedding (HIB) in which embeddings are modeled as random variables and the model is trained under the variational information bottleneck principle. Empirical results on our new N-digit MNIST dataset show that our method leads to the desired behavior of hedging its bets across the embedding space upon encountering ambiguous inputs. This results in improved performance for image matching and classification tasks, more structure in the learned embedding space, and an ability to compute a per-exemplar uncertainty measure that is correlated with downstream performance.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions