"speech": models, code, and papers

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

A Novel Scheme to classify Read and Spontaneous Speech

Jun 13, 2023
Sunil Kumar Kopparapu

A comparative study of Grid and Natural sentences effects on Normal-to-Lombard conversion

Sep 19, 2023
Hongyang Chen, Yuhong Yang, Qingmu Liu, Baifeng Li, Weiping Tu, Song Lin

Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

Jun 07, 2023
Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos

Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features

Jun 11, 2023
Hsin-Hao Chen, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Yu Tsao, Tai-shih Chi, Hsin-Min Wang

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation

Jun 07, 2023
Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray

BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Jun 05, 2023
Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Jun 13, 2023
Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

May 23, 2023
Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Sep 12, 2023
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier
