"speech": models, code, and papers

Unsupervised Speech Recognition

May 24, 2021
Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Sep 30, 2021
Yi Ren, Jinglin Liu, Zhou Zhao

Show Me Your Face, And I'll Tell You How You Speak

Jun 28, 2022
Christen Millerdurai, Lotfy Abdel Khaliq, Timon Ulrich

Signal inpainting from Fourier magnitudes

Oct 28, 2022
Louis Bahrman, Marina Krémé, Paul Magron, Antoine Deleforge

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

Oct 20, 2021
Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

Feb 22, 2022
Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement

Aug 27, 2021
Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson

Zero-shot Speech Translation

Jul 13, 2021
Tu Anh Dinh

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Dec 10, 2021
Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Sep 22, 2022
Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng
