"speech": models, code, and papers

Online Automatic Speech Recognition with Listen, Attend and Spell Model

Aug 12, 2020
Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

End-to-End Automatic Speech Translation of Audiobooks

Feb 12, 2018
Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin

Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation

Jul 03, 2019
Siyuan Feng, Tan Lee

STC speaker recognition systems for the NIST SRE 2021

Nov 03, 2021
Anastasia Avdeeva, Aleksei Gusev, Igor Korsunov, Alexander Kozlov, Galina Lavrentyeva, Sergey Novoselov, Timur Pekhovsky, Andrey Shulipa, Alisa Vinogradova, Vladimir Volokhov, Evgeny Smirnov, Vasily Galyuk

Towards a Real-time Measure of the Perception of Anthropomorphism in Human-robot Interaction

Jan 24, 2022
Maria Tsfasman, Avinash Saravanan, Dekel Viner, Daan Goslinga, Sarah de Wolf, Chirag Raman, Catholijn M. Jonker, Catharine Oertel

Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

Jan 24, 2022
Yaoming Zhu, Liwei Wu, Shanbo Cheng, Mingxuan Wang

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

Oct 20, 2020
Yusuke Yasuda, Xin Wang, Junichi Yamagishi

TEET! Tunisian Dataset for Toxic Speech Detection

Oct 11, 2021
Slim Gharbi, Heger Arfaoui, Hatem Haddad, Mayssa Kchaou

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Nov 05, 2018
Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

Oct 29, 2020
Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao
