"speech": models, code, and papers

Real time spectrogram inversion on mobile phone

Mar 10, 2022
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

Via arXiv

Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models

Oct 01, 2020
Thai Binh Nguyen, Quang Minh Nguyen, Thi Thu Hien Nguyen, Quoc Truong Do, Chi Mai Luong

Via arXiv

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

Jan 24, 2021
Cheng Yi, Shiyu Zhou, Bo Xu

Via arXiv

An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features

Jun 01, 2020
Shi-Yan Weng, Tien-Hong Lo, Berlin Chen

Via arXiv

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

May 12, 2022
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

Via arXiv

Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus

Oct 06, 2020
Michel Plüss, Lukas Neukom, Manfred Vogel

Via arXiv

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

May 09, 2019
Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

Via arXiv

Phonological Features for 0-shot Multilingual Speech Synthesis

Aug 06, 2020
Marlene Staib, Tian Huey Teh, Alexandra Torresquintero, Devang S Ram Mohan, Lorenzo Foglianti, Raphael Lenain, Jiameng Gao

Via arXiv

Cloning one's voice using very limited data in the wild

Oct 08, 2021
Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

Via arXiv

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Oct 08, 2020
Hieu-Thi Luong, Junichi Yamagishi

Via arXiv