"speech": models, code, and papers

Single Channel Speech Enhancement Using U-Net Spiking Neural Networks

Jul 26, 2023
Abir Riahi, Éric Plourde

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

Jun 09, 2023
Shijun Wang, Jón Guðnason, Damian Borth

PromptTTS 2: Describing and Generating Voices with Text Prompt

Sep 05, 2023
Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023
Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li

Generative Spoken Language Model based on continuous word-sized audio tokens

Oct 08, 2023
Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

Jul 23, 2023
Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote

Sparse Fine-tuning for Inference Acceleration of Large Language Models

Oct 13, 2023
Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

Jul 20, 2023
Daegyeom Kim, Seongho Hong, Yong-Hoon Choi

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Jul 18, 2023
Yinghao Aaron Li, Cong Han, Nima Mesgarani

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Jul 23, 2023
Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe
