"speech": models, code, and papers

The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN

Jun 08, 2023
Zheng Yuan, Aldo Pastore, Dorina de Jong, Hao Xu, Luciano Fadiga, Alessandro D'Ausilio


Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

May 25, 2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami


Adapting the NICT-JLE Corpus for Disfluency Detection Models

Aug 04, 2023
Lucy Skidmore, Roger K. Moore


Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

Jun 08, 2023
Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien


Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

Jun 01, 2023
Salah Zaiem, Titouan Parcollet, Slim Essid


Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset

Aug 29, 2023
Mustafa Eyceoz, Justin Lee, Siddharth Pittie, Homayoon Beigi


On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications

May 23, 2023
Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk


On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

May 21, 2023
Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju


TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression for On-Device ASR Models

Sep 05, 2023
Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra


Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

May 25, 2023
Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov
