"speech": models, code, and papers

Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers

Jun 29, 2023
Ning Guo, Tomohiro Nakatani, Shoko Araki, Takehiro Moriya

CiwaGAN: Articulatory information exchange

Sep 14, 2023
Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

How to Estimate Model Transferability of Pre-Trained Speech Models?

Jun 01, 2023
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

May 24, 2023
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Jun 01, 2023
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

CTC-based Non-autoregressive Speech Translation

May 27, 2023
Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

Improving Metrics for Speech Translation

May 22, 2023
Claudio Paonessa, Dominik Frefel, Manfred Vogel

Affect Recognition in Conversations Using Large Language Models

Sep 22, 2023
Shutong Feng, Guangzhi Sun, Nurul Lubis, Chao Zhang, Milica Gašić

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Jul 05, 2023
Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, Denis Jouvet

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Aug 22, 2023
Harunori Kawano, Sota Shimizu
