"speech": models, code, and papers

Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

May 17, 2020
Won Ik Cho, Donghyun Kwak, Jiwon Yoon, Nam Soo Kim

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

Sep 10, 2021
Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović

L3Cube-MahaNLP: Marathi Natural Language Processing Datasets, Models, and Library

May 31, 2022
Raviraj Joshi

SAQAM: Spatial Audio Quality Assessment Metric

Jun 24, 2022
Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

Jul 28, 2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, Reinhold Haeb-Umbach

Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems

Jun 01, 2020
Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

Apr 01, 2021
Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu

Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets

Apr 20, 2021
Gabriel Mittag, Saman Zadtootaghaj, Thilo Michael, Babak Naderi, Sebastian Möller

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Feb 03, 2021
Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Jan 07, 2021
Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi
