"speech": models, code, and papers

Improving Monaural Speech Enhancement with Multi-head Self and Cross Attention

May 20, 2022
Xinmeng Xu, Jianjun Hao

Via arXiv

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Sep 13, 2022
Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

Via arXiv

Scaling Laws for Generative Mixed-Modal Language Models

Jan 10, 2023
Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

Via arXiv

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Jun 15, 2022
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Tianzi Wang, Xunying Liu, Helen Meng

Via arXiv

ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

Jul 05, 2022
Valentin Pelloin, Franck Dary, Nicolas Herve, Benoit Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier

Via arXiv

Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Jan 27, 2022
Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

Via arXiv

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

May 02, 2022
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

Via arXiv

Insights on Neural Representations for End-to-End Speech Recognition

May 19, 2022
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

Via arXiv

A Dataset for Speech Emotion Recognition in Greek Theatrical Plays

Mar 27, 2022
Maria Moutti, Sofia Eleftheriou, Panagiotis Koromilas, Theodoros Giannakopoulos

Via arXiv

Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Apr 08, 2022
Tobias Weise, Philipp Klumpp, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

Via arXiv