"speech recognition": models, code, and papers

Explicit Intensity Control for Accented Text-to-speech

Oct 27, 2022
Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Sep 21, 2018
Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller

Streaming Models for Joint Speech Recognition and Translation

Jan 22, 2021
Orion Weller, Matthias Sperber, Christian Gollan, Joris Kluivers

Whose Emotion Matters? Speaker Detection without Prior Knowledge

Nov 23, 2022
Hugo Carneiro, Cornelius Weber, Stefan Wermter

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

Apr 04, 2021
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

Provable Robustness for Streaming Models with a Sliding Window

Mar 28, 2023
Aounon Kumar, Vinu Sankar Sadasivan, Soheil Feizi

Towards A Unified Conformer Structure: from ASR to ASV Task

Nov 14, 2022
Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

May 02, 2022
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Jun 27, 2022
Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

Feb 11, 2020
Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan
