"speech recognition": models, code, and papers

STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Apr 16, 2022
Saad Naeem, Omer Beg

Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration

Mar 04, 2021
Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

Oct 06, 2020
Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda

Multi-modal embeddings using multi-task learning for emotion recognition

Sep 10, 2020
Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Continuous Speech Separation with Ad Hoc Microphone Arrays

Mar 03, 2021
Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition

May 15, 2020
Mostafa M. Mohamed, Björn W. Schuller

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

Dec 24, 2020
Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu

BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data

Jan 28, 2021
Demetres Kostas, Stephane Aroca-Ouellette, Frank Rudzicz

Vocoder-free End-to-End Voice Conversion with Transformer Network

Feb 05, 2020
June-Woo Kim, Ho-Young Jung, Minho Lee

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

Dec 23, 2020
Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang
