"speech recognition": models, code, and papers

StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR

Jul 01, 2021
Hirofumi Inaguma, Tatsuya Kawahara

[Figures 1–4]

A Discussion On the Validity of Manifold Learning

Jun 03, 2021
Dai Shi, Andi Han, Yi Guo, Junbin Gao

[Figures 1–4]

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

Aug 11, 2020
Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

[Figures 1–4]

Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition

Jan 13, 2018
Che-Wei Huang, Shrikanth S. Narayanan

[Figures 1–4]

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

Jun 28, 2021
Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

[Figures 1–4]

TLT-school: a Corpus of Non Native Children Speech

Jan 22, 2020
Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna

[Figures 1–4]

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

Feb 07, 2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

[Figures 1–4]

Context-Aware Task Handling in Resource-Constrained Robots with Virtualization

Apr 09, 2021
Ramyad Hadidi, Nima Shoghi Ghalehshahi, Bahar Asgari, Hyesoon Kim

[Figures 1–4]

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Apr 12, 2019
Gaurav Sahu

[Figures 1–4]

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

Apr 17, 2021
Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu

[Figures 1–4]