"speech recognition": models, code, and papers

TLT-school: a Corpus of Non Native Children Speech

Jan 22, 2020
Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna

Figures 1–4

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Apr 12, 2019
Gaurav Sahu

Figures 1–4

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

Jun 28, 2021
Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

Figures 1–4

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

Feb 07, 2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Figures 1–4

Context-Aware Task Handling in Resource-Constrained Robots with Virtualization

Apr 09, 2021
Ramyad Hadidi, Nima Shoghi Ghalehshahi, Bahar Asgari, Hyesoon Kim

Figures 1–4

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

Apr 17, 2021
Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu

Figures 1–4

RNN Transducer Models For Spoken Language Understanding

Apr 08, 2021
Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory

Figures 1–4

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

Apr 08, 2021
Ramon Sanabria, Austin Waters, Jason Baldridge

Figures 1–4

LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

Apr 06, 2021
Anton Mitrofanov, Mariya Korenevskaya, Ivan Podluzhny, Yuri Khokhlov, Aleksandr Laptev, Andrei Andrusenko, Aleksei Ilin, Maxim Korenevsky, Ivan Medennikov, Aleksei Romanenko

Figures 1–4