"speech": models, code, and papers

What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis

Nov 04, 2019
Chung-Yi Li, Pei-Chieh Yuan, Hung-Yi Lee

Via arXiv

Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking

May 07, 2022
Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Via arXiv

On Prosody Modeling for ASR+TTS based Voice Conversion

Jul 20, 2021
Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda

Via arXiv

Conditional probing: measuring usable information beyond a baseline

Sep 19, 2021
John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning

Via arXiv

Transfer Learning from Audio-Visual Grounding to Speech Recognition

Jul 09, 2019
Wei-Ning Hsu, David Harwath, James Glass

Via arXiv

Deep Spoken Keyword Spotting: An Overview

Nov 20, 2021
Iván López-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen

Via arXiv

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

Jun 27, 2019
Suyoun Kim, Siddharth Dalmia, Florian Metze

Via arXiv

Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

Feb 16, 2022
Bing Yang, Hong Liu, Xiaofei Li

Via arXiv

Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Oct 20, 2021
Mun-Hak Lee, Joon-Hyuk Chang

Via arXiv

Deep Annotation of Therapeutic Working Alliance in Psychotherapy

Apr 12, 2022
Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf

Via arXiv