"speech": models, code, and papers

Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

Oct 28, 2022
Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

Oct 20, 2021
Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Speech Denoising without Clean Training Data: a Noise2Noise Approach

Apr 08, 2021
Madhav Mahesh Kashyap, Anuj Tambwekar, Krishnamoorthy Manohara, S Natarajan

Generating coherent spontaneous speech and gesture from text

Jan 14, 2021
Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

TransPOS: Transformers for Consolidating Different POS Tagset Datasets

Sep 24, 2022
Alex Li, Ilyas Bankole-Hameed, Ranadeep Singh, Gabriel Shen Han Ng, Akshat Gupta

Egocentric Audio-Visual Noise Suppression

Nov 07, 2022
Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar

Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

Nov 07, 2022
Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, Danail Stoyanov

Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition

Sep 17, 2021
Felix Weninger, Marco Gaudesi, Ralf Leibold, Roberto Gemello, Puming Zhan

An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

Sep 28, 2022
Tobias Hallmen, Silvan Mertes, Dominik Schiller, Elisabeth André

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Oct 27, 2022
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe