"speech": models, code, and papers

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

Mar 14, 2023
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

Via arXiv

Fine-grained Noise Control for Multispeaker Speech Synthesis

Apr 11, 2022
Karolos Nikitaras, Georgios Vamvoukakis, Nikolaos Ellinas, Konstantinos Klapsas, Konstantinos Markopoulos, Spyros Raptis, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis

Via arXiv

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

Mar 10, 2023
Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

Via arXiv

Does human speech follow Benford's Law?

Mar 24, 2022
Leo Hsu, Visar Berisha

Via arXiv

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

Dec 14, 2022
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

Via arXiv

SDS-200: A Swiss German Speech to Standard German Text Corpus

May 19, 2022
Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel

Via arXiv

Multichannel Speech Separation with Narrow-band Conformer

Apr 09, 2022
Changsheng Quan, Xiaofei Li

Via arXiv

Benchmarking Generative Latent Variable Models for Speech

Apr 05, 2022
Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

Via arXiv

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

May 20, 2022
Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Via arXiv

Learning to Dub Movies via Hierarchical Prosody Models

Dec 08, 2022
Gaoxiang Cong, Liang Li, Yuankai Qi, Zhengjun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Via arXiv