"speech": models, code, and papers

Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

Oct 27, 2022
Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features

Mar 14, 2023
Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Nov 16, 2022
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

Nov 25, 2022
Oliver Watts, Lovisa Wihlborg, Cassia Valentini-Botinhao

Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data

May 18, 2023
Yangshuo He, Guanding Yu, Yunlong Cai

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Nov 01, 2022
Dan Berrebbi, Brian Yan, Shinji Watanabe

Model-based estimation of in-car-communication feedback applied to speech zone detection

Oct 07, 2022
Kaspar Müller, Simon Doclo, Jan Østergaard, Tobias Wolff

Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

Oct 27, 2022
Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristia, Emmanuel Dupoux, Hervé Bredin

Rudolf Christoph Eucken at SemEval-2023 Task 4: An Ensemble Approach for Identifying Human Values from Arguments

May 09, 2023
Sougata Saha, Rohini Srihari

Towards Disentangled Speech Representations

Aug 28, 2022
Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
