"speech": models, code, and papers

Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition

Apr 30, 2023
Pangoth Santhosh Kumar, Garika Akshay

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Nov 16, 2022
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

Oct 27, 2022
Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristia, Emmanuel Dupoux, Hervé Bredin

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Nov 01, 2022
Dan Berrebbi, Brian Yan, Shinji Watanabe

Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

Nov 25, 2022
Oliver Watts, Lovisa Wihlborg, Cassia Valentini-Botinhao

Can we hear physical and social space together through prosody?

May 22, 2023
Ambre Davat, Véronique Aubergé, Gang Feng

Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

Nov 03, 2022
Pavel Andreev, Nicholas Babaev, Azat Saginbaev, Ivan Shchekotov

Bengali Common Voice Speech Dataset for Automatic Speech Recognition

Jun 29, 2022
Samiul Alam, Asif Sushmit, Zaowad Abdullah, Shahrin Nakkhatra, MD. Nazmuddoha Ansary, Syed Mobassir Hossen, Sazia Morshed Mehnaz, Tahsin Reasat, Ahmed Imtiaz Humayun

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Oct 26, 2022
Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

Mar 14, 2023
Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro
