"speech": models, code, and papers

An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning

Sep 20, 2022
Tushar Talukder Showrav

Via arXiv

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

May 01, 2023
Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Via arXiv

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Nov 13, 2022
Jacob J Webber, Cassia Valentini-Botinhao, Evelyn Williams, Gustav Eje Henter, Simon King

Via arXiv

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Sep 30, 2022
Chendong Zhao, Jianzong Wang, Wen qi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao

Via arXiv

Conditioning and Sampling in Variational Diffusion Models for Speech Super-resolution

Oct 27, 2022
Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang

Via arXiv

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model

Mar 13, 2023
Shuangping Huang, Yu Luo, Zhenzhou Zhuang, Jin-Gang Yu, Mengchao He, Yongpan Wang

Via arXiv

Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence

Mar 13, 2023
Yicheng Hsu, Mingsian Bai

Via arXiv

Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

Nov 03, 2022
Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget

Via arXiv

SumREN: Summarizing Reported Speech about Events in News

Dec 02, 2022
Revanth Gangi Reddy, Heba Elfardy, Hou Pong Chan, Kevin Small, Heng Ji

Via arXiv

Modeling Speaker-Listener Interaction for Backchannel Prediction

Apr 10, 2023
Daniel Ortega, Sarina Meyer, Antje Schweitzer, Ngoc Thang Vu

Via arXiv