"speech": models, code, and papers

MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification

Mar 29, 2022
Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR

Mar 29, 2022
Fangyuan Wang, Bo Xu

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

Jun 30, 2020
Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition

Jun 11, 2019
Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla

Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation

May 21, 2020
Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-yi Lee

Computational bioacoustics with deep learning: a review and roadmap

Dec 13, 2021
Dan Stowell

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Dec 13, 2021
Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi

Attacker Attribution of Audio Deepfakes

Mar 28, 2022
Nicolas M. Müller, Franziska Dieckmann, Jennifer Williams

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

Nov 19, 2021
Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

Joint Spatio-Temporal Discretisation of Nonlinear Active Cochlear Models

Aug 12, 2021
T. Dang, V. Sethu, E. Ambikairajah, J. Epps, H. Li
