"speech": models, code, and papers

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

Nov 15, 2022
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model

Mar 13, 2023
Shuangping Huang, Yu Luo, Zhenzhou Zhuang, Jin-Gang Yu, Mengchao He, Yongpan Wang

Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence

Mar 13, 2023
Yicheng Hsu, Mingsian Bai

Conversion of Acoustic Signal (Speech) Into Text By Digital Filter using Natural Language Processing

Sep 09, 2022
Abhiram Katuri, Sindhu Salugu, Gelli Tharuni, Challa Sri Gouri

Diacritic Recognition Performance in Arabic ASR

Feb 27, 2023
Hanan Aldarmaki, Ahmad Ghannam

Can deepfakes be created by novice users?

Apr 28, 2023
Pulak Mehta, Gauri Jagatap, Kevin Gallagher, Brian Timmerman, Progga Deb, Siddharth Garg, Rachel Greenstadt, Brendan Dolan-Gavitt

Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

Oct 29, 2022
Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Dec 14, 2022
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

SIA-FTP: A Spoken Instruction Aware Flight Trajectory Prediction Framework

May 02, 2023
Dongyue Guo, Jianwei Zhang, Yi Lin

LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition

Dec 05, 2022
Yuguang Yang, Yu Pan, Jingjing Yin, Heng Lu
