"speech": models, code, and papers

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

Feb 04, 2024
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai

Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure

Feb 01, 2024
Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin

Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks

Feb 05, 2024
Adrián Vázquez-Romero, Ascensión Gallardo-Antolín

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Feb 02, 2024
Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris

Scaling Up Adaptive Filter Optimizers

Mar 01, 2024
Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

Jan 31, 2024
Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

Jan 19, 2024
Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn Schuller, Yuan Zong, Wenming Zheng

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

Feb 03, 2024
Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Feb 11, 2024
Minoo Shayaninasab, Bagher Babaali

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Feb 02, 2024
Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller
