"speech": models, code, and papers

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Dec 06, 2023
Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

Via arXiv

Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

Dec 12, 2023
Mohammed Maqsood Shaik, Dietrich Klakow, Badr M. Abdullah

Via arXiv

Resource-constrained stereo singing voice cancellation

Jan 22, 2024
Clara Borrelli, James Rae, Dogac Basaran, Matt McVicar, Mehrez Souden, Matthias Mauch

Via arXiv

Keyword spotting -- Detecting commands in speech using deep learning

Dec 09, 2023
Sumedha Rai, Tong Li, Bella Lyu

Via arXiv

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Jan 06, 2024
Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie

Via arXiv

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Jan 17, 2024
Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

Via arXiv

ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

Jan 22, 2024
Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

Via arXiv

StreamVC: Real-Time Low-Latency Voice Conversion

Jan 05, 2024
Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

Via arXiv

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

Via arXiv

Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition

Dec 18, 2023
David Hason Rudd, Huan Huo, Guandong Xu

Via arXiv