
"speech": models, code, and papers

Nonlinear functional regression by functional deep neural network with kernel embedding

Jan 05, 2024
Zhongjie Shi, Jun Fan, Linhao Song, Ding-Xuan Zhou, Johan A. K. Suykens


A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition

Nov 07, 2023
Andrei Barcovschi, Rishabh Jain, Peter Corcoran


Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

Nov 09, 2023
Huma Ameer, Seemab Latif, Rabia Latif, Sana Mukhtar


Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

Nov 08, 2023
Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng


Fine-tuning convergence model in Bengali speech recognition

Nov 07, 2023
Zhu Ruiying, Shen Meng


Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations

Jan 04, 2024
Yejin Jeon, Yunsu Kim, Gary Geunbae Lee


Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

Dec 06, 2023
Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi


FastInject: Injecting Unpaired Text Data into CTC-based ASR training

Dec 14, 2023
Keqi Deng, Philip C. Woodland


TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer

Nov 14, 2023
Huashan Sun, Yixiao Wu, Yinghao Li, Jiawei Li, Yizhe Yang, Yang Gao


FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection

Nov 21, 2023
Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li
