"speech": models, code, and papers

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Dec 16, 2023
Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

Dec 19, 2023
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng

Generative Context-aware Fine-tuning of Self-supervised Speech Models

Dec 15, 2023
Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Dec 16, 2023
Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

Jan 08, 2024
Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization

Jan 26, 2024
Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li

On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals

Jan 02, 2024
George P. Kafentzis

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

Dec 13, 2023
Shaojin Ding, Qiu David, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Shivani Agrawal, Zhonglin Han, Jian Li, Amir Yazdanbakhsh

Study of cognitive component of auditory attention to natural speech events

Dec 19, 2023
Nhan D. T. Nguyen, Kaare Mikkelsen, Preben Kidmose

Toward A Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency

Dec 12, 2023
Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira, Andreas Constas, Aditya Khan, Kaison Cheung, Najma Sultani, Carrie Chen, Micol Altomare, Michael Akzam, Jiacheng Chen, Vhea He, Lauren Altomare, Heraa Murqi, Asad Khan, Nimit Amikumar Bhanshali, Youssef Rachad, Michael Guerzhoy
