"speech": models, code, and papers

Controllable Data Generation Via Iterative Data-Property Mutual Mappings

Oct 11, 2023
Bo Pan, Muran Qin, Shiyu Wang, Yifei Zhang, Liang Zhao

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Oct 11, 2023
Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng

CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting

Sep 18, 2023
Yuang Li, Yinglu Li, Min Zhang, Chang Su, Mengyao Piao, Xiaosong Qiao, Jiawei Yu, Miaomiao Ma, Yanqing Zhao, Hao Yang

The complementary roles of non-verbal cues for Robust Pronunciation Assessment

Sep 14, 2023
Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

Semantic enrichment towards efficient speech representations

Jul 03, 2023
Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève

KIT's Multilingual Speech Translation System for IWSLT 2023

Jun 15, 2023
Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

Leveraging Label Information for Multimodal Emotion Recognition

Sep 05, 2023
Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features

Aug 17, 2023
Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Sep 10, 2023
Harunori Kawano, Sota Shimizu

Generative Speech Recognition Error Correction with Large Language Models

Sep 27, 2023
Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
