"speech": models, code, and papers

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

May 19, 2023
Tianshu Yu, Haoyu Gao, Ting-En Lin, Min Yang, Yuchuan Wu, Wentao Ma, Chao Wang, Fei Huang, Yongbin Li

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

Jun 01, 2023
Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

May 24, 2023
Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant

Sep 26, 2023
Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan

Cross-Language Speech Emotion Recognition Using Multimodal Dual Attention Transformers

Jul 14, 2023
Syed Aun Muhammad Zaidi, Siddique Latif, Junaid Qadir

Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M

Jul 06, 2023
Oyku Berfin Mercan, Sercan Cepni, Davut Emre Tasar, Sukru Ozan

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

Aug 30, 2023
Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

May 25, 2023
Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

Sep 15, 2023
Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

May 19, 2023
Dima Rekesh, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Ankur Kumar, Boris Ginsburg
