"speech": models, code, and papers

Segmentation-Free Streaming Machine Translation

Sep 26, 2023
Javier Iranzo-Sánchez, Jorge Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

Duplex Diffusion Models Improve Speech-to-Speech Translation

May 22, 2023
Xianchao Wu

Controllable Emphasis with zero data for text-to-speech

Jul 13, 2023
Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova

EnCodecMAE: Leveraging neural codecs for universal audio representation learning

Sep 14, 2023
Leonardo Pepino, Pablo Riera, Luciana Ferrer

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Jul 12, 2023
Wenxuan Wang, Guodong Ma, Yuke Li, Binbin Du

Semi-Autoregressive Streaming ASR With Label Context

Sep 19, 2023
Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury

Developing Speech Processing Pipelines for Police Accountability

Jun 09, 2023
Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Sep 28, 2023
Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Continual Contrastive Spoken Language Understanding

Oct 04, 2023
Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

Shaping the Epochal Individuality and Generality: The Temporal Dynamics of Uncertainty and Prediction Error in Musical Improvisation

Oct 04, 2023
Tatsuya Daikoku