"speech": models, code, and papers

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

Jul 03, 2023
Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Sparse Fine-tuning for Inference Acceleration of Large Language Models

Oct 13, 2023
Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

Jul 20, 2023
Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

Developing Speech Processing Pipelines for Police Accountability

Jun 09, 2023
Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Jul 14, 2023
Wenxuan Wang, Guodong Ma, Yuke Li, Binbin Du

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023
Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

Jul 19, 2023
Javad Peymanfard, Vahid Saeedi, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Oct 05, 2023
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Jun 01, 2023
Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma

The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems

Jul 28, 2023
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
