"speech": models, code, and papers

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

Dec 14, 2022
Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter

Via arXiv

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

Mar 23, 2023
Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, Neil Zeghidour, Marco Tagliasacchi

Via arXiv

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses

Jun 06, 2023
Lucía Gómez-Zaragozá, Simone Wills, Cristian Tejedor-Garcia, Javier Marín-Morales, Mariano Alcañiz, Helmer Strik

Via arXiv

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Jul 17, 2023
Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Via arXiv

MAC: A unified framework boosting low resource automatic speech recognition

Feb 15, 2023
Zeping Min, Qian Ge, Zhong Li, Weinan E

Via arXiv

On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches

Nov 16, 2022
Guilherme Schu, Parvaneh Janbakhshi, Ina Kodrasi

Via arXiv

Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models

Jun 16, 2023
Victor Steinborn, Antonis Maronikolakis, Hinrich Schütze

Via arXiv

Svarah: Evaluating English ASR Systems on Indian Accents

May 25, 2023
Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

Via arXiv

Cellular Network Speech Enhancement: Removing Background and Transmission Noise

Jan 22, 2023
Amanda Shu, Hamza Khalid, Haohui Liu, Shikhar Agnihotri, Joseph Konan, Ojas Bhargave

Via arXiv

On granularity of prosodic representations in expressive text-to-speech

Jan 26, 2023
Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov

Via arXiv