"speech": models, code, and papers

A privacy-preserving method using secret key for convolutional neural network-based speech classification

Oct 06, 2023
Shoko Niwa, Sayaka Shiota, Hitoshi Kiya

Via arXiv

Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

Dec 06, 2023
Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

Via arXiv

Partial Rewriting for Multi-Stage ASR

Dec 08, 2023
Antoine Bruguier, David Qiu, Yanzhang He

Via arXiv

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Oct 11, 2023
Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj

Via arXiv

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

Oct 10, 2023
Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

Via arXiv

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token Based ASR

Nov 08, 2023
Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

Via arXiv

SER_AMPEL: A multi-source dataset for SER of Italian older adults

Nov 24, 2023
Alessandra Grossi, Francesca Gasparini

Via arXiv

Unsupervised speech enhancement with diffusion-based generative models

Sep 19, 2023
Berné Nortier, Mostafa Sadeghi, Romain Serizel

Via arXiv

Speech Wikimedia: A 77 Language Multilingual Speech Dataset

Aug 30, 2023
Rafael Mosquera Gómez, Julián Eusse, Juan Ciro, Daniel Galvez, Ryan Hileman, Kurt Bollacker, David Kanter

Via arXiv

Guided Flows for Generative Modeling and Decision Making

Dec 07, 2023
Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen

Via arXiv