"speech": models, code, and papers

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

Sep 25, 2023
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Via arXiv

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

Sep 05, 2023
Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno

Via arXiv

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

Apr 10, 2023
Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe

Via arXiv

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Jun 18, 2023
Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

Via arXiv

Text-to-Speech Pipeline for Swiss German -- A comparison

May 31, 2023
Tobias Bollinger, Jan Deriu, Manfred Vogel

Via arXiv

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

Sep 20, 2023
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

Via arXiv

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

Sep 22, 2023
Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa

Via arXiv

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

May 22, 2023
Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Via arXiv

Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism

Jul 31, 2023
Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan

Via arXiv

Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

Jun 02, 2023
Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

Via arXiv