"speech": models, code, and papers

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

Aug 02, 2023
Ramanan Sivaguru, Vasista Sai Lodagala, S Umesh

Via arXiv

Target Speech Extraction with Conditional Diffusion Model

Aug 08, 2023
Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani

Via arXiv

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

Oct 17, 2023
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

Via arXiv

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

Aug 28, 2023
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

Via arXiv

Convoifilter: A case study of doing cocktail party speech recognition

Aug 22, 2023
Thai-Binh Nguyen, Alexander Waibel

Via arXiv

Federated Learning with Differential Privacy for End-to-End Speech Recognition

Sep 29, 2023
Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko

Via arXiv

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Nov 01, 2023
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush

Via arXiv

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Oct 30, 2023
Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

Via arXiv

Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey

Sep 26, 2023
Yuchen Liu, Apu Kapadia, Donald Williamson

Via arXiv

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

Nov 02, 2023
Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu

Via arXiv