"speech": models, code, and papers

Leveraging characteristics of the output probability distribution for identifying adversarial audio examples

May 26, 2023
Matías P. Pizarro B., Dorothea Kolossa, Asja Fischer

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Nov 02, 2022
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

Nov 29, 2022
Nikolaos Ellinas, Myrsini Christidou, Alexandra Vioni, June Sig Sung, Aimilios Chalamandaris, Pirros Tsiakoulis, Paris Mastorocostas

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Dec 24, 2022
Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu

Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model

May 24, 2023
Aoi Ito, Shota Horiguchi

How Hate Speech Varies by Target Identity: A Computational Analysis

Oct 19, 2022
Michael Miller Yoder, Lynnette Hui Xian Ng, David West Brown, Kathleen M. Carley

Cold Diffusion for Speech Enhancement

Nov 04, 2022
Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

May 22, 2023
Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

Understanding temporally weakly supervised training: A case study for keyword spotting

May 30, 2023
Heinrich Dinkel, Weiji Zhuang, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition

Feb 28, 2023
Zhijie Shen, Wu Guo, Bin Gu
