"speech": models, code, and papers

All-neural beamformer for continuous speech separation

Oct 13, 2021
Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Via arXiv

Unsupervised Speech Recognition

May 24, 2021
Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Via arXiv

Don't speak too fast: The impact of data bias on self-supervised speech models

Oct 15, 2021
Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee

Via arXiv

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Sep 12, 2021
Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Via arXiv

Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets

Dec 18, 2021
Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah

Via arXiv

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Sep 30, 2021
Yi Ren, Jinglin Liu, Zhou Zhao

Via arXiv

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Aug 02, 2022
Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Via arXiv

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

Nov 15, 2021
Zhu Li, Yuqing Zhang, Mengxi Nie, Ming Yan, Mengnan He, Ruixiong Zhang, Caixia Gong

Via arXiv

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

Apr 01, 2022
Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Via arXiv

THUEE system description for NIST 2020 SRE CTS challenge

Oct 12, 2022
Yu Zheng, Jinghan Peng, Miao Zhao, Yufeng Ma, Min Liu, Xinyue Ma, Tianyu Liang, Tianlong Kong, Liang He, Minqiang Xu

Via arXiv