"speech": models, code, and papers

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Nov 17, 2021
Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei

Arabic Text-To-Speech (TTS) Data Preparation

Apr 07, 2022
Hala Al Masri, Muhy Eddin Za'ter

LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech

Jul 11, 2022
Harshvardhan Anand, Nansi Begam, Richa Verma, Sourav Ghosh, Harichandana B. S. S, Sumit Kumar

Korean Tokenization for Beam Search Rescoring in Speech Recognition

Feb 22, 2022
Kyuhong Shim, Hyewon Bae, Wonyong Sung

FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers

Jan 09, 2023
Vincent Vandeghinste, Oliver Guhr

A subjective study of the perceptual acceptability of audio-video desynchronization in sports videos

Dec 03, 2022
Joshua Peter Ebenezer

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Oct 20, 2022
Chunyu Qiang, Jianhua Tao, Ruibo Fu, Zhengqi Wen, Jiangyan Yi, Tao Wang, Shiming Wang

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Mar 28, 2022
Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

Mar 26, 2022
Kohei Saijo, Tetsuji Ogawa

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

Feb 16, 2022
Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba
