"speech": models, code, and papers

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

Sep 15, 2023
Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Via arXiv

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Sep 15, 2023
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Via arXiv

Segmentation-Free Streaming Machine Translation

Sep 26, 2023
Javier Iranzo-Sánchez, Jorge Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

Via arXiv

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

Jul 13, 2023
Zeping Min, Jinbo Wang

Via arXiv

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Jun 27, 2023
Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma

Via arXiv

Audio Contrastive based Fine-tuning

Sep 22, 2023
Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin

Via arXiv

Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder

Aug 31, 2023
Alexandre Bittar, Paul Dixon, Mohammad Samragh, Kumari Nishu, Devang Naik

Via arXiv

A small vocabulary database of ultrasound image sequences of vocal tract dynamics

Aug 26, 2023
Margareth Castillo, Felipe Rubio, Dagoberto Porras, Sonia H. Contreras-Ortiz, Alexander Sepúlveda

Via arXiv

The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings

Aug 24, 2023
Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

Via arXiv

Uncovering Political Hate Speech During Indian Election Campaign: A New Low-Resource Dataset and Baselines

Jun 27, 2023
Farhan Ahmad Jafri, Mohammad Aman Siddiqui, Surendrabikram Thapa, Kritesh Rauniyar, Usman Naseem, Imran Razzak

Via arXiv