"speech": models, code, and papers

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Nov 10, 2021
Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

Lisan: Yemeni, Iraqi, Libyan, and Sudanese Arabic Dialect Corpora with Morphological Annotations

Dec 17, 2022
Mustafa Jarrar, Fadi A Zaraket, Tymaa Hammouda, Daanish Masood Alavi, Martin Waahlisch

Efficient Use of Large Pre-Trained Models for Low Resource ASR

Oct 26, 2022
Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney

Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting

Oct 26, 2022
Yuxuan Du, Ruohua Zhou

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Jun 02, 2022
Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

Oct 18, 2021
Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao

Controllable Multichannel Speech Dereverberation based on Deep Neural Networks

Oct 16, 2021
Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

Fast-Slow Transformer for Visually Grounding Speech

Sep 16, 2021
Puyuan Peng, David Harwath

Are word boundaries useful for unsupervised language learning?

Oct 06, 2022
Tu Anh Nguyen, Maureen de Seyssel, Robin Algayres, Patricia Roze, Ewan Dunbar, Emmanuel Dupoux

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

Feb 18, 2022
Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng
