"speech": models, code, and papers
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Jan 19, 2023
Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

May 23, 2023
Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie

Towards Relation Extraction From Speech

Oct 17, 2022
Tongtong Wu, Guitao Wang, Jinming Zhao, Zhaoran Liu, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

May 09, 2023
Yanzhen Ren, Hongcheng Zhu, Liming Zhai, Zongkun Sun, Rubing Shen, Lina Wang

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

Jun 16, 2023
Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

May 18, 2023
Won Jang, Dan Lim, Heayoung Park

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework

Jul 06, 2023
Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Nov 29, 2022
Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou

Simple and Effective Unsupervised Speech Translation

Oct 18, 2022
Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino

Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

Feb 02, 2023
Minglun Han, Qingyu Wang, Tielin Zhang, Yi Wang, Duzhen Zhang, Bo Xu
