
"speech recognition": models, code, and papers

SLM: Bridge the thin gap between speech and text foundation models

Sep 30, 2023
Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

May 25, 2023
Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Federated Self-Learning with Weak Supervision for Speech Recognition

Jun 21, 2023
Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Sep 29, 2023
Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Jun 28, 2023
Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Oct 01, 2023
Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples

Oct 05, 2023
Armin Ettenhofer, Jan-Philipp Schulze, Karla Pizzi

Segmentation-Free Streaming Machine Translation

Sep 26, 2023
Javier Iranzo-Sánchez, Jorge Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Oct 09, 2023
Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen