"speech recognition": models, code, and papers

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Sep 28, 2023
Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Multi-Head State Space Model for Speech Recognition

May 21, 2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

Sep 15, 2023
Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

May 29, 2023
Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Oct 06, 2023
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe

2-bit Conformer quantization for automatic speech recognition

May 26, 2023
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Oct 01, 2023
Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Sep 22, 2023
Hugo Malard, Salah Zaiem, Robin Algayres

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models

Jun 01, 2023
Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Şişman, Cong Liu, Wei Yang

LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR

Sep 28, 2023
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu
