"speech recognition": models, code, and papers

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Aug 22, 2023
Harunori Kawano, Sota Shimizu

BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

Sep 02, 2023
Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Sep 22, 2023
Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

Sep 20, 2023
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

Jun 08, 2023
Changhun Kim, Joonhyung Park, Hajin Shim, Eunho Yang

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

May 25, 2023
Wangyou Zhang, Yanmin Qian

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

May 18, 2023
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

A privacy-preserving method using secret key for convolutional neural network-based speech classification

Oct 06, 2023
Shoko Niwa, Sayaka Shiota, Hitoshi Kiya

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

Mar 25, 2023
Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning

Jun 21, 2023
Kamer Ali Yuksel, Thiago Ferreira, Golara Javadi, Mohamed El-Badrashiny, Ahmet Gunduz
