"speech": models, code, and papers

Generating Holistic 3D Human Motion from Speech

Dec 08, 2022
Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

Via arXiv

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Nov 02, 2022
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin

Via arXiv

Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective

Nov 05, 2022
Hannaneh B. Pasandi, Haniyeh B. Pasandi

Via arXiv

End-to-End Speech Recognition: A Survey

Mar 03, 2023
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe

Via arXiv

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

May 16, 2023
Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng

Via arXiv

Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions

Feb 09, 2023
Nay San, Martijn Bartelds, Blaine Billings, Ella de Falco, Hendi Feriza, Johan Safri, Wawan Sahrozi, Ben Foley, Bradley McDonnell, Dan Jurafsky

Via arXiv

Lexical Retrieval Hypothesis in Multimodal Context

May 28, 2023
Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Via arXiv

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Feb 10, 2023
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Via arXiv

Timestamped Embedding-Matching Acoustic-to-Word CTC ASR

Jun 20, 2023
Woojay Jeon

Via arXiv

Temporal Convolution Network Based Onset Detection and Query by Humming System Design

May 09, 2023
Yu Cheng Hung, Jian-Jiun Ding

Via arXiv