"speech": models, code, and papers

LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR

Oct 07, 2023
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu

Enhancing Code-switching Speech Recognition with Interactive Language Biases

Sep 29, 2023
Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

OpenVoice: Versatile Instant Voice Cloning

Dec 03, 2023
Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun

DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models

Sep 30, 2023
Zhiyao Sun, Tian Lv, Sheng Ye, Matthieu Gaetan Lin, Jenny Sheng, Yu-Hui Wen, Minjing Yu, Yong-jin Liu

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

Oct 23, 2023
Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Sep 14, 2023
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

Sep 20, 2023
Peter Polák, Ondřej Bojar

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

Nov 05, 2023
Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text

Oct 12, 2023
Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain
