"speech": models, code, and papers

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Sep 13, 2022
Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Oct 09, 2021
Si-Ioi Ng, Tan Lee

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Mar 24, 2022
Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou

What can predictive speech coders learn from speaker recognizers?

Apr 05, 2022
Marcos Faundez-Zanuy

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Jul 13, 2022
Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset

Mar 31, 2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

What can Speech and Language Tell us About the Working Alliance in Psychotherapy

Jun 27, 2022
Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

Jun 24, 2022
Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, Keiichi Tokuda

Fixed-point quantization aware training for on-device keyword-spotting

Mar 04, 2023
Sashank Macha, Om Oza, Alex Escott, Francesco Caliva, Robbie Armitano, Santosh Kumar Cheekatmalla, Sree Hari Krishnan Parthasarathi, Yuzong Liu

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment

Apr 04, 2022
Karl El Hajal, Milos Cernak, Pablo Mainar
