
Yu Zhang


Modular Hybrid Autoregressive Transducer

Oct 31, 2022
Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno


Accelerating RNN-T Training and Inference Using CTC guidance

Oct 29, 2022
Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani


Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Oct 28, 2022
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang


Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Oct 28, 2022
Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding


Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Oct 27, 2022
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran


Personalized Dialogue Generation with Persona-Adaptive Attention

Oct 27, 2022
Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang


Improving generalizability of distilled self-supervised speech processing models under distorted settings

Oct 20, 2022
Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee


Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Oct 18, 2022
Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen


JOIST: A Joint Speech and Text Streaming Model For ASR

Oct 13, 2022
Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman


Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

Oct 11, 2022
Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman
