Daniel Povey

Blank-regularized CTC for Frame Skipping in Neural Transducer

May 19, 2023
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

Via arXiv

Delay-penalized CTC implemented based on Finite State Transducer

May 19, 2023
Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey

Via arXiv

GPU-accelerated Guided Source Separation for Meeting Transcription

Dec 10, 2022
Desh Raj, Daniel Povey, Sanjeev Khudanpur

Via arXiv

Fast and parallel decoding for transducer

Oct 31, 2022
Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

Via arXiv

Delay-penalized transducer for low-latency streaming ASR

Oct 31, 2022
Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

Via arXiv

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

Oct 31, 2022
Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Żelasko, Daniel Povey

Via arXiv

Pruned RNN-T for fast, memory-efficient ASR training

Jun 23, 2022
Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey

Via arXiv

Lhotse: a speech data representation library for the modern deep learning ecosystem

Oct 25, 2021
Piotr Żelasko, Daniel Povey, Jan "Yenda" Trmal, Sanjeev Khudanpur

Via arXiv

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Jun 13, 2021
Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

Via arXiv

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

Apr 03, 2021
Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li, Daniel Povey, Yujun Wang

Via arXiv