Daniel Povey

GPU-accelerated Guided Source Separation for Meeting Transcription

Dec 10, 2022
Via arXiv

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

Oct 31, 2022
Via arXiv

Delay-penalized transducer for low-latency streaming ASR

Oct 31, 2022
Via arXiv

Fast and parallel decoding for transducer

Oct 31, 2022
Via arXiv

Pruned RNN-T for fast, memory-efficient ASR training

Jun 23, 2022
Via arXiv

Lhotse: a speech data representation library for the modern deep learning ecosystem

Oct 25, 2021
Via arXiv

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Jun 13, 2021
Via arXiv

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

Apr 03, 2021
Via arXiv

An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

Mar 16, 2021
Via arXiv

A Parallelizable Lattice Rescoring Strategy with Neural Language Models

Mar 08, 2021
Via arXiv