
Chao Weng

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
Dec 30, 2021

Detect what you want: Target Sound Detection
Dec 19, 2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Nov 29, 2021

Simple Attention Module based Speaker Verification with Iterative noisy label detection
Oct 13, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Jun 13, 2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis
Jun 11, 2021

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition
Jun 08, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation
Mar 31, 2021

Towards Robust Speaker Verification with Target Speaker Enhancement
Mar 16, 2021

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Feb 16, 2021