Picture for Junkun Chen

Junkun Chen

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Add code
Jun 12, 2024
Figure 1 for Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
Figure 2 for Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
Figure 3 for Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
Viaarxiv icon

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Add code
Oct 23, 2023
Figure 1 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Figure 2 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Figure 3 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Viaarxiv icon

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Add code
Oct 06, 2023
Viaarxiv icon

DiariST: Streaming Speech Translation with Speaker Diarization

Add code
Sep 14, 2023
Viaarxiv icon

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Add code
Jul 07, 2023
Figure 1 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 2 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 3 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 4 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Viaarxiv icon

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Add code
Nov 07, 2022
Figure 1 for ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Figure 2 for ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Figure 3 for ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Figure 4 for ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Viaarxiv icon

Is Self-Supervised Learning More Robust Than Supervised Learning?

Add code
Jun 10, 2022
Figure 1 for Is Self-Supervised Learning More Robust Than Supervised Learning?
Figure 2 for Is Self-Supervised Learning More Robust Than Supervised Learning?
Figure 3 for Is Self-Supervised Learning More Robust Than Supervised Learning?
Figure 4 for Is Self-Supervised Learning More Robust Than Supervised Learning?
Viaarxiv icon

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

Add code
May 20, 2022
Figure 1 for PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Figure 2 for PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Figure 3 for PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Figure 4 for PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Viaarxiv icon

Data-Driven Adaptive Simultaneous Machine Translation

Add code
Apr 27, 2022
Figure 1 for Data-Driven Adaptive Simultaneous Machine Translation
Figure 2 for Data-Driven Adaptive Simultaneous Machine Translation
Figure 3 for Data-Driven Adaptive Simultaneous Machine Translation
Figure 4 for Data-Driven Adaptive Simultaneous Machine Translation
Viaarxiv icon

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Add code
Mar 18, 2022
Figure 1 for A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Figure 2 for A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Figure 3 for A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Figure 4 for A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Viaarxiv icon