
Xinjian Li



Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe


Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Feb 05, 2023
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari


Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

Oct 31, 2022
Xinjian Li, Ye Jia, Chung-Cheng Chiu


ASR2K: Speech Recognition for Around 2000 Languages without Audio

Sep 06, 2022
Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe


On Adversarial Robustness of Large-scale Audio Visual Learning

Mar 23, 2022
Juncheng B Li, Shuhui Qu, Xinjian Li, Po-Yao Huang, Florian Metze


Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations

Oct 26, 2021
Junning Liu, Zijie Xia, Yu Lei, Xinjian Li, Xu Wang


On Prosody Modeling for ASR+TTS based Voice Conversion

Jul 20, 2021
Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda


Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Apr 04, 2021
Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig


Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments

Apr 02, 2021
David R. Mortensen, Jordan Picone, Xinjian Li, Kathleen Siminyu
