Alert button
Picture for Takaaki Saeki

Takaaki Saeki

Alert button

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

Feb 29, 2024
Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

Viaarxiv icon

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Jan 30, 2024
Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari

Viaarxiv icon

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Sep 15, 2023
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari

Figure 1 for Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Figure 2 for Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Figure 3 for Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Figure 4 for Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Viaarxiv icon

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Feb 27, 2023
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

Figure 1 for Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Figure 2 for Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Figure 3 for Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Figure 4 for Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Viaarxiv icon

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Feb 05, 2023
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

Figure 1 for Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Figure 2 for Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Figure 3 for Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Figure 4 for Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Viaarxiv icon

SpeechLMScore: Evaluating speech generation using speech language model

Dec 08, 2022
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe

Figure 1 for SpeechLMScore: Evaluating speech generation using speech language model
Figure 2 for SpeechLMScore: Evaluating speech generation using speech language model
Figure 3 for SpeechLMScore: Evaluating speech generation using speech language model
Figure 4 for SpeechLMScore: Evaluating speech generation using speech language model
Viaarxiv icon

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Oct 27, 2022
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran

Figure 1 for Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Figure 2 for Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Figure 3 for Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Figure 4 for Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Viaarxiv icon

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Oct 26, 2022
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari

Viaarxiv icon

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses

Oct 18, 2022
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Figure 1 for Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses
Figure 2 for Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses
Figure 3 for Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses
Figure 4 for Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses
Viaarxiv icon