
Detai Xin

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Apr 06, 2024
Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

Building speech corpus with diverse voice characteristics for its prompt-based representation

Mar 20, 2024
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Mar 05, 2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

Oct 09, 2023
Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Sep 24, 2023
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics

Jun 01, 2023
Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus

May 26, 2023
Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari

JNV Corpus: A Corpus of Japanese Nonverbal Vocalizations with Diverse Phrases and Emotions

May 21, 2023
Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Feb 27, 2023
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari
