Shinnosuke Takamichi

Real-Time Generation of Game Video Commentary with Multimodal LLMs: Pause-Aware Decoding Approaches

Mar 03, 2026
Via arXiv

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora

Jul 02, 2025
Via arXiv

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

May 23, 2025
Via arXiv

A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Nov 06, 2024
Via arXiv

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Oct 30, 2024
Via arXiv

DNN-based ensemble singing voice synthesis with interactions between singers

Sep 16, 2024
Via arXiv

Text-To-Speech Synthesis In The Wild

Sep 13, 2024
Via arXiv

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

Sep 09, 2024
Via arXiv

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis

Aug 13, 2024
Via arXiv

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

Jul 22, 2024
Via arXiv