Shinnosuke Takamichi

Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

Jun 11, 2026
Via arXiv

Do speech foundation models perceive speaker similarity as humans do?

Jun 04, 2026
Via arXiv

Real-Time Generation of Game Video Commentary with Multimodal LLMs: Pause-Aware Decoding Approaches

Mar 03, 2026
Via arXiv

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora

Jul 02, 2025
Via arXiv

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

May 23, 2025
Via arXiv

A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Nov 06, 2024
Via arXiv

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Oct 30, 2024
Via arXiv

DNN-based ensemble singing voice synthesis with interactions between singers

Sep 16, 2024
Via arXiv

Text-To-Speech Synthesis In The Wild

Sep 13, 2024
Via arXiv

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

Sep 09, 2024
Via arXiv