Yoshiki Masuyama

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations

Aug 11, 2025

FasTUSS: Faster Task-Aware Unified Source Separation

Jul 15, 2025

Physics-Informed Direction-Aware Neural Acoustic Fields

Jul 09, 2025

Factorized RVQ-GAN For Disentangled Speech Tokenization

Jun 18, 2025

Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses

May 19, 2025

Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training

Apr 19, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Feb 21, 2025

Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers

Jan 09, 2025

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition

Nov 11, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Sep 24, 2024