Nam Soo Kim

FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

Apr 22, 2025

Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition

Nov 26, 2024

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

Oct 07, 2024

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Jun 25, 2024

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

Jun 10, 2024

HILCodec: High Fidelity and Lightweight Neural Audio Codec

May 08, 2024

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

Jan 03, 2024

Efficient Parallel Audio Generation using Group Masked Language Modeling

Jan 02, 2024

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

Dec 11, 2023

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

Nov 08, 2023