Minchan Kim

Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

Mar 26, 2024
Via arXiv

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

Jan 03, 2024
Via arXiv

Efficient Parallel Audio Generation using Group Masked Language Modeling

Jan 02, 2024
Via arXiv

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

Nov 08, 2023
Via arXiv

Pre- and post-contact policy decomposition for non-prehensile manipulation with zero-shot sim-to-real transfer

Sep 06, 2023
Via arXiv

EM-Network: Oracle Guided Self-distillation for Sequence Learning

Jun 14, 2023
Via arXiv

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

Oct 12, 2022
Via arXiv

Fully Unsupervised Training of Few-shot Keyword Spotting

Oct 07, 2022
Via arXiv

Disentangled Speaker Representation Learning via Mutual Information Minimization

Aug 17, 2022
Via arXiv

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

Mar 29, 2022
Via arXiv