Ji-Hoon Kim

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025
Via arXiv

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025
Via arXiv

SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models

Apr 01, 2025
Via arXiv

EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models

Jan 10, 2025
Via arXiv

AdaptVC: High Quality Voice Conversion with Adaptive Learning

Jan 07, 2025
Via arXiv

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation

Dec 28, 2024
Via arXiv

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Nov 29, 2024
Via arXiv

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

Oct 17, 2024
Via arXiv

Text-To-Speech Synthesis In The Wild

Sep 13, 2024
Via arXiv

VoxSim: A perceptual voice similarity dataset

Jul 26, 2024
Via arXiv