Picture for Jeongsoo Choi

Jeongsoo Choi

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

Add code
May 27, 2025
Viaarxiv icon

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

Add code
May 26, 2025
Viaarxiv icon

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Add code
Apr 29, 2025
Viaarxiv icon

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Add code
Apr 03, 2025
Viaarxiv icon

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

Add code
Mar 14, 2025
Viaarxiv icon

Deep Understanding of Sign Language for Sign to Subtitle Alignment

Add code
Mar 05, 2025
Viaarxiv icon

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Add code
Nov 29, 2024
Viaarxiv icon

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Add code
Oct 27, 2024
Figure 1 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 2 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 3 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 4 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Viaarxiv icon

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

Add code
Oct 17, 2024
Figure 1 for Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Figure 2 for Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Figure 3 for Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Figure 4 for Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Viaarxiv icon

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

Add code
Jan 18, 2024
Figure 1 for Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units
Figure 2 for Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units
Figure 3 for Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units
Figure 4 for Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units
Viaarxiv icon