Jeongsoo Choi

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

Jan 18, 2024
Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Dec 05, 2023
Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Aug 18, 2023
Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Aug 15, 2023
Jeongsoo Choi, Joanna Hong, Yong Man Ro

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Aug 15, 2023
Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

Aug 03, 2023
Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

Reprogramming Audio-driven Talking Face Synthesis into Text-driven

Jun 28, 2023
Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro

Intelligible Lip-to-Speech Synthesis with Speech Units

May 31, 2023
Jeongsoo Choi, Minsu Kim, Yong Man Ro

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation

May 31, 2023
Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro
