Jeongsoo Choi

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

Jan 18, 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Dec 05, 2023

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Aug 18, 2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Aug 15, 2023

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Aug 15, 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

Aug 03, 2023

Reprogramming Audio-driven Talking Face Synthesis into Text-driven

Jun 28, 2023

Intelligible Lip-to-Speech Synthesis with Speech Units

May 31, 2023

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation

May 31, 2023