Shinji Watanabe

Carnegie Mellon University

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023 (via arXiv)

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023 (via arXiv)

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Aug 19, 2023 (via arXiv)

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Jul 24, 2023 (via arXiv)

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Jul 23, 2023 (via arXiv)

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

Jul 20, 2023 (via arXiv)

BASS: Block-wise Adaptation for Speech Summarization

Jul 17, 2023 (via arXiv)

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

Jul 14, 2023 (via arXiv)

Deep Speech Synthesis from MRI-Based Articulatory Representations

Jul 05, 2023 (via arXiv)

Exploration on HuBERT with Multiple Resolutions

Jun 22, 2023 (via arXiv)