Picture for Shinji Watanabe

Shinji Watanabe

Carnegie Mellon University

Findings of the IWSLT 2024 Evaluation Campaign

Add code
Nov 07, 2024
Viaarxiv icon

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Add code
Oct 23, 2024
Figure 1 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 2 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 3 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 4 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Viaarxiv icon

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model

Add code
Oct 03, 2024
Figure 1 for FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Figure 2 for FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Figure 3 for FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Figure 4 for FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Viaarxiv icon

End-to-End Speech Recognition with Pre-trained Masked Language Model

Add code
Oct 01, 2024
Figure 1 for End-to-End Speech Recognition with Pre-trained Masked Language Model
Figure 2 for End-to-End Speech Recognition with Pre-trained Masked Language Model
Figure 3 for End-to-End Speech Recognition with Pre-trained Masked Language Model
Figure 4 for End-to-End Speech Recognition with Pre-trained Masked Language Model
Viaarxiv icon

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking

Add code
Sep 27, 2024
Figure 1 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 2 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 3 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 4 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Viaarxiv icon

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Add code
Sep 24, 2024
Viaarxiv icon

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Add code
Sep 21, 2024
Figure 1 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 2 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 3 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Figure 4 for Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Viaarxiv icon

Task Arithmetic for Language Expansion in Speech Translation

Add code
Sep 17, 2024
Figure 1 for Task Arithmetic for Language Expansion in Speech Translation
Figure 2 for Task Arithmetic for Language Expansion in Speech Translation
Figure 3 for Task Arithmetic for Language Expansion in Speech Translation
Figure 4 for Task Arithmetic for Language Expansion in Speech Translation
Viaarxiv icon

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Add code
Sep 17, 2024
Figure 1 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 2 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 3 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 4 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Viaarxiv icon

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Add code
Sep 16, 2024
Figure 1 for Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Figure 2 for Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Figure 3 for Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Figure 4 for Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Viaarxiv icon