Shinji Watanabe

Carnegie Mellon University

Improving ASR Contextual Biasing with Guided Attention

Jan 16, 2024

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

Jan 10, 2024

Generative Context-aware Fine-tuning of Self-supervised Speech Models

Dec 15, 2023

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

Dec 15, 2023

Understanding Probe Behaviors through Variational Bounds of Mutual Information

Dec 15, 2023

Music ControlNet: Multiple Time-varying Controls for Music Generation

Nov 13, 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

Oct 12, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Oct 11, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Oct 09, 2023