Shinji Watanabe

Carnegie Mellon University

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

Jun 25, 2024

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Jun 23, 2024

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Jun 23, 2024

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Jun 19, 2024

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Jun 18, 2024

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Jun 18, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

Jun 14, 2024

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Jun 14, 2024

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Jun 13, 2024

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

Jun 13, 2024