Shinji Watanabe

Carnegie Mellon University

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

Jun 25, 2024
Via arXiv

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Jun 23, 2024
Via arXiv

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Jun 23, 2024
Via arXiv

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Jun 19, 2024
Via arXiv

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Jun 18, 2024
Via arXiv

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Jun 18, 2024
Via arXiv

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

Jun 14, 2024
Via arXiv

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Jun 14, 2024
Via arXiv

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Jun 13, 2024
Via arXiv

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

Jun 13, 2024
Via arXiv