Picture for Hiroshi Sato

Hiroshi Sato

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling

Add code
Jul 01, 2024
Figure 1 for SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
Figure 2 for SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
Figure 3 for SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
Viaarxiv icon

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

Add code
Apr 23, 2024
Figure 1 for Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Figure 2 for Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Figure 3 for Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Figure 4 for Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Viaarxiv icon

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

Add code
Jan 10, 2024
Figure 1 for Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Figure 2 for Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Figure 3 for Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Figure 4 for Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Viaarxiv icon

How does end-to-end speech recognition training impact speech enhancement artifacts?

Add code
Nov 20, 2023
Figure 1 for How does end-to-end speech recognition training impact speech enhancement artifacts?
Figure 2 for How does end-to-end speech recognition training impact speech enhancement artifacts?
Viaarxiv icon

End-to-End Joint Target and Non-Target Speakers ASR

Add code
Jun 04, 2023
Figure 1 for End-to-End Joint Target and Non-Target Speakers ASR
Figure 2 for End-to-End Joint Target and Non-Target Speakers ASR
Figure 3 for End-to-End Joint Target and Non-Target Speakers ASR
Viaarxiv icon

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Add code
May 25, 2023
Figure 1 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 2 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 3 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 4 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Viaarxiv icon

Improving Scheduled Sampling for Neural Transducer-based ASR

Add code
May 25, 2023
Figure 1 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 2 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 3 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 4 for Improving Scheduled Sampling for Neural Transducer-based ASR
Viaarxiv icon

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Add code
May 24, 2023
Figure 1 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 2 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 3 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 4 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Viaarxiv icon

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Add code
Oct 28, 2022
Figure 1 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 2 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 3 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 4 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Viaarxiv icon

Streaming Target-Speaker ASR with Neural Transducer

Add code
Sep 19, 2022
Figure 1 for Streaming Target-Speaker ASR with Neural Transducer
Figure 2 for Streaming Target-Speaker ASR with Neural Transducer
Figure 3 for Streaming Target-Speaker ASR with Neural Transducer
Figure 4 for Streaming Target-Speaker ASR with Neural Transducer
Viaarxiv icon