Picture for Ryo Masumura

Ryo Masumura

Factor-Conditioned Speaking-Style Captioning

Add code
Jun 27, 2024
Viaarxiv icon

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Add code
Aug 31, 2023
Figure 1 for Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Figure 2 for Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Figure 3 for Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Figure 4 for Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Viaarxiv icon

End-to-End Joint Target and Non-Target Speakers ASR

Add code
Jun 04, 2023
Figure 1 for End-to-End Joint Target and Non-Target Speakers ASR
Figure 2 for End-to-End Joint Target and Non-Target Speakers ASR
Figure 3 for End-to-End Joint Target and Non-Target Speakers ASR
Viaarxiv icon

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Add code
May 25, 2023
Figure 1 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 2 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 3 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Figure 4 for Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
Viaarxiv icon

Improving Scheduled Sampling for Neural Transducer-based ASR

Add code
May 25, 2023
Figure 1 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 2 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 3 for Improving Scheduled Sampling for Neural Transducer-based ASR
Figure 4 for Improving Scheduled Sampling for Neural Transducer-based ASR
Viaarxiv icon

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Add code
May 24, 2023
Figure 1 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 2 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 3 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Figure 4 for Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Viaarxiv icon

Leveraging Large Text Corpora for End-to-End Speech Summarization

Add code
Mar 02, 2023
Figure 1 for Leveraging Large Text Corpora for End-to-End Speech Summarization
Figure 2 for Leveraging Large Text Corpora for End-to-End Speech Summarization
Figure 3 for Leveraging Large Text Corpora for End-to-End Speech Summarization
Figure 4 for Leveraging Large Text Corpora for End-to-End Speech Summarization
Viaarxiv icon

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Add code
Oct 28, 2022
Figure 1 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 2 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 3 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Figure 4 for On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Viaarxiv icon

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Add code
Jul 11, 2022
Figure 1 for Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Figure 2 for Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Figure 3 for Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Viaarxiv icon

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Add code
Jun 16, 2022
Figure 1 for Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Figure 2 for Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Figure 3 for Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Figure 4 for Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Viaarxiv icon