Ryo Masumura

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Aug 31, 2023
Via arXiv

End-to-End Joint Target and Non-Target Speakers ASR

Jun 04, 2023
Via arXiv

Improving Scheduled Sampling for Neural Transducer-based ASR

May 25, 2023
Via arXiv

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

May 25, 2023
Via arXiv

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

May 24, 2023
Via arXiv

Leveraging Large Text Corpora for End-to-End Speech Summarization

Mar 02, 2023
Via arXiv

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Oct 28, 2022
Via arXiv

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Jul 11, 2022
Via arXiv

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Jun 16, 2022
Via arXiv

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

Feb 21, 2022
Via arXiv