Shinji Watanabe

Carnegie Mellon University

The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge

May 11, 2023
Via arXiv

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

May 06, 2023
Via arXiv

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

May 01, 2023
Via arXiv

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Apr 25, 2023
Via arXiv

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Apr 18, 2023
Via arXiv

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Apr 13, 2023
Via arXiv

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Apr 11, 2023
Via arXiv

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

Apr 10, 2023
Via arXiv

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Mar 14, 2023
Via arXiv

End-to-End Speech Recognition: A Survey

Mar 03, 2023
Via arXiv