Picture for Shinji Watanabe

Shinji Watanabe

Carnegie Mellon University

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Add code
Oct 06, 2023
Viaarxiv icon

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios

Add code
Oct 05, 2023
Figure 1 for EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios
Figure 2 for EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios
Figure 3 for EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios
Figure 4 for EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios
Viaarxiv icon

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Add code
Oct 04, 2023
Figure 1 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 2 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 3 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 4 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Viaarxiv icon

One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

Add code
Oct 02, 2023
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Viaarxiv icon

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Add code
Oct 01, 2023
Figure 1 for Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Figure 2 for Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Figure 3 for Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Figure 4 for Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Viaarxiv icon

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation

Add code
Sep 29, 2023
Figure 1 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 2 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 3 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 4 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Viaarxiv icon

Toward Universal Speech Enhancement for Diverse Input Conditions

Add code
Sep 29, 2023
Viaarxiv icon

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Add code
Sep 28, 2023
Viaarxiv icon

Speech collage: code-switched audio generation by collaging monolingual corpora

Add code
Sep 27, 2023
Figure 1 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 2 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 3 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 4 for Speech collage: code-switched audio generation by collaging monolingual corpora
Viaarxiv icon