Picture for Siddhant Arora

Siddhant Arora

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Add code
Jun 23, 2024
Viaarxiv icon

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Add code
Jun 18, 2024
Viaarxiv icon

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Add code
Jun 18, 2024
Viaarxiv icon

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Add code
Jun 14, 2024
Viaarxiv icon

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Add code
Feb 25, 2024
Viaarxiv icon

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Add code
Jan 30, 2024
Figure 1 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 2 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 3 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 4 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Viaarxiv icon

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

Add code
Dec 15, 2023
Figure 1 for Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Figure 2 for Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Figure 3 for Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Figure 4 for Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Viaarxiv icon

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Add code
Oct 04, 2023
Figure 1 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 2 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 3 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 4 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Figure 1 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 2 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 3 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 4 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Viaarxiv icon

Semi-Autoregressive Streaming ASR With Label Context

Add code
Sep 19, 2023
Figure 1 for Semi-Autoregressive Streaming ASR With Label Context
Figure 2 for Semi-Autoregressive Streaming ASR With Label Context
Figure 3 for Semi-Autoregressive Streaming ASR With Label Context
Figure 4 for Semi-Autoregressive Streaming ASR With Label Context
Viaarxiv icon