Picture for Muhammad Shakeel

Muhammad Shakeel

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Add code
Jun 23, 2024
Viaarxiv icon

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Add code
Jun 05, 2024
Viaarxiv icon

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Add code
May 22, 2024
Viaarxiv icon

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Add code
Feb 20, 2024
Figure 1 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 2 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 3 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 4 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Viaarxiv icon

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Add code
Jan 30, 2024
Figure 1 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 2 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 3 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 4 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Viaarxiv icon

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

Add code
Jan 19, 2024
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Figure 1 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 2 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 3 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 4 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Viaarxiv icon

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Add code
Dec 21, 2022
Figure 1 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Figure 2 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Figure 3 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Viaarxiv icon

Metric-based multimodal meta-learning for human movement identification via footstep recognition

Add code
Nov 15, 2021
Figure 1 for Metric-based multimodal meta-learning for human movement identification via footstep recognition
Figure 2 for Metric-based multimodal meta-learning for human movement identification via footstep recognition
Figure 3 for Metric-based multimodal meta-learning for human movement identification via footstep recognition
Figure 4 for Metric-based multimodal meta-learning for human movement identification via footstep recognition
Viaarxiv icon