Siddhant Arora

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Feb 25, 2024
Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Jan 30, 2024
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

Dec 15, 2023
Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Oct 04, 2023
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Semi-Autoregressive Streaming ASR With Label Context

Sep 19, 2023
Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Sep 18, 2023
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Sep 16, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Jul 24, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

Jul 20, 2023
Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
