Siddharth Dalmia

LLM Augmented LLMs: Expanding Capabilities through Composition

Jan 04, 2024
Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Multimodal Modeling For Spoken Language Identification

Sep 19, 2023
Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Apr 11, 2023
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Nov 11, 2022
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Nov 10, 2022
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Oct 27, 2022
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

CTC Alignments Improve Autoregressive Translation

Oct 11, 2022
Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Two-Pass Low Latency End-to-End Spoken Language Understanding

Jul 14, 2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Jul 06, 2022
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

LegoNN: Building Modular Encoder-Decoder Models

Jun 07, 2022
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
