"speech recognition": models, code, and papers

Adapting Text-based Dialogue State Tracker for Spoken Dialogues

Aug 30, 2023
Jaeseok Yoon, Seunghyun Hwang, Ran Han, Jeonguk Bang, Kee-Eung Kim

Via arXiv

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

Sep 25, 2023
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Via arXiv

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

May 25, 2023
Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

Via arXiv

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Oct 07, 2023
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Via arXiv

The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains

Sep 21, 2023
Reza Khanmohammadi, Tuka Alhanai, Mohammad M. Ghassemi

Via arXiv

Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling

Sep 21, 2023
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed

Via arXiv

Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

Nov 18, 2023
Haoran Zhao, Jake Ryland Williams

Via arXiv

Exploring Speech Enhancement for Low-resource Speech Synthesis

Sep 19, 2023
Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

Via arXiv

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

Jul 20, 2023
Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

Via arXiv

RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation

Sep 29, 2023
Samuel Pegg, Kai Li, Xiaolin Hu

Via arXiv