"speech": models, code, and papers

Make BERT-based Chinese Spelling Check Model Enhanced by Layerwise Attention and Gaussian Mixture Model

Dec 27, 2023
Yongchang Cao, Liang He, Zhen Wu, Xinyu Dai

Extending Whisper with prompt tuning to target-speaker ASR

Dec 13, 2023
Hao Ma, Zhiyuan Peng, Mingjie Shao, Jing Li, Ju Liu

Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario

Dec 20, 2023
Hiroki Onozeki, Zhiyang Qi, Kazuma Akiyama, Ryutaro Asahara, Takumasa Kaneko, Michimasa Inaba

Acoustic BPE for Speech Generation with Discrete Tokens

Oct 23, 2023
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Audio-visual fine-tuning of audio-only ASR models

Dec 14, 2023
Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

AE-Flow: AutoEncoder Normalizing Flow

Dec 27, 2023
Jakub Mosiński, Piotr Biliński, Thomas Merritt, Abdelhamid Ezzerg, Daniel Korzekwa

HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments

Dec 20, 2023
Neeraj Kumar Singh, Koyel Ghosh, Joy Mahapatra, Utpal Garain, Apurbalal Senapati

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

Oct 11, 2023
Qingkai Fang, Yan Zhou, Yang Feng

An Exploration of In-Context Learning for Speech Language Model

Oct 19, 2023
Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-yi Lee

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis
