"speech": models, code, and papers

Audio-visual fine-tuning of audio-only ASR models

Dec 14, 2023
Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

Via arXiv

An Exploration of In-Context Learning for Speech Language Model

Oct 19, 2023
Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-yi Lee

Via arXiv

AE-Flow: AutoEncoder Normalizing Flow

Dec 27, 2023
Jakub Mosiński, Piotr Biliński, Thomas Merritt, Abdelhamid Ezzerg, Daniel Korzekwa

Via arXiv

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

Via arXiv

HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments

Dec 20, 2023
Neeraj Kumar Singh, Koyel Ghosh, Joy Mahapatra, Utpal Garain, Apurbalal Senapati

Via arXiv

Long-form Simultaneous Speech Translation: Thesis Proposal

Oct 17, 2023
Peter Polák

Via arXiv

Toward Joint Language Modeling for Speech Units and Text

Oct 12, 2023
Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

Via arXiv

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

Oct 23, 2023
Joanna Hong, Se Jin Park, Yong Man Ro

Via arXiv

Joint Audio and Speech Understanding

Oct 02, 2023
Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Via arXiv

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

Nov 19, 2023
Keqi Deng, Philip C. Woodland

Via arXiv