Alert button

"speech recognition": models, code, and papers
Alert button

Multi-stage Large Language Model Correction for Speech Recognition

Oct 17, 2023
Jie Pu, Thai-Son Nguyen, Sebastian Stüker

Viaarxiv icon

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

Figure 1 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 2 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 3 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 4 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Viaarxiv icon

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

Sep 18, 2023
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe

Figure 1 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 2 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 3 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 4 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Viaarxiv icon

Towards Probing Contact Center Large Language Models

Dec 26, 2023
Varun Nathan, Ayush Kumar, Digvijay Ingle, Jithendra Vepa

Viaarxiv icon

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

Oct 22, 2023
Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

Viaarxiv icon

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

Sep 21, 2023
Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Ernie Chang, Yangyang Shi, Vikas Chandra

Figure 1 for Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Figure 2 for Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Figure 3 for Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Figure 4 for Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Viaarxiv icon

Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks

Jan 22, 2024
François Deloche, Laurent Bonnasse-Gahot, Judit Gervain

Viaarxiv icon

PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition

Dec 13, 2023
Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang

Viaarxiv icon

Discriminative Speech Recognition Rescoring with Pre-trained Language Models

Oct 10, 2023
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Figure 1 for Discriminative Speech Recognition Rescoring with Pre-trained Language Models
Figure 2 for Discriminative Speech Recognition Rescoring with Pre-trained Language Models
Viaarxiv icon

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Dec 18, 2023
Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia

Viaarxiv icon