"speech": models, code, and papers

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Jun 26, 2023
Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

Via arXiv

Recent Advances in Direct Speech-to-text Translation

Jun 20, 2023
Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Via arXiv

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

Jul 03, 2023
Matthew Raffel, Drew Penney, Lizhong Chen

Via arXiv

PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling

Jun 13, 2023
Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

Via arXiv

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

Jun 08, 2023
Tiantian Feng, Shrikanth Narayanan

Via arXiv

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

May 30, 2023
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

Via arXiv

Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

Sep 21, 2023
Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu

Via arXiv

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Sep 22, 2023
Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Via arXiv

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

May 30, 2023
Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak

Via arXiv

End-to-End Simultaneous Speech Translation with Differentiable Segmentation

May 25, 2023
Shaolei Zhang, Yang Feng

Via arXiv