"speech": models, code, and papers

Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring

Oct 27, 2023
Ankitha Sudarshan, Vinay Samuel, Parth Patwa, Ibtihel Amara, Aman Chadha

Via arXiv

OpenVoice: Versatile Instant Voice Cloning

Dec 03, 2023
Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun

Via arXiv

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023
Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

Via arXiv

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

Oct 12, 2023
Matthew Baas, Herman Kamper

Via arXiv

The Impact of Silence on Speech Anti-Spoofing

Sep 21, 2023
Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang

Via arXiv

Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation

Nov 27, 2023
Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Via arXiv

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

Nov 27, 2023
Mai-Vu Tran, Hoang-Quynh Le, Duy-Cat Can, Quoc-An Nguyen

Via arXiv

AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation

Oct 11, 2023
Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang

Via arXiv

No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

Oct 10, 2023
Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

Via arXiv

LLaSM: Large Language and Speech Model

Sep 16, 2023
Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

Via arXiv