"speech": models, code, and papers

Sentence Embedding Models for Ancient Greek Using Multilingual Knowledge Distillation

Aug 24, 2023
Kevin Krahn, Derrick Tate, Andrew C. Lamicela

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

Jun 18, 2023
Desh Raj, Daniel Povey, Sanjeev Khudanpur

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

Jul 30, 2023
Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee

Multi-Head State Space Model for Speech Recognition

May 25, 2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

Jun 12, 2023
Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Apr 18, 2023
Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset

May 18, 2023
Ahmed Adel Attia, Mark Tiede, Carol Y. Espy-Wilson

Attention-based Speech Enhancement Using Human Quality Perception Modelling

Mar 23, 2023
Khandokar Md. Nayem, Donald S. Williamson

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

Jun 20, 2023
Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

Characterization of cough sounds using statistical analysis

Aug 06, 2023
Naveenkumar Vodnala, Pratap Reddy Lankireddy, Padmasai Yarlagadda
