"speech": models, code, and papers

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023
Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

Oct 04, 2023
Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

End-to-End real time tracking of children's reading with pointer network

Oct 17, 2023
Vishal Sunder, Beulah Karrolla, Eric Fosler-Lussier

SpatialCodec: Neural Spatial Speech Coding

Sep 14, 2023
Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu

M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

Nov 19, 2023
Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun, Ying Shan

A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers

Sep 17, 2023
Paleti Nikhil Chowdary, Vadlapudi Sai Aravind, Gorantla V N S L Vishnu Vardhan, Menta Sai Akshay, Menta Sai Aashish, Jyothish Lal. G

Zero-shot audio captioning with audio-language model guidance and audio context keywords

Nov 14, 2023
Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation

Nov 14, 2023
Anssi Moisio, Mathias Creutz, Mikko Kurimo

Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

Nov 14, 2023
Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati

A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation

Oct 22, 2023
Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
