"speech": models, code, and papers

Controllable Data Generation Via Iterative Data-Property Mutual Mappings

Oct 11, 2023
Bo Pan, Muran Qin, Shiyu Wang, Yifei Zhang, Liang Zhao

Via arXiv

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Oct 11, 2023
Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng

Via arXiv

Intelligible Lip-to-Speech Synthesis with Speech Units

May 31, 2023
Jeongsoo Choi, Minsu Kim, Yong Man Ro

Via arXiv

Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

Jun 13, 2023
Tiantian Feng, Digbalay Bose, Xuan Shi, Shrikanth Narayanan

Via arXiv

Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Sep 14, 2023
Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

Via arXiv

Massive End-to-end Models for Short Search Queries

Sep 22, 2023
Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

Via arXiv

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Jul 17, 2023
Yanir Marmor, Kinneret Misgav, Yair Lifshitz

Via arXiv

Audio-Visual Speaker Verification via Joint Cross-Attention

Sep 28, 2023
R. Gnana Praveen, Jahangir Alam

Via arXiv

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Oct 05, 2023
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Via arXiv

MultiPA: a multi-task speech pronunciation assessment system for a closed and open response scenario

Aug 24, 2023
Yu-Wen Chen, Zhou Yu, Julia Hirschberg

Via arXiv