"speech": models, code, and papers

QuickVC: Many-to-any Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion

Feb 17, 2023
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro

Dynamic Chunk Convolution For Unified Streaming And Non-streaming Conformer ASR

Apr 18, 2023
Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati

Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search

Apr 18, 2023
Wenping Wang, Yunxi Guo, Chiyao Shen, Shuai Ding, Guangdeng Liao, Hao Fu, Pramodh Karanth Prabhakar

Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models

Apr 24, 2023
Xiangming Gu, Wei Zeng, Jianan Zhang, Longshen Ou, Ye Wang

"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online

Oct 28, 2022
Federico Bianchi, Stefanie Anja Hills, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Nov 08, 2022
Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Sourav Bhattacharya

Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting

Jan 13, 2023
Lasse Hansen, Roberta Rocca, Arndis Simonsen, Alberto Parola, Vibeke Bliksted, Nicolai Ladegaard, Dan Bang, Kristian Tylén, Ethan Weed, Søren Dinesen Østergaard, Riccardo Fusaroli

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

Nov 15, 2022
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Explanations for Automatic Speech Recognition

Feb 27, 2023
Xiaoliang Wu, Peter Bell, Ajitha Rajan

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model

Mar 13, 2023
Shuangping Huang, Yu Luo, Zhenzhou Zhuang, Jin-Gang Yu, Mengchao He, Yongpan Wang
