"speech": models, code, and papers

Conversational Speech Separation: an Evaluation Study for Streaming Applications

May 31, 2022
Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

Via arXiv

Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection

May 06, 2022
Esma Balkir, Isar Nejadgholi, Kathleen C. Fraser, Svetlana Kiritchenko

Via arXiv

Multimodal Robot Programming by Demonstration: A Preliminary Exploration

Jan 17, 2023
Gopika Ajaykumar, Chien-Ming Huang

Via arXiv

Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

Mar 26, 2022
Kohei Saijo, Tetsuji Ogawa

Via arXiv

Streaming Target-Speaker ASR with Neural Transducer

Sep 19, 2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

Via arXiv

ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation

Nov 23, 2022
Sara Atito, Muhammad Awais, Wenwu Wang, Mark D Plumbley, Josef Kittler

Via arXiv

Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

Jan 03, 2023
Daiyu Zhang, Ju-Chiang Wang, Katerina Kosta, Jordan B. L. Smith, Shicen Zhou

Via arXiv

Utilising Bayesian Networks to combine multimodal data and expert opinion for the robust prediction of depression and its symptoms

Nov 09, 2022
Salvatore Fara, Orlaith Hickey, Alexandra Georgescu, Stefano Goria, Emilia Molimpakis, Nicholas Cummins

Via arXiv

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

Feb 16, 2022
Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

Via arXiv

Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification

Nov 03, 2022
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

Via arXiv