"speech": models, code, and papers

Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech

Mar 01, 2022
Andrew Reece, Gus Cooney, Peter Bull, Christine Chung, Bryn Dawson, Casey Fitzpatrick, Tamara Glazer, Dean Knox, Alex Liebscher, Sebastian Marin

Via arXiv

Effective and Differentiated Use of Control Information for Multi-speaker Speech Synthesis

Jul 08, 2021
Qinghua Wu, Quanbo Shen, Jian Luan, YuJun Wang

Via arXiv

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Oct 04, 2022
Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

Via arXiv

KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering

Mar 10, 2022
Sebastian P. Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Nöth, Korbinian Riedhammer

Via arXiv

Cross-stitched Multi-modal Encoders

Apr 20, 2022
Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim, Srinivas Bangalore

Via arXiv

Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users

Apr 27, 2021
Moussa Doumbouya, Lisa Einstein, Chris Piech

Via arXiv

Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis

Sep 29, 2022
Shivam Sharma, Mohd Khizir Siddiqui, Md. Shad Akhtar, Tanmoy Chakraborty

Via arXiv

Acoustic-Linguistic Features for Modeling Neurological Task Score in Alzheimer's

Sep 13, 2022
Saurav K. Aryal, Howard Prioleau, Legand Burge

Via arXiv

Speech Recognition with no speech or with noisy speech

Mar 02, 2019
Gautam Krishna, Co Tran, Jianguo Yu, Ahmed H Tewfik

Via arXiv

Two-Pass Low Latency End-to-End Spoken Language Understanding

Jul 14, 2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

Via arXiv