"speech": models, code, and papers

From 'Snippet-lects' to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape

May 29, 2023
Séverine Guillaume, Guillaume Wisniewski, Alexis Michaud

The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech

May 25, 2023
Martti Vainio, Antti Suni, Juraj Šimko, Sofoklis Kakouros

Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks

May 09, 2023
Deniss Strods, Alan F. Smeaton

Bootstrapping Developmental AIs: From Simple Competences to Intelligent Human-Compatible AIs

Aug 29, 2023
Mark Stefik, Robert Price

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

May 19, 2023
Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney

RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition

Aug 31, 2023
Lin Yuan, Guoheng Huang, Fenghuan Li, Xiaochen Yuan, Chi-Man Pun, Guo Zhong

An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

May 29, 2023
Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

May 26, 2023
Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

May 16, 2023
Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

Emotion-Aligned Contrastive Learning Between Images and Music

Aug 24, 2023
Shanti Stewart, Tiantian Feng, Kleanthis Avramidis, Shrikanth Narayanan
