"speech": models, code, and papers

Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

Sep 14, 2023
Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

Sep 20, 2023
Bagus Tris Atmaja, Akira Sasou

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Sep 20, 2023
Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

Sep 02, 2023
Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

Sep 21, 2023
Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Ernie Chang, Yangyang Shi, Vikas Chandra

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Sep 20, 2023
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar

Spiking Structured State Space Model for Monaural Speech Enhancement

Sep 07, 2023
Yu Du, Xu Liu, Yansong Chua

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Sep 12, 2023
Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

Oct 05, 2023
Chih-Chen Chen, William Chen, Rodolfo Zevallos, John Ortega
