"speech": models, code, and papers

Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

Sep 20, 2023
Bagus Tris Atmaja, Akira Sasou

Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

Sep 14, 2023
Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Sep 20, 2023
Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Sep 15, 2023
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

Aug 21, 2023
Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

Oct 05, 2023
Chih-Chen Chen, William Chen, Rodolfo Zevallos, John Ortega

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 21, 2023
Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

Sep 21, 2023
Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Ernie Chang, Yangyang Shi, Vikas Chandra

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Oct 06, 2023
Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Sep 20, 2023
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
