"speech": models, code, and papers

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Aug 22, 2023
Harunori Kawano, Sota Shimizu

Via arXiv

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition

Jun 16, 2023
Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu

Via arXiv

TRAVID: An End-to-End Video Translation Framework

Sep 20, 2023
Prottay Kumar Adhikary, Bandaru Sugandhi, Subhojit Ghimire, Santanu Pal, Partha Pakray

Via arXiv

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Sep 19, 2023
Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

Via arXiv

Multimodal Modeling For Spoken Language Identification

Sep 19, 2023
Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

Via arXiv

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

Jul 13, 2023
He Huang, Jagadeesh Balam, Boris Ginsburg

Via arXiv

SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres

May 26, 2023
Shumin Deng, Shengyu Mao, Ningyu Zhang, Bryan Hooi

Via arXiv

Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement

Jun 15, 2023
Liang Wan, Hongqing Liu, Yi Zhou, Jie Ji

Via arXiv

Ripple sparse self-attention for monaural speech enhancement

May 15, 2023
Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

Via arXiv

Efficient Face Detection with Audio-Based Region Proposals

Sep 14, 2023
William Aris, François Grondin

Via arXiv