"speech": models, code, and papers

Autism Detection in Speech -- A Survey

Feb 20, 2024
Nadine Probol, Margot Mieskes

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models

Feb 14, 2024
Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan

Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models

Mar 17, 2024
Mohamed Taher Alrefaie, Nour Eldin Morsy, Nada Samir

Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models

Mar 14, 2024
Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee

CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

Mar 15, 2024
Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh

Persian Speech Emotion Recognition by Fine-Tuning Transformers

Feb 11, 2024
Minoo Shayaninasab, Bagher Babaali

Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive Framework

Feb 27, 2024
Tosin Ige, Christopher Kiekintveld, Aritran Piplai

An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

Feb 27, 2024
Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Feb 15, 2024
Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

Mar 11, 2024
Keshara Weerasinghe, Saahith Janapati, Xueren Ge, Sion Kim, Sneha Iyer, John A. Stankovic, Homa Alemzadeh
