"speech": models, code, and papers

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

Mar 29, 2024
Thibaut Thonet, Jos Rozen, Laurent Besacier

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Mar 21, 2024
Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Feb 28, 2024
Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction

Mar 04, 2024
Snehesh Shrestha, Yantian Zha, Saketh Banagiri, Ge Gao, Yiannis Aloimonos, Cornelia Fermuller

KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Mar 20, 2024
Huali Zhou, Yuke Lin, Dong Liu, Ming Li

Direct Punjabi to English speech translation using discrete units

Feb 25, 2024
Prabhjot Kaur, L. Andrew M. Bush, Weisong Shi

Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps

Feb 28, 2024
Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

Mar 01, 2024
Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

Exploring language relations through syntactic distances and geographic proximity

Mar 27, 2024
Juan De Gregorio, Raúl Toral, David Sánchez

Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives

Mar 17, 2024
Billel Essaid, Hamza Kheddar, Noureddine Batel, Abderrahmane Lakas, Muhammad E. H. Chowdhury
