"speech recognition": models, code, and papers

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

Jul 14, 2023
Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation

Jul 14, 2023
Dohyun Kim, Yeseung Kim, Jaehwi Jang, Minjae Song, Woojin Choi, Daehyung Park

Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables

Sep 15, 2023
Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

Jul 06, 2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

Dec 10, 2022
Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

Jul 27, 2023
Kun Yuan, Vinkle Srivastav, Tong Yu, Joel Lavanchy, Pietro Mascagni, Nassir Navab, Nicolas Padoy

Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition

Feb 28, 2023
Zhijie Shen, Wu Guo, Bin Gu

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

Jul 20, 2023
Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee

MASR: Metadata Aware Speech Representation

Jul 20, 2023
Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

Aug 04, 2023
Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie
