"speech": models, code, and papers

Attention-based Multi-hypothesis Fusion for Speech Summarization

Nov 16, 2021
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Jul 02, 2022
Guangzhi Sun, Chao Zhang, Philip C. Woodland

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information

May 08, 2022
Chi-Luen Feng, Po-chun Hsu, Hung-yi Lee

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Nov 01, 2022
Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, S Umesh, Rajeev Sangal

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

Sep 23, 2022
Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Switching ASR Challenge

Oct 26, 2022
Yuhao Liang, Peikun Chen, Fan Yu, Xinfa Zhu, Tianyi Xu, Lei Xie

The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

Nov 17, 2021
Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi

Neural Architecture Search for Speech Emotion Recognition

Mar 31, 2022
Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Nov 11, 2022
Jiangyan Yi, Chenglong Wang, Jianhua Tao, Zhengkun Tian, Cunhang Fan, Haoxin Ma, Ruibo Fu

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

Feb 16, 2022
Yotaro Kubo, Shigeki Karita, Michiel Bacchiani
