"speech": models, code, and papers

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

Aug 08, 2023
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters

Apr 24, 2023
Kristina Tesch, Timo Gerkmann

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Jul 24, 2023
Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Exploration on HuBERT with Multiple Resolutions

Jun 22, 2023
Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

May 15, 2023
Feng Dang, Qi Hu, Pengyuan Zhang, Yonghong Yan

Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

Mar 14, 2023
Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie

Leveraging Large Text Corpora for End-to-End Speech Summarization

Mar 02, 2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

Feb 18, 2023
Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

Mar 01, 2023
Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Feb 27, 2023
Yoohwan Kwon, Soo-Whan Chung
