"speech": models, code, and papers

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Dec 30, 2022
Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Via arXiv

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Oct 16, 2022
Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee

Via arXiv

Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

Oct 26, 2022
Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

Via arXiv

Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Oct 26, 2022
Marie Kunešová, Zbyněk Zajíc

Via arXiv

Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

Oct 07, 2022
Lei Wang, Rong Tong, Cheung Chi Leung, Sunil Sivadas, Chongjia Ni, Bin Ma

Via arXiv

Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations

Jun 25, 2022
Chin-Cheng Hsu

Via arXiv

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Nov 08, 2022
Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung

Via arXiv

I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue

Mar 02, 2023
Yuanchao Li, Koji Inoue, Leimin Tian, Changzeng Fu, Carlos Ishi, Hiroshi Ishiguro, Tatsuya Kawahara, Catherine Lai

Via arXiv

Monolingual Recognizers Fusion for Code-switching Speech Recognition

Nov 02, 2022
Tongtong Song, Qiang Xu, Haoyu Lu, Longbiao Wang, Hao Shi, Yuqin Lin, Yanbing Yang, Jianwu Dang

Via arXiv

An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

Oct 06, 2022
Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, Jianhua Tao

Via arXiv