"speech": models, code, and papers

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Mar 31, 2022
Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

U-shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

Dec 11, 2021
Yi Li, Yang Sun, Syed Mohsen Naqvi

Predicting score distribution to improve non-intrusive speech quality estimation

Apr 13, 2022
Abu Zaher Md Faridee, Hannes Gamper

Can we still use PEAQ? A Performance Analysis of the ITU Standard for the Objective Assessment of Perceived Audio Quality

Dec 02, 2022
Pablo M. Delgado, Jürgen Herre

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

Apr 06, 2022
Georgia Maniati, Alexandra Vioni, Nikolaos Ellinas, Karolos Nikitaras, Konstantinos Klapsas, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis

Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression

Mar 30, 2022
Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Oct 20, 2022
Chunyu Qiang, Jianhua Tao, Ruibo Fu, Zhengqi Wen, Jiangyan Yi, Tao Wang, Shiming Wang

EEG-Transformer: Self-attention from Transformer Architecture for Decoding EEG of Imagined Speech

Dec 15, 2021
Young-Eun Lee, Seo-Hyun Lee

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

Jun 24, 2022
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

Feb 14, 2022
Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao
