"speech": models, code, and papers

Bridging Modalities: Knowledge Distillation and Masked Training for Translating Multi-Modal Emotion Recognition to Uni-Modal, Speech-Only Emotion Recognition

Jan 04, 2024
Muhammad Muaz, Nathan Paull, Jahnavi Malagavalli

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

Dec 15, 2023
Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

Dec 28, 2023
Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

Efficiency-oriented approaches for self-supervised speech representation learning

Dec 18, 2023
Luis Lugo, Valentin Vielzeuf

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Jan 11, 2024
Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Using Large Language Model for End-to-End Chinese ASR and NER

Jan 21, 2024
Yuang Li, Jiawei Yu, Yanqing Zhao, Min Zhang, Mengxin Ren, Xiaofeng Zhao, Xiaosong Qiao, Chang Su, Miaomiao Ma, Hao Yang

TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

Dec 29, 2023
Felipe Oliveira, Victoria Reis, Nelson Ebecken

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Dec 15, 2023
Ziqian Wang, Xinfa Zhu, Zihan Zhang, YuanJun Lv, Ning Jiang, Guoqing Zhao, Lei Xie

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Jan 11, 2024
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Dec 16, 2023
Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang
