"speech": models, code, and papers

Robustness of Multi-Source MT to Transcription Errors

May 26, 2023
Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

May 26, 2023
Claytone Sikasote, Eunice Mukonde, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Jun 12, 2023
Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

An automated method for the ontological representation of security directives

Jun 30, 2023
Giampaolo Bella, Gianpietro Castiglione, Daniele Francesco Santamaria

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

Jun 26, 2023
Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu

Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker

Jan 31, 2023
Navjot Kaur, Paige Tuttosi

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

Jul 08, 2023
Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, Yufei Zha

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Jun 06, 2023
Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Yike Guo, Jie Fu

CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation

Jun 27, 2023
Yuhao Cui, Xiongwei Wang, Zhongzhou Zhao, Wei Zhou, Haiqing Chen

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

Mar 17, 2023
Julien Hauret, Thomas Joubaud, Véronique Zimpfer, Éric Bavu
