"speech": models, code, and papers

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

Sep 17, 2023
Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

Via arXiv

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Sep 28, 2023
Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Via arXiv

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

Jun 28, 2023
Zhe Ye, Terui Mao, Li Dong, Diqun Yan

Via arXiv

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Jun 25, 2023
Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Via arXiv

Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

May 23, 2023
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Via arXiv

The Art of Embedding Fusion: Optimizing Hate Speech Detection

Jun 26, 2023
Mohammad Aflah Khan, Neemesh Yadav, Mohit Jain, Sanyam Goyal

Via arXiv

Prompt-to-OS (P2OS): Revolutionizing Operating Systems and Human-Computer Interaction with Integrated AI Generative Models

Oct 07, 2023
Gabriele Tolomei, Cesare Campagnano, Fabrizio Silvestri, Giovanni Trappolini

Via arXiv

AudioSR: Versatile Audio Super-resolution at Scale

Sep 13, 2023
Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

Via arXiv

A Flexible Online Framework for Projection-Based STFT Phase Retrieval

Sep 13, 2023
Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann

Via arXiv

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Jun 05, 2023
Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya

Via arXiv