Songxiang Liu

Kimi-Audio Technical Report

Apr 25, 2025
Via arXiv

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025
Via arXiv

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Sep 21, 2024
Via arXiv

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Oct 11, 2023
Via arXiv

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

Sep 14, 2023
Via arXiv

The Singing Voice Conversion Challenge 2023

Jul 06, 2023
Via arXiv

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model

May 26, 2023
Via arXiv

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

May 07, 2023
Via arXiv

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Jan 31, 2023
Via arXiv

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022
Via arXiv