Yiwei Guo

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency

May 28, 2025

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

May 22, 2025

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

Mar 10, 2025

Recent Advances in Discrete Speech Tokens: A Review

Feb 10, 2025

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective

Dec 22, 2024

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

Oct 29, 2024

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Oct 21, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

Oct 16, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

Sep 03, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

Jul 18, 2024