Picture for Wen Wang

Wen Wang

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

Add code
May 30, 2025
Viaarxiv icon

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

Add code
May 29, 2025
Viaarxiv icon

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Add code
May 26, 2025
Viaarxiv icon

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

Add code
May 23, 2025
Viaarxiv icon

PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models

Add code
May 21, 2025
Viaarxiv icon

Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

Add code
May 20, 2025
Viaarxiv icon

Task-Agnostic Semantic Communications Relying on Information Bottleneck and Federated Meta-Learning

Add code
Apr 30, 2025
Viaarxiv icon

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Add code
Apr 22, 2025
Viaarxiv icon

OmniAudio: Generating Spatial Audio from 360-Degree Video

Add code
Apr 21, 2025
Viaarxiv icon

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

Add code
Feb 27, 2025
Viaarxiv icon