Huadai Liu

STAR-VAE: Structured Topology-Aware Regularization for Audio Reconstruction and Generation

Jun 22, 2026
Via arXiv

AudioCALM: Continuous Autoregressive Language Modeling for Universal Audio Generation

Jun 22, 2026
Via arXiv

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Jun 01, 2026
Via arXiv

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Oct 10, 2025
Via arXiv

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

Jun 26, 2025
Via arXiv

OmniAudio: Generating Spatial Audio from 360-Degree Video

Apr 21, 2025
Via arXiv

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

Dec 13, 2024
Via arXiv

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024
Via arXiv

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024
Via arXiv

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

Jul 18, 2024
Via arXiv