Yanqing Liu

CAST: Modeling Visual State Transitions for Consistent Video Retrieval

Mar 09, 2026

Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation

Mar 02, 2026

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Jan 21, 2026

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Jan 18, 2026

Next Tokens Denoising for Speech Synthesis

Jul 30, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Jun 14, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

May 26, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

May 07, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025

Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners

Dec 06, 2024