Yuewen Cao

dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models

Dec 22, 2025

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Aug 28, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Jul 23, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation

Jul 17, 2025

OmniCaptioner: One Captioner to Rule Them All

Apr 09, 2025

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Feb 10, 2025

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

May 28, 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

Feb 12, 2021

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

Sep 06, 2020