Yukun Ma

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

Jun 11, 2025
Via arXiv

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025
Via arXiv

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Jan 17, 2025
Via arXiv

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Jan 17, 2025
Via arXiv

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Sep 25, 2024
Via arXiv

Imagen 3

Aug 13, 2024
Via arXiv

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Jun 17, 2024
Via arXiv

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Jun 04, 2024
Via arXiv

LiqD: A Dynamic Liquid Level Detection Model under Tricky Small Containers

Mar 13, 2024
Via arXiv

ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

Jan 04, 2024
Via arXiv