Yayue Deng

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models

Oct 31, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Jun 06, 2024
Via arXiv

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Jun 06, 2024
Via arXiv

FMPAF: How Do Fed Chairs Affect the Financial Market? A Fine-grained Monetary Policy Analysis Framework on Their Language

Mar 10, 2024
Via arXiv

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

Jan 02, 2024
Via arXiv

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023
Via arXiv

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Dec 16, 2023
Via arXiv

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Jun 05, 2023
Via arXiv

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

May 03, 2023
Via arXiv