Youzheng Wu

JoyAvatar: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion

Dec 12, 2025

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition

Sep 16, 2025

UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition

Dec 23, 2024

Leveraging Label Information for Multimodal Emotion Recognition

Sep 05, 2023

AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets

Jun 16, 2023

OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition

Jun 05, 2023

SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

Nov 27, 2022

Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement

Nov 22, 2022

MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy

Nov 11, 2022

MoNET: Tackle State Momentum via Noise-Enhanced Training for Dialogue State Tracking

Nov 11, 2022