Xulong Zhang

CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation

Jan 03, 2025
Via arXiv

ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations

Nov 20, 2024
Via arXiv

Semi-Supervised Self-Learning Enhanced Music Emotion Recognition

Oct 29, 2024
Via arXiv

IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding

Sep 29, 2024
Via arXiv

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

May 28, 2024
Via arXiv

RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

May 28, 2024
Via arXiv

RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

May 27, 2024
Via arXiv

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

May 02, 2024
Via arXiv

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

May 01, 2024
Via arXiv

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

Apr 30, 2024
Via arXiv