Picture for Shujie Hu

Shujie Hu

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space

Add code
Mar 31, 2026
Viaarxiv icon

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Add code
Mar 25, 2026
Viaarxiv icon

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Add code
Sep 26, 2025
Viaarxiv icon

MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition

Add code
May 30, 2025
Figure 1 for MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
Figure 2 for MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
Figure 3 for MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
Figure 4 for MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
Viaarxiv icon

Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

Add code
May 29, 2025
Figure 1 for Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
Figure 2 for Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
Figure 3 for Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
Figure 4 for Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
Viaarxiv icon

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

Add code
May 27, 2025
Viaarxiv icon

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

Add code
May 26, 2025
Viaarxiv icon

Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition

Add code
Jan 08, 2025
Figure 1 for Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
Figure 2 for Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
Figure 3 for Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
Figure 4 for Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
Viaarxiv icon

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models

Add code
Jan 07, 2025
Figure 1 for Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
Figure 2 for Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
Figure 3 for Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
Figure 4 for Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
Viaarxiv icon

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Add code
Dec 30, 2024
Figure 1 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 2 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 3 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 4 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Viaarxiv icon