Heng Lu

Long-Context Speech Synthesis with Context-Aware Memory

Aug 20, 2025

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025

UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation

Jan 11, 2025

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Oct 09, 2024

Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

Aug 02, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024

The Impact of Quantization and Pruning on Deep Reinforcement Learning Models

Jul 05, 2024

GMP-ATL: Gender-augmented Multi-scale Pseudo-label Enhanced Adaptive Transfer Learning for Speech Emotion Recognition via HuBERT

May 03, 2024