Picture for Kun Wei

Kun Wei

LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering

Add code
Oct 26, 2025
Viaarxiv icon

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis

Add code
Sep 18, 2025
Figure 1 for DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
Figure 2 for DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
Figure 3 for DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
Viaarxiv icon

Efficient Scaling for LLM-based ASR

Add code
Aug 06, 2025
Figure 1 for Efficient Scaling for LLM-based ASR
Figure 2 for Efficient Scaling for LLM-based ASR
Figure 3 for Efficient Scaling for LLM-based ASR
Figure 4 for Efficient Scaling for LLM-based ASR
Viaarxiv icon

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

Add code
Sep 30, 2024
Figure 1 for HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Figure 2 for HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Figure 3 for HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Figure 4 for HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Viaarxiv icon

Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text

Add code
Sep 17, 2024
Figure 1 for Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Figure 2 for Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Figure 3 for Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Figure 4 for Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Viaarxiv icon

Advancing Multi-talker ASR Performance with Large Language Models

Add code
Aug 30, 2024
Figure 1 for Advancing Multi-talker ASR Performance with Large Language Models
Figure 2 for Advancing Multi-talker ASR Performance with Large Language Models
Figure 3 for Advancing Multi-talker ASR Performance with Large Language Models
Figure 4 for Advancing Multi-talker ASR Performance with Large Language Models
Viaarxiv icon

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

Add code
May 06, 2024
Figure 1 for MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Figure 2 for MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Figure 3 for MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Figure 4 for MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Viaarxiv icon

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

Add code
May 06, 2024
Figure 1 for Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Figure 2 for Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Figure 3 for Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Figure 4 for Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Viaarxiv icon

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

Add code
Jan 24, 2024
Viaarxiv icon

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

Add code
Oct 22, 2023
Viaarxiv icon