Kun Wei

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

May 06, 2024

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

May 06, 2024

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

Jan 24, 2024

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

Oct 22, 2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

Jul 10, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Jun 01, 2023

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

Apr 13, 2023

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

Oct 31, 2022

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

Jul 21, 2022

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Jul 03, 2022