Helen Meng

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Jun 14, 2024
Via arXiv

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Jun 14, 2024
Via arXiv

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

Jun 14, 2024
Via arXiv

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

Jun 12, 2024
Via arXiv

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

Jun 11, 2024
Via arXiv

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Jun 05, 2024
Via arXiv

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Jun 04, 2024
Via arXiv

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

Mar 24, 2024
Via arXiv

Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

Mar 09, 2024
Via arXiv

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Feb 14, 2024
Via arXiv