Yinghao Aaron Li

AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

Jan 25, 2026
Via arXiv

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Sep 16, 2024
Via arXiv

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

Aug 13, 2024
Via arXiv

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Jul 13, 2024
Via arXiv

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

Feb 06, 2024
Via arXiv

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Jan 31, 2024
Via arXiv

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

Sep 27, 2023
Via arXiv

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Sep 18, 2023
Via arXiv

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Jul 18, 2023
Via arXiv

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Jun 13, 2023
Via arXiv