Yinghao Aaron Li

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

Feb 06, 2024
Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Jan 31, 2024
Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

Sep 27, 2023
Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Sep 18, 2023
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Jul 18, 2023
Yinghao Aaron Li, Cong Han, Nima Mesgarani

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Jun 13, 2023
Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

May 29, 2023
Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

Feb 11, 2023
Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Jan 20, 2023
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Dec 29, 2022
Yinghao Aaron Li, Cong Han, Nima Mesgarani
