Haizhou Li

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

Sep 24, 2024
Via arXiv

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions

Sep 24, 2024
Via arXiv

Aligning Language Models Using Follow-up Likelihood as Reward Signal

Sep 20, 2024
Via arXiv

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction

Sep 15, 2024
Via arXiv

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion

Sep 14, 2024
Via arXiv

E1 TTS: Simple and Fast Non-Autoregressive TTS

Sep 14, 2024
Via arXiv

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

Sep 11, 2024
Via arXiv

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

Sep 04, 2024
Via arXiv

Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

Aug 29, 2024
Via arXiv

Generative Expressive Conversational Speech Synthesis

Aug 01, 2024
Via arXiv