Haizhou Li

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

May 15, 2024

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

May 03, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

Apr 29, 2024

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

Apr 26, 2024

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Apr 01, 2024

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

Apr 01, 2024

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

Mar 24, 2024

CrossTune: Black-Box Few-Shot Classification with Label Enhancement

Mar 19, 2024

Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

Mar 09, 2024

sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Mar 09, 2024