Haizhou Li

GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

Jul 15, 2024

Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

Jul 03, 2024

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

Jul 03, 2024

DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models

Jul 01, 2024

RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

Jun 24, 2024

Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models

Jun 20, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Jun 19, 2024

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Jun 17, 2024

Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

Jun 16, 2024

ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting

Jun 14, 2024