Haizhou Li

Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis

Jan 11, 2025

Binary Event-Driven Spiking Transformer

Jan 10, 2025

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition

Jan 03, 2025

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

Dec 18, 2024

Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech

Dec 17, 2024

Hierarchical Control of Emotion Rendering in Speech Synthesis

Dec 17, 2024

Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion

Dec 16, 2024

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues

Dec 11, 2024

Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Dec 04, 2024

Transferable Adversarial Attacks against ASR

Nov 14, 2024