
Haizhou Li

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Nov 12, 2024
Via arXiv

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Nov 05, 2024
Via arXiv

VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions

Oct 29, 2024
Via arXiv

VoiceBench: Benchmarking LLM-Based Voice Assistants

Oct 22, 2024
Via arXiv

Multi-Level Speaker Representation for Target Speaker Extraction

Oct 21, 2024
Via arXiv

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Oct 18, 2024
Via arXiv

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

Oct 18, 2024
Via arXiv

Roadmap towards Superhuman Speech Understanding using Large Language Models

Oct 17, 2024
Via arXiv

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling

Oct 12, 2024
Via arXiv

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Sep 27, 2024
Via arXiv