Xianghu Yue

Listening for "You": Enhancing Speech Image Retrieval via Target Speaker Extraction

Sep 11, 2025

Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture

Apr 21, 2025

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

Feb 27, 2025

PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

Jan 16, 2025

VoiceBench: Benchmarking LLM-Based Voice Assistants

Oct 22, 2024

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Sep 27, 2024

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

Sep 11, 2024

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Jul 02, 2024

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Feb 28, 2024

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Jan 22, 2024