Xiaopeng Wang

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Jan 08, 2026, via arXiv

Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning

Jan 06, 2026, via arXiv

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025, via arXiv

Artificial Protozoa Optimizer (APO): A novel bio-inspired metaheuristic algorithm for engineering optimization

May 06, 2025, via arXiv

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception

Apr 09, 2025, via arXiv

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition

Jan 11, 2025, via arXiv

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Sep 18, 2024, via arXiv

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Sep 18, 2024, via arXiv

Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning

Aug 21, 2024, via arXiv

A Noval Feature via Color Quantisation for Fake Audio Detection

Aug 20, 2024, via arXiv