Wenwu Wang

DGFM: Full Body Dance Generation Driven by Music Foundation Models

Feb 27, 2025

GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music

Feb 25, 2025

The ICME 2025 Audio Encoder Capability Challenge

Jan 25, 2025

Graph-Enhanced Dual-Stream Feature Fusion with Pre-Trained Model for Acoustic Traffic Monitoring

Dec 26, 2024

AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models

Nov 28, 2024

PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection

Nov 10, 2024

Differentiable Interacting Multiple Model Particle Filtering

Oct 01, 2024

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching

Sep 11, 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Jul 19, 2024

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Jul 16, 2024