Feng Deng

AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation

Aug 01, 2025
Via arXiv

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025
Via arXiv

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

Jun 12, 2024
Via arXiv

SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Sep 18, 2021
Via arXiv

Multi-Task Audio Source Separation

Jul 14, 2021
Via arXiv