Jun Du

VSE-MOT: Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Enhancement

Sep 17, 2025
Via arXiv

Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training

Sep 16, 2025
Via arXiv

MEAN-RIR: Multi-Modal Environment-Aware Network for Robust Room Impulse Response Estimation

Sep 05, 2025
Via arXiv

EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction

Aug 12, 2025
Via arXiv

Exploring Speaker Diarization with Mixture of Experts

Jun 17, 2025
Via arXiv

M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset

Jun 17, 2025
Via arXiv

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

May 12, 2025
Via arXiv

Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration

Apr 17, 2025
Via arXiv

Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition

Feb 10, 2025
Via arXiv

Latent Swap Joint Diffusion for Long-Form Audio Generation

Feb 07, 2025
Via arXiv