Slim Essid

IDS, S2A, LTCI

S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models

Apr 27, 2026
Via arXiv

Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement

Jan 21, 2026
Via arXiv

O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization

Dec 17, 2025
Via arXiv

Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping

Feb 24, 2025
Via arXiv

Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

Feb 17, 2025
Via arXiv

TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

Dec 02, 2024
Via arXiv

Multiple Choice Learning for Efficient Speech Separation with Many Speakers

Nov 27, 2024
Via arXiv

A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

Nov 06, 2024
Via arXiv

An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

Oct 08, 2024
Via arXiv

SALT: Standardized Audio event Label Taxonomy

Sep 18, 2024
Via arXiv