Jonathan Le Roux

MERL

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Oct 30, 2023

Generation or Replication: Auscultating Audio Latent Diffusion Models

Oct 16, 2023

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation

Sep 29, 2023

The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Aug 14, 2023

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

Jun 27, 2023

Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT

Apr 04, 2023

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Mar 08, 2023

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

Dec 14, 2022

Hyperbolic Audio Source Separation

Dec 09, 2022

Latent Iterative Refinement for Modular Source Separation

Nov 22, 2022