Erik Visser

Aligning Audio Captions with Human Preferences

Sep 18, 2025
Via arXiv

Spatial Audio Motion Understanding and Reasoning

Sep 18, 2025
Via arXiv

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

Sep 18, 2025
Via arXiv

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework

May 21, 2025
Via arXiv

Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding

Dec 05, 2024
Via arXiv

Confidence Calibration for Audio Captioning Models

Sep 13, 2024
Via arXiv

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sep 10, 2024
Via arXiv

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Sep 10, 2024
Via arXiv

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023
Via arXiv

Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

Sep 06, 2023
Via arXiv