
Erik Visser

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework

May 21, 2025

Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding

Dec 05, 2024

Confidence Calibration for Audio Captioning Models

Sep 13, 2024

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Sep 10, 2024

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sep 10, 2024

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

Sep 06, 2023

Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

Sep 06, 2023

Improved Beam Search for Hallucination Mitigation in Abstractive Summarization

Dec 06, 2022

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Oct 29, 2022