Erik Visser

Confidence Calibration for Audio Captioning Models

Sep 13, 2024
Via arXiv

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sep 10, 2024
Via arXiv

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Sep 10, 2024
Via arXiv

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023
Via arXiv

Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

Sep 06, 2023
Via arXiv

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

Sep 06, 2023
Via arXiv

Improved Beam Search for Hallucination Mitigation in Abstractive Summarization

Dec 06, 2022
Via arXiv

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Oct 29, 2022
Via arXiv

Activity report analysis with automatic single or multispan answer extraction

Sep 09, 2022
Via arXiv

Multi-task Voice Activated Framework using Self-supervised Learning

Oct 12, 2021
Via arXiv