Hsin-Min Wang

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models

Sep 16, 2024
Via arXiv

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Sep 16, 2024
Via arXiv

Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages

Sep 13, 2024
Via arXiv

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

Sep 11, 2024
Via arXiv

Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

Sep 03, 2024
Via arXiv

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Jun 12, 2024
Via arXiv

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

May 07, 2024
Via arXiv

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024
Via arXiv

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

Jan 02, 2024
Via arXiv

D4AM: A General Denoising Framework for Downstream Acoustic Models

Nov 28, 2023
Via arXiv