Picture for Hsin-Min Wang

Hsin-Min Wang

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Add code
Sep 16, 2024
Figure 1 for Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
Figure 2 for Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
Figure 3 for Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
Figure 4 for Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
Viaarxiv icon

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models

Add code
Sep 16, 2024
Viaarxiv icon

Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages

Add code
Sep 13, 2024
Viaarxiv icon

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

Add code
Sep 11, 2024
Figure 1 for The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Figure 2 for The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Figure 3 for The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Figure 4 for The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Viaarxiv icon

Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

Add code
Sep 03, 2024
Viaarxiv icon

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Add code
Jun 12, 2024
Viaarxiv icon

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Add code
May 07, 2024
Figure 1 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
Figure 2 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
Figure 3 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
Figure 4 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
Viaarxiv icon

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Add code
Feb 10, 2024
Viaarxiv icon

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

Add code
Jan 02, 2024
Viaarxiv icon

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

Add code
Nov 28, 2023
Viaarxiv icon