Hsin-Min Wang

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

Jan 02, 2024

D4AM: A General Denoising Framework for Downstream Acoustic Models

Nov 28, 2023

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

Nov 28, 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

Nov 15, 2023

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

Nov 05, 2023

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Oct 19, 2023

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

Oct 07, 2023

A Study on Incorporating Whisper for Robust Speech Assessment

Sep 22, 2023

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Sep 20, 2023