Hsin-Min Wang

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Jun 12, 2024

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

May 07, 2024

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

Jan 02, 2024

D4AM: A General Denoising Framework for Downstream Acoustic Models

Nov 28, 2023

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

Nov 28, 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

Nov 15, 2023

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

Nov 05, 2023

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Oct 19, 2023

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

Oct 07, 2023