Kangwook Jang

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization

Aug 17, 2025
Via arXiv

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction

Aug 10, 2025
Via arXiv

Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis

Jan 12, 2025
Via arXiv

Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Jul 04, 2024
Via arXiv

One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

Jun 24, 2024
Via arXiv

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models

Dec 14, 2023
Via arXiv

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

May 19, 2023
Via arXiv

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Jul 01, 2022
Via arXiv