Zili Huang

A Large-Scale Evaluation of Speech Foundation Models

Apr 15, 2024

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

Oct 25, 2023

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Nov 10, 2022

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Nov 01, 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Oct 16, 2022

Investigating self-supervised learning for speech enhancement and separation

Mar 15, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Mar 14, 2022

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

Aug 07, 2021

SUPERB: Speech processing Universal PERformance Benchmark

May 03, 2021

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

Feb 02, 2021