Zhaoheng Ni

ICASSP 2026 URGENT Speech Enhancement Challenge

Jan 20, 2026
Via arXiv

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training

Jan 18, 2026
Via arXiv

Interspeech 2025 URGENT Speech Enhancement Challenge

May 29, 2025
Via arXiv

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding

Dec 24, 2024
Via arXiv

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition

Sep 01, 2024
Via arXiv

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Jul 04, 2024
Via arXiv

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Jun 07, 2024
Via arXiv

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Jan 18, 2024
Via arXiv

On The Open Prompt Challenge In Conditional Audio Generation

Nov 01, 2023
Via arXiv

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Via arXiv