
Haibin Wu

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Mar 03, 2025

CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

Jan 14, 2025

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024

TS3-Codec: Transformer-Based Simple Streaming Single Codec

Nov 27, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Sep 24, 2024

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Sep 21, 2024

Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

Sep 16, 2024

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Sep 16, 2024

Ultra-Low Latency Speech Enhancement - A Comprehensive Study

Sep 16, 2024