Wenze Ren

TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild

Mar 23, 2026
Via arXiv

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Mar 19, 2026
Via arXiv

MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment

Mar 11, 2026
Via arXiv

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025
Via arXiv

ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality

May 21, 2025
Via arXiv

CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

Jan 14, 2025
Via arXiv

MC-SEMamba: A Simple Multi-channel Extension of SEMamba

Sep 26, 2024
Via arXiv

Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing

Sep 22, 2024
Via arXiv

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Sep 16, 2024
Via arXiv

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

Sep 13, 2024
Via arXiv