Zhehuai Chen

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Mar 19, 2026
Via arXiv

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026
Via arXiv

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025
Via arXiv

Word Level Timestamp Generation for Automatic Speech Recognition and Translation

May 21, 2025
Via arXiv

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

May 21, 2025
Via arXiv

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Jan 27, 2025
Via arXiv

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Jan 07, 2025
Via arXiv

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024
Via arXiv

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

Nov 08, 2024
Via arXiv

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Oct 29, 2024
Via arXiv