Zhehuai Chen

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025

Word Level Timestamp Generation for Automatic Speech Recognition and Translation

May 21, 2025

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

May 21, 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Jan 27, 2025

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Jan 07, 2025

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

Nov 08, 2024

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Oct 29, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Oct 23, 2024

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Sep 30, 2024