Yingzhi Wang

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

May 26, 2025
Via arXiv

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

May 19, 2025
Via arXiv

Open Universal Arabic ASR Leaderboard

Dec 18, 2024
Via arXiv

What Are They Doing? Joint Audio-Speech Co-Reasoning

Sep 22, 2024
Via arXiv

Open-Source Conversational AI with SpeechBrain 1.0

Jul 02, 2024
Via arXiv

Speech Emotion Diarization: Which Emotion Appears When?

Jun 22, 2023
Via arXiv

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

Nov 04, 2021
Via arXiv