Yannan Wang

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Apr 24, 2026

Bridging What the Model Thinks and How It Speaks: Self-Aware Speech Language Models for Expressive Speech Generation

Apr 13, 2026

Controllable Accent Normalization via Discrete Diffusion

Mar 15, 2026

CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data

Feb 22, 2026

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

Jul 23, 2025

SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms

Jun 16, 2025

Multi-Level Speaker Representation for Target Speaker Extraction

Oct 21, 2024

Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

Dec 07, 2023

The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022

Jul 28, 2023

MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

Jun 28, 2023