Yiwen Shao

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

Jun 09, 2026

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

May 26, 2026

Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods

Mar 26, 2026

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays

Jan 25, 2026

Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects

Jan 12, 2026

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding

Jan 11, 2026

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding

Nov 19, 2025

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Nov 18, 2025

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference

Aug 27, 2025

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Aug 12, 2025